1
Saito N, Chen S, Kitajima K, Zhou Z, Koide Y, Encabo JR, Diaz MGQ, Choi IR, Koyanagi KO, Kishima Y. Phylogenetic analysis of endogenous viral elements in the rice genome reveals local chromosomal evolution in Oryza AA-genome species. Frontiers in Plant Science 2023; 14:1261705. [PMID: 37965031 PMCID: PMC10641527 DOI: 10.3389/fpls.2023.1261705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 09/29/2023] [Indexed: 11/16/2023]
Abstract
Introduction: Rice genomes contain endogenous viral elements homologous to rice tungro bacilliform virus (RTBV) from the pararetrovirus family Caulimoviridae. These viral elements, known as endogenous RTBV-like sequences (eRTBVLs), comprise five subfamilies, eRTBVL-A, -B, -C, -D, and -X. Four subfamilies (A, B, C, and X) are present to a limited degree in the genomes of the Asian cultivated rice Oryza sativa (spp. japonica and indica) and the closely related wild species Oryza rufipogon. Methods: The eRTBVL-D sequences are widely distributed within these and other Oryza AA-genome species. Fifteen eRTBVL-D segments identified in the japonica (Nipponbare) genome occur mostly at orthologous chromosomal positions in other AA-genome species. The eRTBVL-D sequences were inserted into the genomes just before speciation of the AA-genome species. Results and discussion: Ten eRTBVL-D segments are located at six loci, which were used for our evolutionary analyses during the speciation of the AA-genome species. The degree of genetic differentiation varied among the eRTBVL-D segments. Of the six loci, three showed phylogenetic trees consistent with the standard speciation pattern (SSP) of the AA-genome species (Type A), and the other three represented phylogenies different from the SSP (Type B). The atypical phylogenetic trees for the Type B loci revealed chromosome region-specific evolution among the AA-genome species that is associated with phylogenetic incongruences: complex genome rearrangements between eRTBVL-D segments, an introgression between the distant species, and low genetic diversity of a shared eRTBVL-D segment. Using eRTBVL-D as an indicator, this study revealed the phylogenetic incongruence of local chromosomal regions with different topologies that developed during speciation.
Affiliation(s)
- Nozomi Saito
  - Research Faculty of Agriculture, Hokkaido University, Sapporo, Japan
- Sunlu Chen
  - State Key Laboratory of Crop Genetics and Germplasm Enhancement, Jiangsu Collaborative Innovation Center for Modern Crop Production, Jiangsu Province Engineering Research Center of Seed Industry Science and Technology, Cyrus Tang Innovation Center for Seed Industry, Nanjing Agricultural University, Nanjing, China
- Katsuya Kitajima
  - Research Faculty of Agriculture, Hokkaido University, Sapporo, Japan
- Zhitong Zhou
  - Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, Japan
- Yohei Koide
  - Research Faculty of Agriculture, Hokkaido University, Sapporo, Japan
- Jaymee R. Encabo
  - Institute of Biological Sciences, College of Arts and Sciences, University of the Philippines, Los Baños, Laguna, Philippines
- Maria Genaleen Q. Diaz
  - Institute of Biological Sciences, College of Arts and Sciences, University of the Philippines, Los Baños, Laguna, Philippines
- Il-Ryong Choi
  - Rice Breeding Platform, International Rice Research Institute, Los Baños, Laguna, Philippines
- Kanako O. Koyanagi
  - Faculty of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, Japan
- Yuji Kishima
  - Research Faculty of Agriculture, Hokkaido University, Sapporo, Japan
2
Corel E, Méheust R, Watson AK, McInerney JO, Lopez P, Bapteste E. Bipartite Network Analysis of Gene Sharings in the Microbial World. Mol Biol Evol 2019; 35:899-913. [PMID: 29346651 PMCID: PMC5888944 DOI: 10.1093/molbev/msy001] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/19/2022]
Abstract
Extensive microbial gene flows affect how we understand virology, microbiology, medical sciences, genetic modification, and evolutionary biology. Phylogenies only provide a narrow view of these gene flows: plasmids and viruses, lacking core genes, cannot be attached to cellular life on phylogenetic trees. Yet viruses and plasmids have a major impact on cellular evolution, affecting both the gene content and the dynamics of microbial communities. Using bipartite graphs that connect up to 149,000 clusters of homologous genes with 8,217 related and unrelated genomes, we can in particular show patterns of gene sharing that do not map neatly with the organismal phylogeny. Homologous genes are recycled by lateral gene transfer, and multiple copies of homologous genes are carried by otherwise completely unrelated (and possibly nested) genomes, that is, viruses, plasmids and prokaryotes. When a homologous gene is present on at least one plasmid or virus and at least one chromosome, a process of "gene externalization," affected by a postprocessed selected functional bias, takes place, especially in Bacteria. Bipartite graphs give us a view of vertical and horizontal gene flow beyond classic taxonomy on a single very large, analytically tractable, graph that goes beyond the cellular Web of Life.
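The bipartite representation described in this abstract (gene families on one side, genomes on the other) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the toy genomes, the family identifiers, and the function names `build_bipartite` and `externalized_families` are all hypothetical, and the "externalization" test simply encodes the abstract's condition that a family occurs on at least one plasmid or virus and at least one chromosome.

```python
from collections import defaultdict

# Hypothetical toy data: (genome name, genome type, gene families it carries).
GENOMES = [
    ("chrom_A", "chromosome", {"fam1", "fam2", "fam3"}),
    ("chrom_B", "chromosome", {"fam2", "fam4"}),
    ("plasmid_P", "plasmid", {"fam2", "fam5"}),
    ("virus_V", "virus", {"fam3", "fam6"}),
]


def build_bipartite(genomes):
    """Map each gene family to the set of genomes that carry it.

    Every (family, genome) pair is one edge of the bipartite graph.
    """
    family_to_genomes = defaultdict(set)
    for name, _kind, families in genomes:
        for fam in families:
            family_to_genomes[fam].add(name)
    return family_to_genomes


def externalized_families(genomes):
    """Families found on at least one chromosome AND one plasmid or virus."""
    kind_of = {name: kind for name, kind, _families in genomes}
    shared = set()
    for fam, holders in build_bipartite(genomes).items():
        kinds = {kind_of[g] for g in holders}
        if "chromosome" in kinds and kinds & {"plasmid", "virus"}:
            shared.add(fam)
    return shared


print(sorted(externalized_families(GENOMES)))  # ['fam2', 'fam3']
```

In this toy input, `fam2` is shared between two chromosomes and a plasmid and `fam3` between a chromosome and a virus, so both satisfy the condition; families confined to one genome type do not.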
Affiliation(s)
- Eduardo Corel
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
- Raphaël Méheust
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
- Andrew K Watson
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
- James O McInerney
  - Chair in Evolutionary Biology, The University of Manchester, United Kingdom
- Philippe Lopez
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
- Eric Bapteste
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
3
Jain S, Panda A, Colson P, Raoult D, Pontarotti P. MimiLook: A Phylogenetic Workflow for Detection of Gene Acquisition in Major Orthologous Groups of Megavirales. Viruses 2017; 9:v9040072. [PMID: 28387730 PMCID: PMC5408678 DOI: 10.3390/v9040072] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/28/2017] [Revised: 04/03/2017] [Accepted: 04/03/2017] [Indexed: 12/20/2022]
Abstract
With the inclusion of new members, understanding the evolutionary mechanisms and processes by which members of the proposed order, Megavirales, have evolved has become a key area of interest. The central role of gene acquisition has been shown in previous studies. However, the major drawback in gene acquisition studies is the focus on a few MV families or putative families with large variation in their genetic structure. Thus, here we have tried to develop a methodology by which we can detect horizontal gene transfers (HGTs), taking into consideration orthologous groups of distantly related Megavirales families. Here, we report an automated workflow, MimiLook, prepared as a Perl command line program, that deduces orthologous groups (OGs) from ORFomes of Megavirales and constructs phylogenetic trees by performing alignment generation, alignment editing and protein-protein BLAST (BLASTP) searching across the National Center for Biotechnology Information (NCBI) non-redundant (nr) protein sequence database. Finally, this tool detects statistically validated events of gene acquisitions with the help of the T-REX algorithm by comparing individual gene trees with the NCBI species tree. In between the steps, the workflow decides about handling paralogs, filtering outputs, identifying Megavirales-specific OGs, detection of HGTs, along with retrieval of information about those OGs that are monophyletic with organisms from cellular domains of life. By implementing MimiLook, we noticed that nine percent of Megavirales gene families (i.e., OGs) have been acquired by HGT, 80% of OGs were Megavirales-specific, eight percent were found to share common ancestry with members of cellular domains (Eukaryote, Bacteria, Archaea, Phages or other viruses), and three percent were ambivalent. The results are briefly discussed to emphasize methodology. Also, MimiLook is relevant for detecting evolutionary scenarios in other targeted phyla with user-defined modifications. It can be accessed at the following link: 10.6084/m9.figshare.4653622.
Affiliation(s)
- Sourabh Jain
  - Aix-Marseille Université, Ecole Centrale de Marseille, I2M UMR 7373, CNRS équipe Evolution Biologique et Modélisation, 13284 Marseille, France.
  - Aix-Marseille Université, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes (URMITE), UM63 CNRS 7278 INSERM U1095IRD 198, Faculté de Médecine, 13284 Marseille, France.
- Arup Panda
  - Aix-Marseille Université, Ecole Centrale de Marseille, I2M UMR 7373, CNRS équipe Evolution Biologique et Modélisation, 13284 Marseille, France.
  - Aix-Marseille Université, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes (URMITE), UM63 CNRS 7278 INSERM U1095IRD 198, Faculté de Médecine, 13284 Marseille, France.
- Philippe Colson
  - Aix-Marseille Université, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes (URMITE), UM63 CNRS 7278 INSERM U1095IRD 198, Faculté de Médecine, 13284 Marseille, France.
  - IHU Méditerranée Infection, Assistance Publique-Hôpitaux de Marseille, Centre Hospitalo-universitaire Timone, Pôle des Maladies Infectieuses et Tropicales Clinique et Biologique, Fédération de Bactériologie-Hygiène-Virologie, 13385 Marseille, France.
- Didier Raoult
  - Aix-Marseille Université, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes (URMITE), UM63 CNRS 7278 INSERM U1095IRD 198, Faculté de Médecine, 13284 Marseille, France.
  - IHU Méditerranée Infection, Assistance Publique-Hôpitaux de Marseille, Centre Hospitalo-universitaire Timone, Pôle des Maladies Infectieuses et Tropicales Clinique et Biologique, Fédération de Bactériologie-Hygiène-Virologie, 13385 Marseille, France.
- Pierre Pontarotti
  - Aix-Marseille Université, Ecole Centrale de Marseille, I2M UMR 7373, CNRS équipe Evolution Biologique et Modélisation, 13284 Marseille, France.
4
Abstract
Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events.
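The sequence composition-based ("parametric") idea mentioned in this abstract, searching for genes that deviate from the genomic average, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: GC content is used as the sole compositional signal, the "genomic average" is approximated by the mean of per-gene GC values, and the z-score cutoff of 2.0, the toy sequences, and the function names are hypothetical choices rather than anything taken from the cited work.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_genes(genes, z_threshold=2.0):
    """Return names of genes whose GC content deviates strongly from the mean.

    genes: dict mapping gene name -> DNA sequence.
    z_threshold: hypothetical cutoff on the absolute z-score.
    """
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sigma = pstdev(gc.values())
    if sigma == 0:
        return []  # every gene has identical composition; nothing stands out
    return sorted(n for n, v in gc.items() if abs(v - mu) / sigma > z_threshold)


# Toy genome: nine genes at 50% GC and one candidate "alien" gene at 100% GC.
TOY_GENES = {f"gene{i}": "ATGC" * 10 for i in range(9)}
TOY_GENES["alien"] = "GC" * 20

print(flag_atypical_genes(TOY_GENES))  # ['alien']
```

A flagged gene is only a candidate: as the abstract notes, different methods tend to infer different events on real data, so a compositional outlier would normally be checked against phylogenetic evidence.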
Affiliation(s)
- Nives Škunca
  - ETH Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Zurich, Switzerland
- Christophe Dessimoz
  - University College London, London, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
5
Whidden C, Zeh N, Beiko RG. Supertrees Based on the Subtree Prune-and-Regraft Distance. Syst Biol 2014; 63:566-81. [PMID: 24695589 PMCID: PMC4055872 DOI: 10.1093/sysbio/syu023] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 05/10/2013] [Accepted: 03/18/2014] [Indexed: 11/14/2022]
Abstract
Supertree methods reconcile a set of phylogenetic trees into a single structure that is often interpreted as a branching history of species. A key challenge is combining conflicting evolutionary histories that are due to artifacts of phylogenetic reconstruction and phenomena such as lateral gene transfer (LGT). Many supertree approaches use optimality criteria that do not reflect underlying processes, have known biases, and may be unduly influenced by LGT. We present the first method to construct supertrees by using the subtree prune-and-regraft (SPR) distance as an optimality criterion. Although calculating the rooted SPR distance between a pair of trees is NP-hard, our new maximum agreement forest-based methods can reconcile trees with hundreds of taxa and >50 transfers in fractions of a second, which enables repeated calculations during the course of an iterative search. Our approach can accommodate trees in which uncertain relationships have been collapsed to multifurcating nodes. Using a series of benchmark datasets simulated under plausible rates of LGT, we show that SPR supertrees are more similar to correct species histories than supertrees based on parsimony or Robinson-Foulds distance criteria. We successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla. Our SPR-based approach also allowed direct inference of highways of gene transfer between bacterial classes and genera. A small number of these highways connect genera in different phyla and can highlight specific genes implicated in long-distance LGT. [Lateral gene transfer; matrix representation with parsimony; phylogenomics; prokaryotic phylogeny; Robinson-Foulds; subtree prune-and-regraft; supertrees.].
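To make the contrast between the Robinson-Foulds and SPR criteria concrete, here is a minimal Python sketch of the rooted Robinson-Foulds distance (the number of clades found in one tree but not the other). It is not the authors' software and does not compute SPR distances; the nested-tuple tree encoding, the function names, and the six-taxon example are assumptions made for illustration. The example pair of trees differs by a single prune-and-regraft of leaf `A`, yet every non-trivial clade changes.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Leaves are strings; each clade is a frozenset of leaf names. The root
    clade is dropped because all trees on the same leaves share it.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    found.discard(leaves(tree))
    return found


def rf_distance(tree1, tree2):
    """Rooted Robinson-Foulds distance: size of the clade symmetric difference."""
    return len(clades(tree1) ^ clades(tree2))


# Hypothetical caterpillar tree, and the tree obtained from it by pruning
# leaf A and regrafting it next to F (one SPR move).
CATERPILLAR = ((((("A", "B"), "C"), "D"), "E"), "F")
AFTER_ONE_SPR = (((("B", "C"), "D"), "E"), ("A", "F"))

print(rf_distance(CATERPILLAR, CATERPILLAR))    # 0
print(rf_distance(CATERPILLAR, AFTER_ONE_SPR))  # 8
```

Here one transfer-like rearrangement yields the maximum possible clade disagreement for these two trees, which is one way to see why a criterion that counts rearrangements may be less sensitive to individual transfers than one that counts differing clades.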
Affiliation(s)
- Christopher Whidden
  - Faculty of Computer Science, Dalhousie University, 6050 University Avenue, PO Box 15000, Halifax, Nova Scotia, Canada B3H 4R2
- Norbert Zeh
  - Faculty of Computer Science, Dalhousie University, 6050 University Avenue, PO Box 15000, Halifax, Nova Scotia, Canada B3H 4R2
- Robert G Beiko
  - Faculty of Computer Science, Dalhousie University, 6050 University Avenue, PO Box 15000, Halifax, Nova Scotia, Canada B3H 4R2
6
Yu Y, Barnett RM, Nakhleh L. Parsimonious inference of hybridization in the presence of incomplete lineage sorting. Syst Biol 2013; 62:738-51. [PMID: 23736104 PMCID: PMC3739885 DOI: 10.1093/sysbio/syt037] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 09/25/2012] [Revised: 11/26/2012] [Accepted: 05/28/2013] [Indexed: 12/31/2022]
Abstract
Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detect hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking their differences as indicators of hybridization. However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework. To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses while accounting for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate the good performance of our framework in terms of identifying the location of hybridization events, as well as estimating the proportions of genes that underwent hybridization. Also, our framework shows good performance in terms of efficiency on handling large data sets in our experiments. Further, in analysing a yeast data set, we demonstrate issues that arise when analysing real data sets. Although a probabilistic approach was recently introduced for this problem, and although parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization, while also accounting for ILS.
Affiliation(s)
- Yun Yu
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- R. Matthew Barnett
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
7
Jenkins PA, Song YS, Brem RB. Genealogy-based methods for inference of historical recombination and gene flow and their application in Saccharomyces cerevisiae. PLoS One 2012; 7:e46947. [PMID: 23226196 PMCID: PMC3511476 DOI: 10.1371/journal.pone.0046947] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/29/2012] [Accepted: 09/10/2012] [Indexed: 11/17/2022]
Abstract
Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance.
Affiliation(s)
- Paul A. Jenkins
  - Computer Science Division, University of California, Berkeley, California, United States of America
- Yun S. Song
  - Computer Science Division, University of California, Berkeley, California, United States of America
  - Department of Statistics, University of California, Berkeley, California, United States of America
- Rachel B. Brem
  - Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
8
Liu L, Chen X, Skogerbø G, Zhang P, Chen R, He S, Huang DW. The human microbiome: A hot spot of microbial horizontal gene transfer. Genomics 2012; 100:265-70. [DOI: 10.1016/j.ygeno.2012.07.012] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Received: 01/30/2012] [Revised: 07/06/2012] [Accepted: 07/16/2012] [Indexed: 12/19/2022]
9
Boc A, Diallo AB, Makarenkov V. T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks. Nucleic Acids Res 2012; 40:W573-9. [PMID: 22675075 PMCID: PMC3394261 DOI: 10.1093/nar/gks485] [Citation(s) in RCA: 278] [Impact Index Per Article: 23.2] [Indexed: 11/13/2022]
Abstract
T-REX (Tree and reticulogram REConstruction) is a web server dedicated to the reconstruction of phylogenetic trees, reticulation networks and to the inference of horizontal gene transfer (HGT) events. T-REX includes several popular bioinformatics applications such as MUSCLE, MAFFT, Neighbor Joining, NINJA, BioNJ, PhyML, RAxML, random phylogenetic tree generator and some well-known sequence-to-distance transformation models. It also comprises fast and effective methods for inferring phylogenetic trees from complete and incomplete distance matrices as well as for reconstructing reticulograms and HGT networks, including the detection and validation of complete and partial gene transfers, inference of consensus HGT scenarios and interactive HGT identification, developed by the authors. The included methods allow for validating and visualizing phylogenetic trees and networks which can be built from distance or sequence data. The web server is available at: www.trex.uqam.ca.
Affiliation(s)
- Alix Boc
  - Département de sciences biologiques, Université de Montréal, C.P. 6128, Succ. Centre-ville, Montréal, QC, H3C 3J7, Canada
10
Abstract
Methods for identifying alien genes in genomes fall into two general classes. Phylogenetic methods examine the distribution of a gene's homologues among genomes to find those with relationships not consistent with vertical inheritance. These approaches include identifying orphan genes which lack homologues in closely related genomes and genes with unduly high levels of similarity to genes in otherwise unrelated genomes. Rigorous statistical tests are available to place confidence intervals for predicted alien genes. Parametric methods examine the compositional properties of genes within a genome to find those with atypical properties, likely indicating the directional mutational pressures of a donor genome. These methods may compare the properties of genes to genomic averages, properties of genes to each other, or properties of large, multigene regions of the chromosome. Here, we discuss the strengths and weaknesses of each approach.
Affiliation(s)
- Rajeev K Azad
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
11
Thuillard M, Moulton V. Identifying and reconstructing lateral transfers from distance matrices by combining the Minimum Contradiction method and Neighbor-Net. J Bioinform Comput Biol 2011; 9:453-70. [DOI: 10.1142/s0219720011005409] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/06/2010] [Revised: 02/01/2011] [Accepted: 02/13/2011] [Indexed: 11/18/2022]
Abstract
Identifying lateral gene transfers is an important problem in evolutionary biology. Under a simple model of evolution, the expected values of an evolutionary distance matrix describing a phylogenetic tree fulfill the so-called Kalmanson inequalities. The Minimum Contradiction method for identifying lateral gene transfers exploits the fact that lateral transfers may generate large deviations from the Kalmanson inequalities. Here a new approach is presented to deal with such cases that combines the Neighbor-Net algorithm for computing phylogenetic networks with the Minimum Contradiction method. A subset of taxa, prescribed using Neighbor-Net, is obtained by measuring how closely the Kalmanson inequalities are fulfilled by each taxon. A criterion is then used to identify the taxa possibly involved in a lateral transfer between nonconsecutive taxa. We illustrate the utility of the new approach by applying it to a distance matrix for Archaea, Bacteria, and Eukaryota.
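The Kalmanson inequalities referred to in this abstract can be stated for a circular ordering of taxa: for any four taxa i, j, k, l appearing in that order, d(i,j) + d(k,l) ≤ d(i,k) + d(j,l) and d(i,l) + d(j,k) ≤ d(i,k) + d(j,l). The Python sketch below checks these conditions for a given distance matrix and ordering. It is not the Minimum Contradiction method itself; the function name, the tolerance, and the four-taxon tree metric are hypothetical choices for illustration.

```python
from itertools import combinations


def kalmanson_violations(dist, order, tol=1e-12):
    """List the quadruples (i, j, k, l), taken in the given circular order,
    that violate at least one of the two Kalmanson inequalities.

    dist: square symmetric matrix (list of lists) of pairwise distances.
    order: list of taxon indices describing the circular ordering.
    tol: hypothetical tolerance for floating-point comparisons.
    """
    violations = []
    for a, b, c, d in combinations(range(len(order)), 4):
        i, j, k, l = order[a], order[b], order[c], order[d]
        crossing = dist[i][k] + dist[j][l]
        if (dist[i][j] + dist[k][l] > crossing + tol
                or dist[i][l] + dist[j][k] > crossing + tol):
            violations.append((i, j, k, l))
    return violations


# Tree metric for the quartet ((0,1),(2,3)) with all branch lengths set to 1.
TREE_METRIC = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

print(kalmanson_violations(TREE_METRIC, [0, 1, 2, 3]))  # []
print(kalmanson_violations(TREE_METRIC, [0, 2, 1, 3]))  # [(0, 2, 1, 3)]
```

The tree metric satisfies the inequalities under an ordering compatible with the tree and fails under an incompatible one, which illustrates the kind of deviation that a lateral-transfer signal could be measured against.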
Affiliation(s)
- Vincent Moulton
  - School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
12
Boc A, Makarenkov V. Towards an accurate identification of mosaic genes and partial horizontal gene transfers. Nucleic Acids Res 2011; 39:e144. [PMID: 21917854 PMCID: PMC3241670 DOI: 10.1093/nar/gkr735] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
Many bacteria and viruses adapt to varying environmental conditions through the acquisition of mosaic genes. A mosaic gene is composed of alternating sequence polymorphisms either belonging to the host original allele or derived from the integrated donor DNA. Often, the integrated sequence contains a selectable genetic marker (e.g. marker allowing for antibiotic resistance). An effective identification of mosaic genes and detection of corresponding partial horizontal gene transfers (HGTs) are among the most important challenges posed by evolutionary biology. We developed a method for detecting partial HGT events and related intragenic recombination giving rise to the formation of mosaic genes. A bootstrap procedure incorporated in our method is used to assess the support of each predicted partial gene transfer. The proposed method can also be applied to confirm or discard complete (i.e. traditional) horizontal gene transfers detected by any HGT inferring method. While working on a full-genome scale, the new method can be used to assess the level of mosaicism in the considered genomes as well as the rates of complete and partial HGT underlying their evolution.
Affiliation(s)
- Alix Boc
  - Département d'Informatique, Université du Québec à Montréal, CP 8888, Succursale Centre Ville, Montreal, QC, Canada H3C 3P8
13
Leigh JW, Lapointe FJ, Lopez P, Bapteste E. Evaluating phylogenetic congruence in the post-genomic era. Genome Biol Evol 2011; 3:571-87. [PMID: 21712432 PMCID: PMC3156567 DOI: 10.1093/gbe/evr050] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Accepted: 05/27/2011] [Indexed: 12/04/2022]
Abstract
Congruence is a broadly applied notion in evolutionary biology used to justify multigene phylogeny or phylogenomics, as well as in studies of coevolution, lateral gene transfer, and as evidence for common descent. Existing methods for identifying incongruence or heterogeneity using character data were designed for data sets that are both small and expected to be rarely incongruent. At the same time, methods that assess incongruence using comparison of trees test a null hypothesis of uncorrelated tree structures, which may be inappropriate for phylogenomic studies. As such, they are ill-suited for the growing number of available genome sequences, most of which are from prokaryotes and viruses, either for phylogenomic analysis or for studies of the evolutionary forces and events that have shaped these genomes. Specifically, many existing methods scale poorly with large numbers of genes, cannot accommodate high levels of incongruence, and do not adequately model patterns of missing taxa for different markers. We propose the development of novel incongruence assessment methods suitable for the analysis of the molecular evolution of the vast majority of life and support the investigation of homogeneity of evolutionary process in cases where markers do not share identical tree structures.
Affiliation(s)
- Jessica W Leigh
  - Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand.
14
Rodionov A, Bezginov A, Rose J, Tillier ERM. A new, fast algorithm for detecting protein coevolution using maximum compatible cliques. Algorithms Mol Biol 2011; 6:17. [PMID: 21672226 PMCID: PMC3130660 DOI: 10.1186/1748-7188-6-17] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 09/12/2010] [Accepted: 06/14/2011] [Indexed: 11/26/2022]
Abstract
BACKGROUND: The MatrixMatchMaker algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time. RESULTS: In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM. CONCLUSIONS: MMMvII will thus allow for more extensive and intricate analyses of coevolution. AVAILABILITY: An implementation of the MMMvII algorithm is available at: http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php.
Affiliation(s)
- Alex Rodionov
  - The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
- Alexandr Bezginov
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Jonathan Rose
  - The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
- Elisabeth RM Tillier
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Ontario Cancer Institute, University Health Network, 101 College Street., Toronto, M5G 1L7, Canada
15
Abstract
Throughout the living world, genetic recombination and nucleotide substitution are the primary processes that create the genetic variation upon which natural selection acts. Just as analyses of substitution patterns can reveal a great deal about evolution, so too can analyses of recombination. Evidence of genetic recombination within the genomes of apparently asexual species can equate with evidence of cryptic sexuality. In sexually reproducing species, nonrandom patterns of sequence exchange can provide direct evidence of population subdivisions that prevent certain individuals from mating. Although an interesting topic in its own right, an important reason for analysing recombination is to account for its potentially disruptive influences on various phylogenetic-based molecular evolution analyses. Specifically, the evolutionary histories of recombinant sequences cannot be accurately described by standard bifurcating phylogenetic trees. Taking recombination into account can therefore be pivotal to the success of selection, molecular clock and various other analyses that require adequate modelling of shared ancestry and draw increased power from accurately inferred phylogenetic trees. Here, we review various computational approaches to studying recombination and provide guidelines both on how to gain insights into this important evolutionary process and on how it can be properly accounted for during molecular evolution studies.
Affiliation(s)
- Darren P Martin
- Computational Biology Group, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa
16
Kannan L, Li H, Mushegian A. A polynomial-time algorithm computing lower and upper bounds of the rooted subtree prune and regraft distance. J Comput Biol 2010; 18:743-57. [PMID: 21166560 DOI: 10.1089/cmb.2010.0045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Rooted, leaf-labeled trees are used in biology to represent hierarchical relationships of various entities, most notably the evolutionary history of molecules and organisms. The rooted subtree prune and regraft (rSPR) operation is a tree rearrangement operation that transforms a tree into another tree with the same set of leaf labels. The minimum number of rSPR operations that transform one tree into another is denoted by d(rSPR) and gives a measure of dissimilarity between the trees, which can be used to compare trees obtained by different approaches, or, in the context of phylogenetic analysis, to detect horizontal gene transfer events by finding incongruences between trees of different evolving characters. The problem of computing the exact d(rSPR) measure is NP-hard, and most algorithms resort to finding sequences of rSPR operations that are sufficient for transforming one tree into another, thereby giving upper bound heuristics for the distance. In this article, we present an O(n⁴) recursive algorithm, D-Clust, that gives both lower bound and upper bound heuristics for the distance between trees with n shared leaves and also gives a sequence of operations that transforms one tree into another. Our experiments on simulated pairs of trees containing up to 100 leaves showed that the two bounds are almost equal for small distances, thereby giving a nearly precise estimate of the actual value, and that the upper bound tends to be close to the upper bounds given by other approaches for all pairs of trees.
Affiliation(s)
- Lavanya Kannan
- Bioinformatics Center, Stowers Institute for Medical Research, Kansas City, Missouri, USA.
17
Hill T, Nordström KJV, Thollesson M, Säfström TM, Vernersson AKE, Fredriksson R, Schiöth HB. SPRIT: Identifying horizontal gene transfer in rooted phylogenetic trees. BMC Evol Biol 2010; 10:42. [PMID: 20152048 PMCID: PMC2829038 DOI: 10.1186/1471-2148-10-42] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 04/23/2009] [Accepted: 02/13/2010] [Indexed: 11/17/2022]
Abstract
BACKGROUND Phylogenetic trees based on sequences from a set of taxa can be incongruent due to horizontal gene transfer (HGT). By identifying the HGT events, we can reconcile the gene trees and derive a taxon tree that adequately represents the species' evolutionary history. One HGT can be represented by a rooted Subtree Prune and Regraft (RSPR) operation and the number of RSPRs separating two trees corresponds to the minimum number of HGT events. Identifying the minimum number of RSPRs separating two trees is NP-hard, but the problem is fixed-parameter tractable. A number of heuristic and two exact approaches to identifying the minimum number of RSPRs have been proposed. This is the first implementation delivering an exact solution as well as the intermediate trees connecting the input trees. RESULTS We present the SPR Identification Tool (SPRIT), a novel algorithm that solves the fixed-parameter tractable minimum RSPR problem, and its GPL-licensed Java implementation. The algorithm can be used in two ways: an exhaustive search that guarantees the minimum RSPR distance, and a heuristic approach that guarantees finding a solution, but not necessarily the minimum one. We benchmarked SPRIT against other software in two different settings: small to medium-sized trees (five to one hundred taxa) and large trees (thousands of taxa). In the small to medium tree size setting with random artificial incongruence, SPRIT's heuristic mode outperforms the other software by always delivering a solution with a low overestimation of the RSPR distance. In the large tree setting SPRIT compares well to the alternatives when benchmarked on finding a minimum solution within a reasonable time. SPRIT presents both the minimum RSPR distance and the intermediate trees. CONCLUSIONS When used in exhaustive search mode, SPRIT identifies the minimum number of RSPRs needed to reconcile two incongruent rooted trees. SPRIT also performs quick approximations of the minimum RSPR distance, which are comparable to, and often better than, purely heuristic solutions. Put together, SPRIT is an excellent tool for identification of HGT events and pinpointing which taxa have been involved in HGT.
Affiliation(s)
- Tobias Hill
- Department of Neuroscience, Biomedical Centre, Uppsala University, Box 593, SE-751 24 Uppsala, Sweden
- Karl JV Nordström
- Department of Neuroscience, Biomedical Centre, Uppsala University, Box 593, SE-751 24 Uppsala, Sweden
- Mikael Thollesson
- Department of Evolution, Genomics and Systematics, Uppsala University, Norbyvägen 18C, SE-752 36 Uppsala, Sweden
- Tommy M Säfström
- Department of Neuroscience, Biomedical Centre, Uppsala University, Box 593, SE-751 24 Uppsala, Sweden
- Andreas KE Vernersson
- Department of Neuroscience, Biomedical Centre, Uppsala University, Box 593, SE-751 24 Uppsala, Sweden
- Robert Fredriksson
- Department of Neuroscience, Biomedical Centre, Uppsala University, Box 593, SE-751 24 Uppsala, Sweden
- Helgi B Schiöth
- Department of Neuroscience, Biomedical Centre, Uppsala University, Box 593, SE-751 24 Uppsala, Sweden
18
Boc A, Philippe H, Makarenkov V. Inferring and validating horizontal gene transfer events using bipartition dissimilarity. Syst Biol 2010; 59:195-211. [PMID: 20525630 DOI: 10.1093/sysbio/syp103] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.4] [Indexed: 11/13/2022]
Abstract
Horizontal gene transfer (HGT) is one of the main mechanisms driving the evolution of microorganisms. Its accurate identification is one of the major challenges posed by reticulate evolution. In this article, we describe a new polynomial-time algorithm for inferring HGT events and compare 3 existing and 1 new tree comparison indices in the context of HGT identification. The proposed algorithm can rely on different optimization criteria, including least squares (LS), Robinson and Foulds (RF) distance, quartet distance (QD), and bipartition dissimilarity (BD), when searching for an optimal scenario of subtree prune and regraft (SPR) moves needed to transform the given species tree into the given gene tree. As the simulation results suggest, the algorithmic strategy based on BD, introduced in this article, generally provides better results than those based on LS, RF, and QD. The BD-based algorithm also proved to be more accurate and faster than a well-known polynomial time heuristic RIATA-HGT. Moreover, the HGT recovery results yielded by BD were generally equivalent to those provided by the exponential-time algorithm LatTrans, but a clear gain in running time was obtained using the new algorithm. Finally, a statistical framework for assessing the reliability of obtained HGTs by bootstrap analysis is also presented.
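As a rough illustration of one of the tree comparison indices named in this abstract, the sketch below computes a Robinson and Foulds (RF) style distance in its rooted, clade-based form: the number of clades found in one tree but not the other. It is not the algorithm of Boc et al. and does not implement bipartition dissimilarity. The nested-tuple tree encoding, the function names, and the example trees are all assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set below every internal node, excluding the root."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    root = leaves(tree)
    found.discard(root)  # the root clade is shared by any two trees on the same taxa
    return found


def rooted_rf(tree_a: Tree, tree_b: Tree) -> int:
    """Size of the symmetric difference between the two clade sets."""
    return len(clades(tree_a) ^ clades(tree_b))


# Illustrative species tree, and a gene tree in which taxon B has been
# pruned and regrafted next to the (D, E) clade, as a transfer might do.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = (("A", "C"), (("D", "E"), "B"))

print(rooted_rf(species_tree, species_tree))
print(rooted_rf(species_tree, gene_tree))
```

Here a single prune-and-regraft move changes several clades at once, which is one reason RF-type counts and SPR-type distances can rank tree pairs differently.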
Affiliation(s)
- Alix Boc
- Département d'informatique, Université du Québec à Montréal, C.P. 8888, Succ. Centre-ville, Montréal, Québec, Canada.
19
Choi K, Gomez SM. Comparison of phylogenetic trees through alignment of embedded evolutionary distances. BMC Bioinformatics 2009; 10:423. [PMID: 20003527 PMCID: PMC3087345 DOI: 10.1186/1471-2105-10-423] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 04/02/2009] [Accepted: 12/15/2009] [Indexed: 11/12/2022]
Abstract
Background The understanding of evolutionary relationships is a fundamental aspect of modern biology, with the phylogenetic tree being a primary tool for describing these associations. However, comparison of trees for the purpose of assessing similarity and the quantification of various biological processes remains a significant challenge. Results We describe a novel approach for the comparison of phylogenetic distance information based on the alignment of representative high-dimensional embeddings (xCEED: Comparison of Embedded Evolutionary Distances). The xCEED methodology, which utilizes multidimensional scaling and Procrustes-related superimposition approaches, provides the ability to measure the global similarity between trees as well as incongruities between them. We demonstrate the application of this approach to the prediction of coevolving protein interactions and demonstrate its improved performance over the mirrortree, tol-mirrortree, phylogenetic vector projection, and partial correlation approaches. Furthermore, we show its applicability to both the detection of horizontal gene transfer events as well as its potential use in the prediction of interaction specificity between a pair of multigene families. Conclusions These approaches provide additional tools for the study of phylogenetic trees and associated evolutionary processes. Source code is available at http://gomezlab.bme.unc.edu/tools.
Affiliation(s)
- Kwangbom Choi
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
20
Bapteste E, O'Malley MA, Beiko RG, Ereshefsky M, Gogarten JP, Franklin-Hall L, Lapointe FJ, Dupré J, Dagan T, Boucher Y, Martin W. Prokaryotic evolution and the tree of life are two different things. Biol Direct 2009; 4:34. [PMID: 19788731 PMCID: PMC2761302 DOI: 10.1186/1745-6150-4-34] [Citation(s) in RCA: 128] [Impact Index Per Article: 8.5] [Received: 07/16/2009] [Accepted: 09/29/2009] [Indexed: 01/04/2023]
Abstract
BACKGROUND The concept of a tree of life is prevalent in the evolutionary literature. It stems from attempting to obtain a grand unified natural system that reflects a recurrent process of species and lineage splittings for all forms of life. Traditionally, the discipline of systematics operates in a similar hierarchy of bifurcating (sometimes multifurcating) categories. The assumption of a universal tree of life hinges upon the process of evolution being tree-like throughout all forms of life and all of biological time. In multicellular eukaryotes, the molecular mechanisms and species-level population genetics of variation do indeed mainly cause a tree-like structure over time. In prokaryotes, they do not. Prokaryotic evolution and the tree of life are two different things, and we need to treat them as such, rather than extrapolating from macroscopic life to prokaryotes. In the following we will consider this circumstance from philosophical, scientific, and epistemological perspectives, surmising that phylogeny opted for a single model as a holdover from the Modern Synthesis of evolution. RESULTS It was far easier to envision and defend the concept of a universal tree of life before we had data from genomes. But the belief that prokaryotes are related by such a tree has now become stronger than the data to support it. The monistic concept of a single universal tree of life appears, in the face of genome data, increasingly obsolete. This traditional model to describe evolution is no longer the most scientifically productive position to hold, because of the plurality of evolutionary patterns and mechanisms involved. Forcing a single bifurcating scheme onto prokaryotic evolution disregards the non-tree-like nature of natural variation among prokaryotes and accounts for only a minority of observations from genomes. CONCLUSION Prokaryotic evolution and the tree of life are two different things. Hence we will briefly set out alternative models to the tree of life to study their evolution. Ultimately, the plurality of evolutionary patterns and mechanisms involved, such as the discontinuity of the process of evolution across the prokaryote-eukaryote divide, summons forth a pluralistic approach to studying evolution. REVIEWERS This article was reviewed by Ford Doolittle, John Logsdon and Nicolas Galtier.
21
Abstract
This chapter discusses the pros and cons of the existing computational methods for the detection of horizontal (or lateral) gene transfer and highlights the genome-wide studies utilizing these methods. The impact of horizontal gene transfer (HGT) on prokaryote genome evolution is discussed.
22
Lemey P, Lott M, Martin DP, Moulton V. Identifying recombinants in human and primate immunodeficiency virus sequence alignments using quartet scanning. BMC Bioinformatics 2009; 10:126. [PMID: 19397803 PMCID: PMC2684544 DOI: 10.1186/1471-2105-10-126] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 11/21/2008] [Accepted: 04/27/2009] [Indexed: 12/02/2022]
Abstract
Background Recombination has a profound impact on the evolution of viruses, but characterizing recombination patterns in molecular sequences remains a challenging endeavor. Despite its importance in molecular evolutionary studies, identifying the sequences that exhibit such patterns has received comparatively less attention in the recombination detection framework. Here, we extend a quartet-mapping based recombination detection method to enable identification of recombinant sequences without prior specification of either query or reference sequences. Through simulations we evaluate different recombinant identification statistics and significance tests. We compare the quartet approach with triplet-based methods that employ additional heuristic tests to identify parental and recombinant sequences. Results Analysis of phylogenetic simulations reveals that identifying the descendants of relatively old recombination events is a challenging task for all methods available, and that quartet scanning performs relatively well compared to the triplet-based methods. The use of quartet scanning is further demonstrated by analyzing both well-established and putative HIV-1 recombinant strains. In agreement with recent findings, we provide evidence that the presumed circulating recombinant CRF02_AG is a 'pure' lineage, whereas the presumed parental lineage subtype G has a recombinant origin. We also demonstrate HIV-1 intrasubtype recombination, confirm the hybrid origin of SIV in chimpanzees and further disentangle the recombinant history of SIV lineages in a primate immunodeficiency virus data set. Conclusion Quartet scanning makes a valuable addition to triplet-based methods for identifying recombinant sequences without prior specification of either query or reference sequences. The new method is available in the VisRD v.3.0 package.
Affiliation(s)
- Philippe Lemey
- Rega Institute, Katholieke Universiteit Leuven, Minderbroedersstraat 10, 3000 Leuven, Belgium.
23
Langille MGI, Brinkman FSL. Bioinformatic detection of horizontally transferred DNA in bacterial genomes. F1000 BIOLOGY REPORTS 2009; 1:25. [PMID: 20948661 PMCID: PMC2920674 DOI: 10.3410/b1-25] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022]
Abstract
We highlight a selection of recent research on computational methods and associated challenges surrounding the prediction of bacterial horizontal gene transfer. This research area continues to face controversy, but is becoming more critical as the importance of horizontal gene transfer in medically and ecologically important prokaryotic evolution is further appreciated.
Affiliation(s)
- Morgan G I Langille
- Department of Molecular Biology and Biochemistry, Simon Fraser University Burnaby, BC Canada V5A 1S6
24
25
Walsh DA, Sharma AK. Molecular phylogenetics: testing evolutionary hypotheses. Methods Mol Biol 2009; 502:131-168. [PMID: 19082555 DOI: 10.1007/978-1-60327-565-1_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
A common approach for investigating evolutionary relationships between genes and organisms is to compare extant DNA or protein sequences and infer an evolutionary tree. This methodology is known as molecular phylogenetics and may be the most informative means for exploring phage evolution, since there are few morphological features that can be used to differentiate between these tiny biological entities. In addition, phage genomes can be mosaic, meaning different genes or genomic regions can exhibit conflicting evolutionary histories due to lateral gene transfer or homologous recombination between different phage genomes. Molecular phylogenetics can be used to identify and study such genome mosaicism. This chapter provides a general introduction to the theory and methodology used to reconstruct phylogenetic relationships from molecular data. Also included is a discussion on how the evolutionary history of different genes within the same set of genomes can be compared, using a collection of T4-type phage genomes as an example. A compilation of programs and packages that are available for conducting phylogenetic analyses is supplied as an accompanying appendix.
Affiliation(s)
- David A Walsh
- Department of Biochemistry and Molecular Biology, Dalhousie University, Nova Scotia, Canada
26
Beiko RG, Ragan MA. Untangling hybrid phylogenetic signals: horizontal gene transfer and artifacts of phylogenetic reconstruction. Methods Mol Biol 2009; 532:241-256. [PMID: 19271189 DOI: 10.1007/978-1-60327-853-9_14] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/27/2023]
Abstract
Phylogenomic methods can be used to investigate the tangled evolutionary relationships among genomes. Building 'all the trees of all the genes' can potentially identify common pathways of horizontal gene transfer (HGT) among taxa at varying levels of phylogenetic depth. Phylogenetic affinities can be aggregated and merged with the information about genetic linkage and biochemical function to examine hypotheses of adaptive evolution via HGT. Additionally, the use of many genetic data sets increases the power of statistical tests for phylogenetic artifacts. However, large-scale phylogenetic analyses pose several challenges, including the necessary abandonment of manual validation techniques, the need to translate inferred phylogenetic discordance into inferred HGT events, and the challenges involved in aggregating results from search-based inference methods. In this chapter we describe a tree search procedure to recover the most parsimonious pathways of HGT, and examine some of the assumptions that are made by this method.
Affiliation(s)
- Robert G Beiko
- Department of Computer Science, Dalhousie University, Halifax, NS, Canada
27
28
Abstract
MOTIVATION Subtree prune and regraft (SPR) is a kind of tree rearrangement that has seen applications in solving several computational biology problems. The minimum number of rooted SPR ((r)SPR) operations needed to transform one rooted binary tree into another is called the (r)SPR distance between the two trees. Computing the (r)SPR distance has been actively studied in recent years. Currently, there is a lack of practical software tools for computing the (r)SPR distance for relatively large trees with large (r)SPR distance. RESULTS In this article, we present a simple and practical method that computes the exact (r)SPR distance with integer linear programming. By applying this new method to several simulated and real biological datasets, we show that our new method outperforms existing software tools in terms of accuracy and efficiency. Our experimental results indicate that our method can compute the exact (r)SPR distance for many large trees with large (r)SPR distance.
Affiliation(s)
- Yufeng Wu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.
29
Than C, Ruths D, Nakhleh L. PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinformatics 2008; 9:322. [PMID: 18662388 PMCID: PMC2533029 DOI: 10.1186/1471-2105-9-322] [Citation(s) in RCA: 267] [Impact Index Per Article: 16.7] [Received: 03/04/2008] [Accepted: 07/28/2008] [Indexed: 11/25/2022]
Abstract
Background Phylogenies, i.e., the evolutionary histories of groups of taxa, play a major role in representing the interrelationships among biological entities. Many software tools for reconstructing and evaluating such phylogenies have been proposed, almost all of which assume the underlying evolutionary history to be a tree. While trees give a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by trees. Processes such as horizontal gene transfer (HGT), hybrid speciation, and interspecific recombination, collectively referred to as reticulate evolutionary events, result in networks, rather than trees, of relationships. Various software tools have been recently developed to analyze reticulate evolutionary relationships, which include SplitsTree4, LatTrans, EEEP, HorizStory, and T-REX. Results In this paper, we report on the PhyloNet software package, which is a suite of tools for analyzing reticulate evolutionary relationships, or evolutionary networks, which are rooted, directed, acyclic graphs, leaf-labeled by a set of taxa. These tools can be classified into four categories: (1) evolutionary network representation: reading/writing evolutionary networks in a newly devised compact form; (2) evolutionary network characterization: analyzing evolutionary networks in terms of three basic building blocks – trees, clusters, and tripartitions; (3) evolutionary network comparison: comparing two evolutionary networks in terms of topological dissimilarities, as well as fitness to sequence evolution under a maximum parsimony criterion; and (4) evolutionary network reconstruction: reconstructing an evolutionary network from a species tree and a set of gene trees. Conclusion The software package, PhyloNet, offers an array of utilities to allow for efficient and accurate analysis of evolutionary networks. The software package will help significantly in analyzing large data sets, as well as in studying the performance of evolutionary network reconstruction methods. Further, the software package supports the proposed eNewick format for compact representation of evolutionary networks, a feature that allows for efficient interoperability of evolutionary network software tools. Currently, all utilities in PhyloNet are invoked on the command line.
Affiliation(s)
- Cuong Than
- Department of Computer Science, Rice University, 6100 Main Street, MS 132, Houston, TX, USA.
30
Martins LDO, Leal E, Kishino H. Phylogenetic detection of recombination with a Bayesian prior on the distance between trees. PLoS One 2008; 3:e2651. [PMID: 18612422 PMCID: PMC2440540 DOI: 10.1371/journal.pone.0002651] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 04/11/2008] [Accepted: 06/07/2008] [Indexed: 11/18/2022]
Abstract
Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition∶transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination.
31
Tamames J, Moya A. Estimating the extent of horizontal gene transfer in metagenomic sequences. BMC Genomics 2008; 9:136. [PMID: 18366724 PMCID: PMC2324111 DOI: 10.1186/1471-2164-9-136] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 10/23/2007] [Accepted: 03/24/2008] [Indexed: 02/07/2023]
Abstract
BACKGROUND Although the extent of horizontal gene transfer (HGT) in complete genomes has been widely studied, its influence on the evolution of natural communities of prokaryotes remains unknown. The availability of metagenomic sequences allows us to address the study of global patterns of prokaryotic evolution in samples from natural communities. However, the methods that have been commonly used for the study of HGT are not suitable for metagenomic samples. Therefore it is important to develop new methods or to adapt existing ones to be used with metagenomic sequences. RESULTS We have created two different methods that are suitable for the study of HGT in metagenomic samples. The methods are based on phylogenetic and DNA compositional approaches, and have allowed us to assess the extent of possible HGT events in metagenomes for the first time. The methods are shown to be compatible and quite precise, although they probably underestimate the number of possible events. Our results show that the phylogenetic method detects HGT in between 0.8% and 1.5% of the sequences, while DNA compositional methods identify putative HGT in between 2% and 8% of the sequences. These ranges are very similar to those found in complete genomes by related approaches. The two methods differ in sensitivity, probably because they target HGT events of different ages: the compositional method mostly identifies recent transfers, while the phylogenetic method is more suitable for the detection of older events. Nevertheless, the study of the number of HGT events in metagenomic sequences from different communities shows a consistent trend for both methods: the lowest amount is found for the sequences of the Sargasso Sea metagenome, while the highest is found in the whale fall metagenome from the bottom of the ocean. The significance of these observations is discussed. CONCLUSION The computational approaches that are used to find possible HGT events in complete genomes can be adapted to work with metagenomic samples, where a high level of performance is shown in different metagenomic samples. The percentage of possible HGT events that were observed is close to that found for complete genomes, and different microbiomes show diverse ratios of putative HGT events. This is probably related to both environmental factors and the species composition of each particular community.
Affiliation(s)
- Javier Tamames
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Universidad de Valencia, Polígono La Coma s/n, 46980 Paterna (Valencia), Spain
- CIBER en Epidemiología y Salud Pública (CIBER-ESP), Spain
- Andrés Moya
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Universidad de Valencia, Polígono La Coma s/n, 46980 Paterna (Valencia), Spain
- CIBER en Epidemiología y Salud Pública (CIBER-ESP), Spain
32
33
Abstract
How much horizontal gene transfer (HGT) between species influences bacterial phylogenomics is a controversial issue. This debate, however, lacks any quantitative assessment of the impact of HGT on phylogenies and of the ability of tree-building methods to cope with such events. I introduce a Markov model of genome evolution with HGT, accounting for the constraints on time -- an HGT event can only occur between concomitantly living species. This model is used to simulate multigene sequence data sets with or without HGT. The consequences of HGT on phylogenomic inference are analyzed and compared to other well-known phylogenetic artefacts. It is found that supertree methods are quite robust to HGT, keeping high levels of performance even when gene trees are largely incongruent with each other. Gene tree incongruence per se is not indicative of HGT. HGT, however, removes the (otherwise observed) positive relationship between sequence length and gene tree congruence to the estimated species tree. Surprisingly, when applied to a bacterial and a eukaryotic multigene data set, this criterion rejects the HGT hypothesis for the former, but not the latter data set.
Affiliation(s)
- Nicolas Galtier
- Institut des Sciences de l'Evolution (UM2-CNRS), Université Montpellier 2, Montpellier, France.
34
Podell S, Gaasterland T. DarkHorse: a method for genome-wide prediction of horizontal gene transfer. Genome Biol 2007; 8:R16. [PMID: 17274820 PMCID: PMC1852411 DOI: 10.1186/gb-2007-8-2-r16] [Citation(s) in RCA: 123] [Impact Index Per Article: 7.2] [Received: 08/04/2006] [Revised: 11/09/2006] [Accepted: 02/02/2007] [Indexed: 12/14/2022]
Abstract
DarkHorse is a new approach to rapid, genome-wide identification and ranking of horizontal transfer candidate proteins. The method is quantitative, reproducible, and computationally undemanding. It can be combined with genomic signature and/or phylogenetic tree-building procedures to improve accuracy and efficiency. The method is also useful for retrospective assessments of horizontal transfer prediction reliability, recognizing orthologous sequences that may have been previously overlooked or unavailable. These features are demonstrated in bacterial, archaeal, and eukaryotic examples.
Affiliation(s)
- Sheila Podell
- Scripps Genome Center, Scripps Institution of Oceanography, University of California at San Diego, Gilman Drive, La Jolla, CA 92093-0202, USA
- Terry Gaasterland
- Scripps Genome Center, Scripps Institution of Oceanography, University of California at San Diego, Gilman Drive, La Jolla, CA 92093-0202, USA
|
35
|
Than C, Ruths D, Innan H, Nakhleh L. Confounding factors in HGT detection: statistical error, coalescent effects, and multiple solutions. J Comput Biol 2007; 14:517-35. [PMID: 17572027 DOI: 10.1089/cmb.2007.a010] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Indexed: 11/13/2022]
Abstract
Prokaryotic organisms share genetic material across species boundaries by means of a process known as horizontal gene transfer (HGT). This process has great significance for understanding prokaryotic genome diversification and unraveling their complexities. Phylogeny-based detection of HGT is one of the most commonly used methods for this task, and is based on the fundamental fact that HGT may cause gene trees to disagree with one another, as well as with the species phylogeny. Using these methods, we can compare gene and species trees, and infer a set of HGT events to reconcile the differences among these trees. In this paper, we address three factors that confound the detection of the true HGT events, including the donors and recipients of horizontally transferred genes. First, we study experimentally the effects of error in the estimated gene trees (statistical error) on the accuracy of inferred HGT events. Our results indicate that statistical error leads to overestimation of the number of HGT events, and that HGT detection methods should be designed with unresolved gene trees in mind. Second, we demonstrate, both theoretically and empirically, that based on topological comparison alone, the number of HGT scenarios that reconcile a pair of species/gene trees may be exponential. This number may be reduced when branch lengths in both trees are estimated correctly. This set of results implies that in the absence of additional biological information, and/or a biological model of how HGT occurs, multiple HGT scenarios must be sought, and efficient strategies for how to enumerate such solutions must be developed. Third, we address the issue of lineage sorting, how it confounds HGT detection, and how to incorporate it with HGT into a single stochastic framework that distinguishes between the two events by extending population genetics theories. 
This result is very important, particularly when analyzing closely related organisms, where coalescent effects may not be ignored when reconciling gene trees. In addition to these three confounding factors, we consider the problem of enumerating all valid coalescent scenarios that constitute plausible species/gene tree reconciliations, and develop a polynomial-time dynamic programming algorithm for solving it. This result bears great significance on reducing the search space for heuristics that seek reconciliation scenarios. Finally, we show, empirically, that the locality of incongruence between a pair of trees has an impact on the numbers of HGT and coalescent reconciliation scenarios.
Affiliation(s)
- Cuong Than
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
|
36
|
Ribeiro SG, Martin DP, Lacorte C, Simões IC, Orlandini DRS, Inoue-Nagata AK. Molecular and Biological Characterization of Tomato chlorotic mottle virus Suggests that Recombination Underlies the Evolution and Diversity of Brazilian Tomato Begomoviruses. PHYTOPATHOLOGY 2007; 97:702-711. [PMID: 18943601 DOI: 10.1094/phyto-97-6-0702] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 05/26/2023]
Abstract
Tomato chlorotic mottle virus (ToCMoV) is an emerging begomovirus species widely distributed throughout tomato-growing regions of Brazil. ToCMoV appears to have expanded its geographic range recently, invading tomato-growing areas that were free of begomovirus infection before 2004. We have determined the first complete genome sequence of an infectious ToCMoV genome (isolate BA-Se1), which is the first begomovirus species isolated in the northeast of Brazil. When introduced by particle bombardment into tomato, the cloned ToCMoV-[BA-Se1] DNA-A and DNA-B components caused typical chlorotic mottle symptoms. The cloned virus was whitefly-transmissible and, although it was infectious in hosts such as Nicotiana benthamiana, pepper, tobacco, and Nicandra physaloides, it was unable to infect Arabidopsis thaliana, bean, N. glutinosa, and Datura metel. Sequence and biological analyses indicate that ToCMoV-[BA-Se1] is a typical New World begomovirus sp. requiring both DNA-A and DNA-B components to establish systemic infections. Although evidence of multiple recombination events was detected within the ToCMoV-[BA-Se1] DNA-A, they apparently occurred relatively long ago, implying that recombination probably has not contributed to the recent emergence of this species.
|
37
|
Doolittle WF, Bapteste E. Pattern pluralism and the Tree of Life hypothesis. Proc Natl Acad Sci U S A 2007; 104:2043-9. [PMID: 17261804 PMCID: PMC1892968 DOI: 10.1073/pnas.0610699104] [Citation(s) in RCA: 366] [Impact Index Per Article: 21.5] [Received: 11/11/2006] [Indexed: 11/18/2022]
Abstract
Darwin claimed that a unique inclusively hierarchical pattern of relationships between all organisms based on their similarities and differences [the Tree of Life (TOL)] was a fact of nature, for which evolution, and in particular a branching process of descent with modification, was the explanation. However, there is no independent evidence that the natural order is an inclusive hierarchy, and incorporation of prokaryotes into the TOL is especially problematic. The only data sets from which we might construct a universal hierarchy including prokaryotes, the sequences of genes, often disagree and can seldom be proven to agree. Hierarchical structure can always be imposed on or extracted from such data sets by algorithms designed to do so, but at its base the universal TOL rests on an unproven assumption about pattern that, given what we know about process, is unlikely to be broadly true. This is not to say that similarities and differences between organisms are not to be accounted for by evolutionary mechanisms, but descent with modification is only one of these mechanisms, and a single tree-like pattern is not the necessary (or expected) result of their collective operation. Pattern pluralism (the recognition that different evolutionary models and representations of relationships will be appropriate, and true, for different taxa or at different scales or for different purposes) is an attractive alternative to the quixotic pursuit of a single true TOL.
Affiliation(s)
- W Ford Doolittle
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada B3H 1X5.
|
38
|
Filée J, Bapteste E, Susko E, Krisch HM. A selective barrier to horizontal gene transfer in the T4-type bacteriophages that has preserved a core genome with the viral replication and structural genes. Mol Biol Evol 2006; 23:1688-96. [PMID: 16782763 DOI: 10.1093/molbev/msl036] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
Genomic analysis of bacteriophages frequently reveals a mosaic structure made up from modules that come from disparate sources. This fact has led to the general acceptance of the notion that rampant and promiscuous lateral gene transfer (LGT) plays a critical role in phage evolution. However, recent sequencing of a series of the T4-type phages has revealed that these large and complex genomes all share 2 substantial syntenous blocks of genes encoding the replication and virion structural genes. To analyze the pattern of inheritance of this core T4 genome, we compared the complete genome sequences of 16 T4-type phages. We identified a set of 24 genes present in all these T4-type genomes. Somewhat surprisingly, only one of these genes, that encodes for ribonucleotide reductase (NrdA), displayed evidence of LGT with the bacterial host. We test the congruence of the inheritance of the other 23 markers using heat map analyses and comparison of a reference topology with the 23 individual gene phylogenies. The vast majority of these core genes share a common evolutionary history. In contrast, analyses of all the noncore genes present in the same 16 genomes, located in the hyperplastic regions of the genome, show considerable evidence of frequent LGT. The similar evolution of the core replication and virion structural genes in the T4-type phage genomes suggests that, unlike the situation in many other phage groups, such portions of T4-type genome have been inherited as a block, without significant LGT, from a distant common ancestor. The preservation of the synteny of the core T4 genome could result from several factors acting in synergy, such as the constraints imposed by the sophisticated regulation of the transcription. Moreover, numerous and complex protein-protein interactions during virion morphogenesis could also impose a supplementary barrier against LGT. Finally, there may be some real evolutionary advantage to maintaining large regions of conserved sequence. 
Such segments could be a sort of genetic glue that maintains the genetic cohesion of the T4-type phages via recombination within the most conserved sequences. This could mediate the swapping of nonconserved sequences that they flank.
Affiliation(s)
- Jonathan Filée
- Laboratoire de Microbiologie et Génétique Moléculaire, CNRS UMR-5100, Toulouse, France.
|
39
|
Susko E, Leigh J, Doolittle WF, Bapteste E. Visualizing and assessing phylogenetic congruence of core gene sets: a case study of the gamma-proteobacteria. Mol Biol Evol 2006; 23:1019-30. [PMID: 16495350 DOI: 10.1093/molbev/msj113] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Indexed: 11/15/2022]
Abstract
Here, we address a much-debated topic: is there or is there not an organismal tree of gamma-proteobacteria that can be unambiguously inferred from a core of shared genes? We apply several recently developed analytical methods to this problem, for the first time. Our heat map analyses of P values and of bootstrap bipartitions show the presence of conflicting phylogenetic signals among these core genes. Our synthesis reconstruction suggests that at least 10% of these genes have been laterally transferred during the divergence of the gamma-proteobacteria, and that for most of the rest, there is too little phylogenetic signal to permit firm conclusions about the mode of inheritance. Although there is clearly a central tendency in this data set (it is far from random), lateral gene transfers cannot be ruled out. Instead of an organismal tree, we propose that these core genes could be used to define a more subtle and partially reticulated pattern of relationships.
Affiliation(s)
- E Susko
- Genome Atlantic, Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
|
40
|
Liu J, Glazko G, Mushegian A. Protein repertoire of double-stranded DNA bacteriophages. Virus Res 2006; 117:68-80. [PMID: 16490276 DOI: 10.1016/j.virusres.2006.01.015] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Received: 09/07/2005] [Revised: 01/11/2006] [Accepted: 01/18/2006] [Indexed: 01/21/2023]
Abstract
The complexity and diversity of phage gene sets, which are produced by rapid evolution of phage genomes and rampant gene exchanges among phages, hamper the efforts to decipher the evolutionary relationships between individual phage proteins and reconstruct the complete set of evolutionary events leading to the known phages. To start unraveling the natural history of phages, we built the phage orthologous groups (POGs), a natural system of phage protein families that includes 6378 genes from 164 complete genome sequences of double-stranded DNA bacteriophages. Phage proteomes have high POG coverage: on average, 39 genes per phage genome belong to POGs, which is close to half of all genes in most phages. In an agreement with the notion of phage role in horizontal gene transfer, we see many cases of likely gene exchange between phages and their microbial hosts. At the same time, about 80% of all POGs are highly specific to phage genomes and are not commonly found in microbial genomes, indicating coherence and large degree of evolutionary independence of phage gene sets. The information on orthologous genes is essential for evolutionary classification of known bacteriophages and for reconstruction of ancestral phage genomes.
Affiliation(s)
- Jing Liu
- Stowers Institute for Medical Research, 1000 E 50th St., Kansas City, MO 64110, USA
|
41
|
Beiko RG, Hamilton N. Phylogenetic identification of lateral genetic transfer events. BMC Evol Biol 2006; 6:15. [PMID: 16472400 PMCID: PMC1431587 DOI: 10.1186/1471-2148-6-15] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.2] [Received: 09/02/2005] [Accepted: 02/11/2006] [Indexed: 11/12/2022]
Abstract
BACKGROUND Lateral genetic transfer can lead to disagreements among phylogenetic trees comprising sequences from the same set of taxa. Where topological discordance is thought to have arisen through genetic transfer events, tree comparisons can be used to identify the lineages that may have shared genetic information. An 'edit path' of one or more transfer events can be represented with a series of subtree prune and regraft (SPR) operations, but finding the optimal such set of operations is NP-hard for comparisons between rooted trees, and may be so for unrooted trees as well. RESULTS Efficient Evaluation of Edit Paths (EEEP) is a new tree comparison algorithm that uses evolutionarily reasonable constraints to identify and eliminate many unproductive search avenues, reducing the time required to solve many edit path problems. The performance of EEEP compares favourably to that of other algorithms when applied to strictly bifurcating trees with specified numbers of SPR operations. We also used EEEP to recover edit paths from over 19,000 unrooted, incompletely resolved protein trees containing up to 144 taxa as part of a large phylogenomic study. While inferred protein trees were far more similar to a reference supertree than random trees were to each other, the phylogenetic distance spanned by random versus inferred transfer events was similar, suggesting that real transfer events occur most frequently between closely related organisms, but can span large phylogenetic distances as well. While most of the protein trees examined here were very similar to the reference supertree, requiring zero or one edit operations for reconciliation, some trees implied up to 40 transfer events within a single orthologous set of proteins. CONCLUSION Since sequence trees typically have no implied root and may contain unresolved or multifurcating nodes, the strategy implemented in EEEP is the most appropriate for phylogenomic analyses. 
The high degree of consistency among inferred protein trees shows that vertical inheritance is the dominant pattern of evolution, at least for the set of organisms considered here. However, the edit paths inferred using EEEP suggest an important role for genetic transfer in the evolution of microbial genomes as well.
Affiliation(s)
- Robert G Beiko
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia and ARC Centre in Bioinformatics, Australia
- Nicholas Hamilton
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia and ARC Centre in Bioinformatics, Australia
- Advanced Computational Modelling Centre, The University of Queensland, Brisbane, Australia
|
42
|
Bapteste E, Susko E, Leigh J, MacLeod D, Charlebois RL, Doolittle WF. Do orthologous gene phylogenies really support tree-thinking? BMC Evol Biol 2005; 5:33. [PMID: 15913459 PMCID: PMC1156881 DOI: 10.1186/1471-2148-5-33] [Citation(s) in RCA: 148] [Impact Index Per Article: 7.8] [Received: 04/01/2005] [Accepted: 05/24/2005] [Indexed: 11/17/2022]
Abstract
Background Since Darwin's Origin of Species, reconstructing the Tree of Life has been a goal of evolutionists, and tree-thinking has become a major concept of evolutionary biology. Practically, building the Tree of Life has proven to be tedious. Too few morphological characters are useful for conducting conclusive phylogenetic analyses at the highest taxonomic level. Consequently, molecular sequences (genes, proteins, and genomes) likely constitute the only useful characters for constructing a phylogeny of all life. For this reason, tree-makers expect a lot from gene comparisons. The simultaneous study of the largest number of molecular markers possible is sometimes considered to be one of the best solutions in reconstructing the genealogy of organisms. This conclusion is a direct consequence of tree-thinking: if gene inheritance conforms to a tree-like model of evolution, sampling more of these molecules will provide enough phylogenetic signal to build the Tree of Life. The selection of congruent markers is thus a fundamental step in simultaneous analysis of many genes. Results Heat map analyses were used to investigate the congruence of orthologues in four datasets (archaeal, bacterial, eukaryotic and alpha-proteobacterial). We conclude that we simply cannot determine if a large portion of the genes have a common history. In addition, none of these datasets can be considered free of lateral gene transfer. Conclusion Our phylogenetic analyses do not support tree-thinking. These results have important conceptual and practical implications. We argue that representations other than a tree should be investigated in this case because a non-critical concatenation of markers could be highly misleading.
Affiliation(s)
- E Bapteste
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
- E Susko
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Mathematics and Statistics, Halifax, Nova Scotia, Canada
- J Leigh
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
- D MacLeod
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
- RL Charlebois
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
- WF Doolittle
- GenomeAtlantic, 1721 Lower Water Street, Suite 401, Halifax, NS, B3J 1S5, Canada
- Dalhousie University, Department of Biochemistry & Molecular Biology, 5850 College St., Halifax, NS, B3H 1X5, Canada
|