1
|
López Sánchez A, Lafond M. Predicting horizontal gene transfers with perfect transfer networks. Algorithms Mol Biol 2024; 19:6. [PMID: 38321476 PMCID: PMC10848447 DOI: 10.1186/s13015-023-00242-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2023] [Accepted: 10/25/2023] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well-known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity, but still have retained enough important common motifs that allow them to have common character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity. OUR CONTRIBUTIONS We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa under the assumption that characters have unique births, and that once a character is gained it is rarely lost. Examples of such traits include transposable elements, biochemical markers and emergence of organelles, just to name a few. We study the differences between our model and two similar models: perfect phylogenetic networks and ancestral recombination networks. Our goals are to initiate a study on the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa. We finally provide lower and upper bounds on the number of transfers required to explain a set of taxa, in the worst case.
Collapse
Affiliation(s)
| | - Manuel Lafond
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
| |
Collapse
|
2
|
Hellmuth M, Stadler PF. The Theory of Gene Family Histories. Methods Mol Biol 2024; 2802:1-32. [PMID: 38819554 DOI: 10.1007/978-1-0716-3838-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2024]
Abstract
Most genes are part of larger families of evolutionary-related genes. The history of gene families typically involves duplications and losses of genes as well as horizontal transfers into other organisms. The reconstruction of detailed gene family histories, i.e., the precise dating of evolutionary events relative to phylogenetic tree of the underlying species has remained a challenging topic despite their importance as a basis for detailed investigations into adaptation and functional evolution of individual members of the gene family. The identification of orthologs, moreover, is a particularly important subproblem of the more general setting considered here. In the last few years, an extensive body of mathematical results has appeared that tightly links orthology, a formal notion of best matches among genes, and horizontal gene transfer. The purpose of this chapter is to broadly outline some of the key mathematical insights and to discuss their implication for practical applications. In particular, we focus on tree-free methods, i.e., methods to infer orthology or horizontal gene transfer as well as gene trees, species trees, and reconciliations between them without using a priori knowledge of the underlying trees or statistical models for the inference of phylogenetic trees. Instead, the initial step aims to extract binary relations among genes.
Collapse
Affiliation(s)
- Marc Hellmuth
- Department of Mathematics, Faculty of Science, Stockholm University, Stockholm, Sweden
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Leipzig University, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad Nacional de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.
| |
Collapse
|
3
|
Schaller D, Hartmann T, Lafond M, Stadler PF, Wieseke N, Hellmuth M. Relative timing information and orthology in evolutionary scenarios. Algorithms Mol Biol 2023; 18:16. [PMID: 37940998 PMCID: PMC10634191 DOI: 10.1186/s13015-023-00240-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Accepted: 08/23/2023] [Indexed: 11/10/2023] Open
Abstract
BACKGROUND Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestors coincides with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph. RESULTS Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial-time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial-time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.
Collapse
Affiliation(s)
- David Schaller
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, 04107, Germany
| | - Tom Hartmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, 04107, Germany
| | - Manuel Lafond
- Department of Computer Science, Université de Sherbrooke, 2500 boul. de l'Université, Sherbrooke, J1K 2R1, Canada
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, 04107, Germany
- Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv), and Leipzig Research Center for Civilization Diseases, Universität Leipzig, Augustusplatz 12, Leipzig, 04107, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, 04109, Germany
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, 1090, Austria
- Facultad de Ciencias, Universidad National de Colombia, Sede Bogotá, Ciudad Universitaria, Bogotá, 111321, DC, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM87501, USA
| | - Nicolas Wieseke
- Swarm Intelligence and Complex Systems Group, Faculty of Mathematics and Computer Science, Leipzig University, Augustusplatz 10, Leipzig, 04109, Germany
| | - Marc Hellmuth
- Department of Mathematics, Faculty of Science, Stockholm University, Stockholm, 10691, Sweden.
| |
Collapse
|
4
|
Teixeira M, Pillay S, Urhan A, Abeel T. SHIP: identifying antimicrobial resistance gene transfer between plasmids. Bioinformatics 2023; 39:btad612. [PMID: 37796811 PMCID: PMC10598575 DOI: 10.1093/bioinformatics/btad612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2023] [Revised: 08/29/2023] [Accepted: 10/04/2023] [Indexed: 10/07/2023] Open
Abstract
MOTIVATION Plasmids are carriers for antimicrobial resistance (AMR) genes and can exchange genetic material with other structures, contributing to the spread of AMR. There is no reliable approach to identify the transfer of AMR genes across plasmids. This is mainly due to the absence of a method to assess the phylogenetic distance of plasmids, as they show large DNA sequence variability. Identifying and quantifying such transfer can provide novel insight into the role of small mobile elements and resistant plasmid regions in the spread of AMR. RESULTS We developed SHIP, a novel method to quantify plasmid similarity based on the dynamics of plasmid evolution. This allowed us to find conserved fragments containing AMR genes in structurally different and phylogenetically distant plasmids, which is evidence for lateral transfer. Our results show that regions carrying AMR genes are highly mobilizable between plasmids through transposons, integrons, and recombination events, and contribute to the spread of AMR. Identified transferred fragments include a multi-resistant complex class 1 integron in Escherichia coli and Klebsiella pneumoniae, and a region encoding tetracycline resistance transferred through recombination in Enterococcus faecalis. AVAILABILITY AND IMPLEMENTATION The code developed in this work is available at https://github.com/AbeelLab/plasmidHGT.
Collapse
Affiliation(s)
- Marco Teixeira
- Faculty of Engineering, University of Porto, Porto 4200-465, Portugal
- INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, Porto 4200-465, Portugal
- Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft 2628 XE, The Netherlands
| | - Stephanie Pillay
- Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft 2628 XE, The Netherlands
| | - Aysun Urhan
- Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft 2628 XE, The Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
| | - Thomas Abeel
- Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft 2628 XE, The Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
| |
Collapse
|