1
|
Menet H, Daubin V, Tannier E. Phylogenetic reconciliation. PLoS Comput Biol 2022; 18:e1010621. [PMID: 36327227 PMCID: PMC9632901 DOI: 10.1371/journal.pcbi.1010621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022] Open
Affiliation(s)
- Hugo Menet
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558,Villeurbanne, France
| | - Vincent Daubin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558,Villeurbanne, France
- * E-mail: (VD); (ET)
| | - Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558,Villeurbanne, France
- Inria, centre de recherche de Lyon, Villeurbanne, France
- * E-mail: (VD); (ET)
| |
Collapse
|
2
|
Briand S, Dessimoz C, El-Mabrouk N, Nevers Y. A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem. Syst Biol 2022; 71:1391-1403. [PMID: 35426933 PMCID: PMC9557742 DOI: 10.1093/sysbio/syac028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Revised: 04/02/2022] [Accepted: 04/07/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION A large variety of pairwise measures of similarity or dissimilarity have been developed for comparing phylogenetic trees, e.g. species trees or gene trees. Due to its intuitive definition in terms of tree clades and bipartitions and its computational efficiency, the Robinson-Foulds (RF) distance is the most widely used for trees with unweighted edges and labels restricted to leaves (representing the genetic elements being compared). However, in the case of gene trees, an important information revealing the nature of the homologous relation between gene pairs (orthologs, paralogs, xenologs) is the type of event associated to each internal node of the tree, typically speciations or duplications, but other types of events may also be considered, such as horizontal gene transfers. This labeling of internal nodes is usually inferred from a gene tree/species tree reconciliation method. Here, we address the problem of comparing such event-labeled trees. The problem differs from the classical problem of comparing uniformly labeled trees (all labels belonging to the same alphabet) that may be done using the Tree Edit Distance (TED) mainly due to the fact that, in our case, two different alphabets are considered for the leaves and internal nodes of the tree, and leaves are not affected by edit operations. RESULTS We propose an extension of the RF distance to event-labeled trees, based on edit operations comparable to those considered for TED: node insertion, node deletion and label substitution. We show that this new Labeled Robinson Foulds (LRF) distance can be computed in linear time, in addition of maintaining other desirable properties: being a metric, reducing to RF for trees with no labels on internal nodes and maintaining an intuitive interpretation. The algorithm for computing the LRF distance enables novel analyses on event-label trees such as reconciled gene trees. Here, we use it to study the impact of taxon sampling on labeled gene tree inference, and conclude that denser taxon sampling yields trees with better topology but worse labeling.
Collapse
Affiliation(s)
- Samuel Briand
- Département d'informatique et de recherche opérationnelle (DIRO), Universit de Montral, Canada
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Switzerland.,Centre for Lifes Origins and Evolution, Genetics Evolution and Environment, University College London, UK.,Department of Computer Science, University College London, UK.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Nadia El-Mabrouk
- Département d'informatique et de recherche opérationnelle (DIRO), Universit de Montral, Canada
| | - Yannis Nevers
- Department of Computational Biology, University of Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
Collapse
|
3
|
Wang Y, Mary A, Sagot MF, Sinaimeri B. Efficiently sparse listing of classes of optimal cophylogeny reconciliations. Algorithms Mol Biol 2022; 17:2. [PMID: 35168648 PMCID: PMC8845303 DOI: 10.1186/s13015-022-00206-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Accepted: 01/25/2022] [Indexed: 12/02/2022] Open
Abstract
Background Cophylogeny reconciliation is a powerful method for analyzing host-parasite (or host-symbiont) co-evolution. It models co-evolution as an optimization problem where the set of all optimal solutions may represent different biological scenarios which thus need to be analyzed separately. Despite the significant research done in the area, few approaches have addressed the problem of helping the biologist deal with the often huge space of optimal solutions. Results In this paper, we propose a new approach to tackle this problem. We introduce three different criteria under which two solutions may be considered biologically equivalent, and then we propose polynomial-delay algorithms that enumerate only one representative per equivalence class (without listing all the solutions). Conclusions Our results are of both theoretical and practical importance. Indeed, as shown by the experiments, we are able to significantly reduce the space of optimal solutions while still maintaining important biological information about the whole space.
Collapse
|
4
|
Grueter M, Duran K, Ramalingam R, Libeskind-Hadas R. Reconciliation Reconsidered: In Search of a Most Representative Reconciliation in the Duplication-Transfer-Loss Model. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2136-2143. [PMID: 31722482 DOI: 10.1109/tcbb.2019.2942015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Maximum parsimony reconciliation is a fundamental technique for studying the evolutionary histories of pairs of entities such as genes and species, parasites and hosts, and species and their biogeographical habitats. In these contexts, reconciliation is generally performed using the duplication-transfer-loss (DTL) model in a maximum parsimony framework. While efficient maximum parsimony reconciliation algorithms are known for the DTL model, the number of such reconciliations can grow exponentially with the sizes of the two phylogenetic trees. Choosing a maximum parsimony reconciliation arbitrarily may lead to conclusions that are not supported, and may even be contradicted, by other equally optimal reconciliations. This paper addresses the fundamental problem of how well a single reconciliation can represent the entire space of optimal reconciliations.
Collapse
|
5
|
Santichaivekin S, Yang Q, Liu J, Mawhorter R, Jiang J, Wesley T, Wu YC, Libeskind-Hadas R. eMPRess: a systematic cophylogeny reconciliation tool. Bioinformatics 2021; 37:2481-2482. [PMID: 33216126 DOI: 10.1093/bioinformatics/btaa978] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2020] [Revised: 11/05/2020] [Accepted: 11/10/2020] [Indexed: 11/12/2022] Open
Abstract
SUMMARY We describe eMPRess, a software program for phylogenetic tree reconciliation under the duplication-transfer-loss model that systematically addresses the problems of choosing event costs and selecting representative solutions, enabling users to make more robust inferences. AVAILABILITY AND IMPLEMENTATION eMPRess is freely available at http://www.cs.hmc.edu/empress. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | - Qing Yang
- Department of Computer Science, Harvey Mudd College, Claremont, CA 91711, USA
| | - Jingyi Liu
- Department of Computer Science, Harvey Mudd College, Claremont, CA 91711, USA
| | - Ross Mawhorter
- Department of Computer Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Justin Jiang
- Department of Computer Science, Harvey Mudd College, Claremont, CA 91711, USA
| | - Trenton Wesley
- Department of Computer Science, Harvey Mudd College, Claremont, CA 91711, USA
| | - Yi-Chieh Wu
- Department of Computer Science, Harvey Mudd College, Claremont, CA 91711, USA
| | - Ran Libeskind-Hadas
- Department of Computer Science, Harvey Mudd College, Claremont, CA 91711, USA
| |
Collapse
|
6
|
Wang Y, Mary A, Sagot MF, Sinaimeri B. Capybara: equivalence ClAss enumeration of coPhylogenY event-BAsed ReconciliAtions. Bioinformatics 2021; 36:4197-4199. [PMID: 32556075 DOI: 10.1093/bioinformatics/btaa498] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2019] [Revised: 05/04/2020] [Accepted: 05/13/2020] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION Phylogenetic tree reconciliation is the method of choice in analyzing host-symbiont systems. Despite the many reconciliation tools that have been proposed in the literature, two main issues remain unresolved: (i) listing suboptimal solutions (i.e. whose score is 'close' to the optimal ones) and (ii) listing only solutions that are biologically different 'enough'. The first issue arises because the optimal solutions are not always the ones biologically most significant; providing many suboptimal solutions as alternatives for the optimal ones is thus very useful. The second one is related to the difficulty to analyze an often huge number of optimal solutions. In this article, we propose Capybara that addresses both of these problems in an efficient way. Furthermore, it includes a tool for visualizing the solutions that significantly helps the user in the process of analyzing the results. AVAILABILITY AND IMPLEMENTATION The source code, documentation and binaries for all platforms are freely available at https://capybara-doc.readthedocs.io/. CONTACT yishu.wang@univ-lyon1.fr or blerina.sinaimeri@inria.fr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yishu Wang
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France.,Inria Grenoble, Montbonnot 38334, France
| | - Arnaud Mary
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France.,Inria Grenoble, Montbonnot 38334, France
| | - Marie-France Sagot
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France.,Inria Grenoble, Montbonnot 38334, France
| | - Blerina Sinaimeri
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France.,Inria Grenoble, Montbonnot 38334, France
| |
Collapse
|
7
|
Santichaivekin S, Mawhorter R, Libeskind-Hadas R. An efficient exact algorithm for computing all pairwise distances between reconciliations in the duplication-transfer-loss model. BMC Bioinformatics 2019; 20:636. [PMID: 31842734 PMCID: PMC6915856 DOI: 10.1186/s12859-019-3203-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Maximum parsimony reconciliation in the duplication-transfer-loss model is widely used in studying the evolutionary histories of genes and species and in studying coevolution of parasites and their hosts and pairs of symbionts. While efficient algorithms are known for finding maximum parsimony reconciliations, the number of reconciliations can grow exponentially in the size of the trees. An understanding of the space of maximum parsimony reconciliations is necessary to determine whether a single reconciliation can adequately represent the space or whether multiple representative reconciliations are needed. RESULTS We show that for any instance of the reconciliation problem, the distribution of pairwise distances can be computed exactly by an efficient polynomial-time algorithm with respect to several different distance metrics. We describe the algorithm, analyze its asymptotic worst-case running time, and demonstrate its utility and viability on a large biological dataset. CONCLUSIONS This result provides new insights into the structure of the space of maximum parsimony reconciliations. These insights are likely to be useful in the wide range of applications that employ reconciliation methods.
Collapse
Affiliation(s)
| | - Ross Mawhorter
- Department of Computer Science, Harvey Mudd College, Claremont, CA, US
| | | |
Collapse
|
8
|
Mawhorter R, Libeskind-Hadas R. Hierarchical clustering of maximum parsimony reconciliations. BMC Bioinformatics 2019; 20:612. [PMID: 31775628 PMCID: PMC6882150 DOI: 10.1186/s12859-019-3223-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2019] [Accepted: 11/14/2019] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND Maximum parsimony reconciliation in the duplication-transfer-loss model is a widely-used method for analyzing the evolutionary histories of pairs of entities such as hosts and parasites, symbiont species, and species and genes. While efficient algorithms are known for finding maximum parsimony reconciliations, the number of such reconciliations can be exponential in the size of the trees. Since these reconciliations can differ substantially from one another, making inferences from any one reconciliation may lead to conclusions that are not supported, or may even be contradicted, by other maximum parsimony reconciliations. Therefore, there is a need to find small sets of best representative reconciliations when the space of solutions is large and diverse. RESULTS We provide a general framework for hierarchical clustering the space of maximum parsimony reconciliations. We demonstrate this framework for two specific linkage criteria, one that seeks to maximize the average support of the events found in the reconciliations in each cluster and the other that seeks to minimize the distance between reconciliations in each cluster. We analyze the asymptotic worst-case running times and provide experimental results that demonstrate the viability and utility of this approach. CONCLUSIONS The hierarchical clustering algorithm method proposed here provides a new approach to find a set of representative reconciliations in the potentially vast and diverse space of maximum parsimony reconciliations.
Collapse
Affiliation(s)
- Ross Mawhorter
- Department of Computer Science, Harvey Mudd College, Claremont, California, USA
| | - Ran Libeskind-Hadas
- Department of Computer Science, Harvey Mudd College, Claremont, California, USA.
| |
Collapse
|
9
|
Huber KT, Moulton V, Sagot MF, Sinaimeri B. Exploring and Visualizing Spaces of Tree Reconciliations. Syst Biol 2018; 68:607-618. [DOI: 10.1093/sysbio/syy075] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2018] [Revised: 10/31/2018] [Accepted: 11/02/2018] [Indexed: 12/22/2022] Open
Affiliation(s)
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, UK
| | - Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, UK
| | - Marie-France Sagot
- Inria Grenoble - Rhône-Alpes; Inovallée 655, Avenue de l’Europe, Montbonnot, 38334 Saint Ismier Cedex, France
- Université de Lyon, F-69000 Lyon, France
- Université Lyon 1, CNRS, UMR5558, 43 Boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
| | - Blerina Sinaimeri
- Inria Grenoble - Rhône-Alpes; Inovallée 655, Avenue de l’Europe, Montbonnot, 38334 Saint Ismier Cedex, France
- Université de Lyon, F-69000 Lyon, France
- Université Lyon 1, CNRS, UMR5558, 43 Boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
| |
Collapse
|