1. Borkhert EV, Pushkova EN, Nasimovich YA, Kostina MV, Vasilieva NV, Murataev RA, Novakovskiy RO, Dvorianinova EM, Povkhova LV, Zhernova DA, Turba AA, Sigova EA, Snezhkina AV, Kudryavtseva AV, Bolsheva NL, Krasnov GS, Dmitriev AA, Melnikova NV. Sex-determining region complements traditionally used in phylogenetic studies nuclear and chloroplast sequences in investigation of Aigeiros Duby and Tacamahaca Spach poplars (genus Populus L., Salicaceae). FRONTIERS IN PLANT SCIENCE 2023; 14:1204899. [PMID: 37860260 PMCID: PMC10582643 DOI: 10.3389/fpls.2023.1204899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 08/07/2023] [Indexed: 10/21/2023]
Abstract
Members of the genus Populus L. play an important role in the formation of forests in the northern hemisphere and are used in urban landscaping and timber production. Populus species of closely related sections show extensive hybridization. Therefore, the systematics of the genus is rather complicated, especially for poplars of hybrid origin. We aimed to assess the efficiency of application of the sex-determining region (SDR) in addition to the nuclear and chloroplast genome loci traditionally used in phylogenetic studies of poplars to investigate relationships in sections Aigeiros Duby and Tacamahaca Spach. Targeted deep sequencing of NTS 5S rDNA, ITS, DSH 2, DSH 5, DSH 8, DSH 12, DSH 29, 6, 15, 16, X18, trnG-psbK-psbI, rps2-rpoC2, rpoC2-rpoC1, as well as SDR and ARR17 gene was performed for 379 poplars. The SDR and ARR17 gene together with traditionally used multicopy and single-copy loci of nuclear and chloroplast DNA allowed us to obtain a clustering that is most consistent with poplar systematics based on morphological data and to shed light on several controversial hypotheses about the origin of the studied taxa (for example, the inexpediency of separating P. koreana, P. maximowiczii, and P. suaveolens into different species). We present a scheme of relationships between species and hybrids of sections Aigeiros and Tacamahaca based on molecular genetic, morphological, and geographical data. The geographical proximity of species and, therefore, the possibility of hybridization between them appear to be more important than the affiliation of species to the same section. We speculate that sections Aigeiros and Tacamahaca are distinguished primarily on an ecological principle (plain and mountain poplars) rather than on a genetic basis. Joint analysis of sequencing data for the SDR and chloroplast genome loci allowed us to determine the ancestors of P. × petrovskoe - P. laurifolia (female tree) × P. × canadensis (male tree), and P. × rasumovskoe - P. nigra (female tree) × P. suaveolens (male tree). Thus, the efficiency of using the SDR for the study of poplars of sections Aigeiros and Tacamahaca and the prospects of its use for the investigation of species of the genus Populus were shown.
Affiliation(s)
- Elena V. Borkhert
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Elena N. Pushkova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Yuri A. Nasimovich
  - State Environmental Protection Budgetary Institution of Moscow “Mospriroda”, Moscow, Russia
- Marina V. Kostina
  - Institute of Biology and Chemistry, Moscow Pedagogical State University, Moscow, Russia
- Ramil A. Murataev
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Roman O. Novakovskiy
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Ekaterina M. Dvorianinova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Moscow, Russia
- Liubov V. Povkhova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Moscow, Russia
- Daiana A. Zhernova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Anastasia A. Turba
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Elizaveta A. Sigova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Moscow, Russia
- Anna V. Kudryavtseva
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Nadezhda L. Bolsheva
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- George S. Krasnov
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Alexey A. Dmitriev
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Moscow, Russia
- Nataliya V. Melnikova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
2. Menet H, Daubin V, Tannier E. Phylogenetic reconciliation. PLoS Comput Biol 2022; 18:e1010621. [PMID: 36327227 PMCID: PMC9632901 DOI: 10.1371/journal.pcbi.1010621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Hugo Menet
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Vincent Daubin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - * E-mail: (VD); (ET)
- Eric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - Inria, centre de recherche de Lyon, Villeurbanne, France
  - * E-mail: (VD); (ET)
3. Phylogeny and evolution of the genus Cervus (Cervidae, Mammalia) as revealed by complete mitochondrial genomes. Sci Rep 2022; 12:16381. [PMID: 36180508 PMCID: PMC9525267 DOI: 10.1038/s41598-022-20763-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Accepted: 09/19/2022] [Indexed: 11/24/2022]
Abstract
Mitochondrial DNA (mtDNA) lineages are recognized as important components of intra- and interspecific biodiversity, and make it possible to reveal the colonization routes and phylogeographic structure of many taxa. Among these is the genus Cervus that is widely distributed across the Holarctic. We obtained sequences of complete mitochondrial genomes from 13 Cervus taxa and included them in global phylogenetic analyses of 71 Cervinae mitogenomes. The well-resolved phylogenetic trees confirmed Cervus to be monophyletic. Molecular dating based on several fossil calibration points revealed that ca. 2.6 Mya two main mitochondrial lineages of Cervus separated in Central Asia, the Western (including C. hanglu and C. elaphus) and the Eastern (comprising C. albirostris, C. canadensis and C. nippon). We also observed convergent changes in the composition of some mitochondrial genes in C. hanglu of the Western lineage and representatives of the Eastern lineage. Several subspecies of C. nippon and C. hanglu have accumulated a large portion of deleterious substitutions in their mitochondrial protein-coding genes, probably due to drift in the wake of decreasing population size. In contrast to previous studies, we found that the relic haplogroup B of C. elaphus was sister to all other red deer lineages and that the Middle-Eastern haplogroup E shared a common ancestor with the Balkan haplogroup C. Comparison of the mtDNA phylogenetic tree with a published nuclear genome tree may imply ancient introgressions of mtDNA between different Cervus species as well as from the common ancestor of South Asian deer, Rusa timorensis and R. unicolor, to the Cervus clade.
4. Tabaszewski P, Gorecki P, Markin A, Anderson T, Eulenstein O. Consensus of All Solutions for Intractable Phylogenetic Tree Inference. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:149-161. [PMID: 31613775 DOI: 10.1109/tcbb.2019.2947051] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]
Abstract
Solving median tree problems is a classic approach for inferring species trees from a collection of discordant gene trees. Median tree problems are typically NP-hard and dealt with by local search heuristics. Unfortunately, such heuristics generally lack provable correctness and precision. Algorithmic advances addressing this uncertainty have led to exact dynamic programming formulations suitable to solve a well-studied group of median tree problems for smaller phylogenetic analyses. However, these formulations allow computing only very few optimal species trees out of possibly many such trees, and phylogenetic studies often require the analysis of all optimal solutions through their consensus tree. Here, we describe a significant algorithmic modification of the dynamic programming formulations that compute the cluster counts of all optimal species trees from which various types of consensus trees can be efficiently computed. Through experimental studies, we demonstrate that our parallel implementation of the modified dynamic programming formulation is more efficient than a previous implementation of the original formulation. Finally, we show that the parallel implementation can rapidly identify novel reassorted influenza A viruses potentially facilitating pandemic preparedness efforts.
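As an illustration of the last step described in this abstract (going from cluster counts to consensus trees), the following is a minimal sketch and not the authors' implementation. It assumes the optimal trees are available explicitly as nested tuples and counts clusters by enumeration, whereas the paper obtains the counts from a dynamic programming formulation without listing the trees. The names `clusters`, `cluster_counts`, and `consensus` are hypothetical.

```python
from collections import Counter


def clusters(tree):
    """Return the clusters (leaf sets of all subtrees) of a rooted tree.

    A tree is either a leaf label (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        return {frozenset([tree])}
    found, leaves = set(), frozenset()
    for child in tree:
        child_clusters = clusters(child)
        found |= child_clusters
        # The largest cluster of a subtree is its full leaf set.
        leaves |= max(child_clusters, key=len)
    found.add(leaves)
    return found


def cluster_counts(trees):
    """Count in how many of the given trees each cluster occurs."""
    counts = Counter()
    for tree in trees:
        counts.update(clusters(tree))
    return counts


def consensus(counts, n_trees, strict=False):
    """Clusters of the strict (all trees) or majority-rule (> half) consensus."""
    if strict:
        return {c for c, k in counts.items() if k == n_trees}
    return {c for c, k in counts.items() if 2 * k > n_trees}


# Three hypothetical "optimal" species trees on the leaves a..e.
trees = [
    (((("a", "b"), "c"), "d"), "e"),
    ((("a", "b"), "c"), ("d", "e")),
    ((("a", "b"), ("d", "e")), "c"),
]
counts = cluster_counts(trees)
strict = consensus(counts, len(trees), strict=True)
majority = consensus(counts, len(trees))
```

Only the counts and the number of trees are needed for either consensus type, which is why computing cluster counts is sufficient even when the optimal trees are too numerous to list.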
5. Urantówka AD, Kroczak A, Mackiewicz P. New view on the organization and evolution of Palaeognathae mitogenomes poses the question on the ancestral gene rearrangement in Aves. BMC Genomics 2020; 21:874. [PMID: 33287726 PMCID: PMC7720580 DOI: 10.1186/s12864-020-07284-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/08/2020] [Accepted: 11/26/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Bird mitogenomes differ from other vertebrates in gene rearrangement. The most common avian gene order, identified first in Gallus gallus, is considered ancestral for all Aves. However, other rearrangements including a duplicated control region and neighboring genes have been reported in many representatives of avian orders. The repeated regions can be easily overlooked due to inappropriate DNA amplification or genome sequencing. This raises a question about the actual prevalence of mitogenomic duplications and the validity of the current view on the avian mitogenome evolution. In this context, Palaeognathae is especially interesting because it is sister to all other living birds, i.e. Neognathae. So far, a unique duplicated region has been found in one palaeognath mitogenome, that of Eudromia elegans.
RESULTS: Therefore, we applied an appropriate PCR strategy to look for omitted duplications in other palaeognaths. The analyses revealed the duplicated control regions with adjacent genes in Crypturellus, Rhea and Struthio as well as ND6 pseudogene in three moas. The copies are very similar and were subjected to concerted evolution. Mapping the presence and absence of duplication onto the Palaeognathae phylogeny indicates that the duplication was an ancestral state for this avian group. This feature was inherited by early diverged lineages and lost two times in others. Comparison of incongruent phylogenetic trees based on mitochondrial and nuclear sequences showed that two variants of mitogenomes could exist in the evolution of palaeognaths. Data collected for other avian mitogenomes revealed that the last common ancestor of all birds and early diverging lineages of Neoaves could also possess the mitogenomic duplication.
CONCLUSIONS: The duplicated control regions with adjacent genes are more common in avian mitochondrial genomes than previously thought. These two regions could increase effectiveness of replication and transcription as well as the number of replicating mitogenomes per organelle. In consequence, energy production by mitochondria may be also more efficient. However, further physiological and molecular analyses are necessary to assess the potential selective advantages of the mitogenome duplications.
Affiliation(s)
- Adam Dawid Urantówka
  - Department of Genetics, Wroclaw University of Environmental and Life Sciences, 7 Kozuchowska Street, 51-631 Wroclaw, Poland
- Aleksandra Kroczak
  - Department of Genetics, Wroclaw University of Environmental and Life Sciences, 7 Kozuchowska Street, 51-631 Wroclaw, Poland
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, 14a Fryderyka Joliot-Curie Street, 50-383 Wrocław, Poland
- Paweł Mackiewicz
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, 14a Fryderyka Joliot-Curie Street, 50-383 Wrocław, Poland
6. Górecki P, Markin A, Eulenstein O. Exact median-tree inference for unrooted reconciliation costs. BMC Evol Biol 2020; 20:136. [PMID: 33115401 PMCID: PMC7593691 DOI: 10.1186/s12862-020-01700-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Background: Solving median tree problems under tree reconciliation costs is a classic and well-studied approach for inferring species trees from collections of discordant gene trees. These problems are NP-hard, and therefore are, in practice, typically addressed by local search heuristics. So far, however, such heuristics lack any provable correctness or precision. Further, even for small phylogenetic studies, it has been demonstrated that local search heuristics may only provide sub-optimal solutions. Obviating such heuristic uncertainties are exact dynamic programming solutions that allow solving tree reconciliation problems for smaller phylogenetic studies. Despite these promises, such exact solutions are only suitable for credibly rooted input gene trees, which constitute only a tiny fraction of the readily available gene trees. Standard gene tree inference approaches provide only unrooted gene trees and accurately rooting such trees is often difficult, if not impossible.
Results: Here, we describe complex dynamic programming solutions that represent the first nonnaïve exact solutions for solving the tree reconciliation problems for unrooted input gene trees. Further, we show that the asymptotic runtime of the proposed solutions does not increase when compared to the most time-efficient dynamic programming solutions for rooted input trees.
Conclusions: In an experimental evaluation, we demonstrate that the described solutions for unrooted gene trees are, like the solutions for rooted input gene trees, suitable for smaller phylogenetic studies. Finally, for the first time, we study the accuracy of classic local search heuristics for unrooted tree reconciliation problems.
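To make the notion of a tree reconciliation cost concrete, here is a minimal sketch of one classic cost, the gene duplication cost of a rooted binary gene tree under the least-common-ancestor (LCA) mapping. It is an illustrative assumption and not the dynamic programming method of this paper, which addresses unrooted gene trees; the unrooted cost is typically taken as the minimum of the rooted cost over all rootings, which this sketch does not attempt. The function names and the `"A_1"` leaf-naming convention are hypothetical.

```python
def clusters(tree):
    """Clusters (leaf sets of all subtrees) of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    found, leaves = set(), frozenset()
    for child in tree:
        child_clusters = clusters(child)
        found |= child_clusters
        leaves |= max(child_clusters, key=len)
    found.add(leaves)
    return found


def lca(species_clusters, species):
    """Smallest species-tree cluster containing `species`, i.e. their LCA node."""
    return min((c for c in species_clusters if species <= c), key=len)


def duplication_cost(gene_tree, species_clusters, species_of):
    """Return (LCA mapping of the root, number of duplication nodes)."""
    if isinstance(gene_tree, str):
        return lca(species_clusters, frozenset([species_of(gene_tree)])), 0
    left, right = gene_tree  # binary gene trees only
    m_left, d_left = duplication_cost(left, species_clusters, species_of)
    m_right, d_right = duplication_cost(right, species_clusters, species_of)
    m = lca(species_clusters, m_left | m_right)
    # A node is a duplication when it maps to the same species node as a child.
    is_duplication = m in (m_left, m_right)
    return m, d_left + d_right + int(is_duplication)


species_tree = (("A", "B"), "C")
species_of = lambda leaf: leaf.split("_")[0]  # hypothetical: "A_1" is a gene of species "A"
S = clusters(species_tree)

congruent = (("A_1", "B_1"), "C_1")
discordant = (("A_1", "B_1"), ("A_2", "C_1"))
cost_congruent = duplication_cost(congruent, S, species_of)[1]
cost_discordant = duplication_cost(discordant, S, species_of)[1]
```

A median tree problem then asks for the species tree that minimizes the sum of such costs over all input gene trees.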
Affiliation(s)
- Paweł Górecki
  - University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, 02-097, Poland
- Alexey Markin
  - Department of Computer Science, Iowa State University, Atanasoff Hall 212, Ames, 50011, USA
- Oliver Eulenstein
  - Department of Computer Science, Iowa State University, Atanasoff Hall 212, Ames, 50011, USA
7. How to Study Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
8. Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
9. Systematics Association Special Volumes. Cladistics 2020. [DOI: 10.1017/9781139047678.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
10. Relationship Diagrams. Cladistics 2020. [DOI: 10.1017/9781139047678.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
11. The Separation of Classification and Phylogenetics. Cladistics 2020. [DOI: 10.1017/9781139047678.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
12. Beyond Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
13. The Interrelationships of Organisms. Cladistics 2020. [DOI: 10.1017/9781139047678.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
14. How to Study Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
15. Modern Artificial Methods and Raw Data. Cladistics 2020. [DOI: 10.1017/9781139047678.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
16. Further Myths and More Misunderstandings. Cladistics 2020. [DOI: 10.1017/9781139047678.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
17. Afterword. Cladistics 2020. [DOI: 10.1017/9781139047678.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
18. Systematics: Exposing Myths. Cladistics 2020. [DOI: 10.1017/9781139047678.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
19. Essentialism and Typology. Cladistics 2020. [DOI: 10.1017/9781139047678.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
20. Beyond Classification: How to Study Phylogeny. Cladistics 2020. [DOI: 10.1017/9781139047678.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
21. How to Study Classification: ‘Total Evidence’ vs. ‘Consensus’, Character Congruence vs. Taxonomic Congruence, Simultaneous Analysis vs. Partitioned Data. Cladistics 2020. [DOI: 10.1017/9781139047678.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
22. What This Book Is About. Cladistics 2020. [DOI: 10.1017/9781139047678.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
23. How to Study Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
24. The Cladistic Programme. Cladistics 2020. [DOI: 10.1017/9781139047678.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
25. Index. Cladistics 2020. [DOI: 10.1017/9781139047678.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
26. Parameters of Classification: Ordo Ab Chao. Cladistics 2020. [DOI: 10.1017/9781139047678.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
27. Monothetic and Polythetic Taxa. Cladistics 2020. [DOI: 10.1017/9781139047678.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
28. How to Study Classification: Consensus Techniques and General Classifications. Cladistics 2020. [DOI: 10.1017/9781139047678.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
29. Non-taxa or the Absence of –Phyly: Paraphyly and Aphyly. Cladistics 2020. [DOI: 10.1017/9781139047678.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
30. Introduction: Carving Nature at Its Joints, or Why Birds Are Not Dinosaurs and Men Are Not Apes. Cladistics 2020. [DOI: 10.1017/9781139047678.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
31. Preface. Cladistics 2020. [DOI: 10.1017/9781139047678.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
32. Mackiewicz P, Urantówka AD, Kroczak A, Mackiewicz D. Resolving Phylogenetic Relationships within Passeriformes Based on Mitochondrial Genes and Inferring the Evolution of Their Mitogenomes in Terms of Duplications. Genome Biol Evol 2019; 11:2824-2849. [PMID: 31580435 PMCID: PMC6795242 DOI: 10.1093/gbe/evz209] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Accepted: 09/30/2019] [Indexed: 12/29/2022]
Abstract
Mitochondrial genes are placed on one molecule, which implies that they should carry consistent phylogenetic information. Exploiting this advantage, we present a well-supported phylogeny based on mitochondrial genomes from almost 300 representatives of Passeriformes, the most numerous and differentiated Aves order. The analyses resolved the phylogenetic position of paraphyletic Basal and Transitional Oscines. Passerida was divided into two groups, one containing Paroidea and Sylvioidea and the other containing Passeroidea and Muscicapoidea. Analyses of mitogenomes showed four types of rearrangements including a duplicated control region (CR) with adjacent genes. Mapping the presence and absence of duplications onto the phylogenetic tree revealed that the duplication was the ancestral state for passerines and was maintained in early diverged lineages. Next, the duplication could be lost and occurred independently at least four times according to the most parsimonious scenario. In some lineages, two CR copies have been inherited from an ancient duplication and highly diverged, whereas in others, the second copy became similar to the first one due to concerted evolution. The second CR copies accumulated over twice as many substitutions as the first ones. However, the second CRs were not completely eliminated and were retained for a long time, which suggests that both regions can fulfill an important role in mitogenomes. Phylogenetic analyses based on CR sequences subjected to the complex evolution can produce tree topologies inconsistent with real evolutionary relationships between species. Passerines with two CRs showed a higher metabolic rate in relation to their body mass.
Affiliation(s)
- Paweł Mackiewicz
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Poland
- Adam Dawid Urantówka
  - Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland
- Aleksandra Kroczak
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Poland
  - Department of Genetics, Wroclaw University of Environmental and Life Sciences, Poland
- Dorota Mackiewicz
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Poland
33. Paszek J, Gorecki P. Efficient Algorithms for Genomic Duplication Models. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1515-1524. [PMID: 28541223 DOI: 10.1109/tcbb.2017.2706679] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
An important issue in evolutionary molecular biology is to discover genomic duplication episodes and their correspondence to the species tree. Existing approaches vary in the two fundamental aspects: the choice of evolutionary scenarios that model allowed locations of duplications in the species tree, and the rules of clustering gene duplications from gene trees into a single multiple duplication event. Here we study the method of clustering called minimum episodes for several models of allowed evolutionary scenarios with a focus on interval models in which every gene duplication has an interval consisting of allowed locations in the species tree. We present mathematical foundations for general genomic duplication problems. Next, we propose the first linear time and space algorithm for minimum episodes clustering jointly for any interval model and the algorithm for the most general model in which every evolutionary scenario is allowed. We also present a comparative study of different models of genomic duplication based on simulated and empirical datasets. We provided algorithms and tools that could be applied to solve efficiently minimum episodes clustering problems. Our comparative study helps to identify which model is the most reasonable choice in inferring genomic duplication events.
34.
Abstract
This chapter covers the theory and practice of ortholog gene set computation. In the theoretical part we give detailed and formal descriptions of the relevant concepts. We also cover the topic of graph-based clustering as a tool to compute ortholog gene sets. In the second part we provide an overview of practical considerations intended for researchers who need to determine orthologous genes from a collection of annotated genomes, briefly describing some of the most popular programs and resources currently available for this task.
35. Urantowka AD, Kroczak A, Mackiewicz P. The influence of molecular markers and methods on inferring the phylogenetic relationships between the representatives of the Arini (parrots, Psittaciformes), determined on the basis of their complete mitochondrial genomes. BMC Evol Biol 2017; 17:166. [PMID: 28705202 PMCID: PMC5513162 DOI: 10.1186/s12862-017-1012-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/03/2016] [Accepted: 07/04/2017] [Indexed: 01/28/2023]
Abstract
BACKGROUND: Conures are a morphologically diverse group of Neotropical parrots classified as members of the tribe Arini, which has recently been subjected to a taxonomic revision. The previously broadly defined Aratinga genus of this tribe has been split into the 'true' Aratinga and three additional genera, Eupsittula, Psittacara and Thectocercus. Popular markers used in the reconstruction of the parrots' phylogenies derive from mitochondrial DNA. However, current phylogenetic analyses seem to indicate conflicting relationships between Aratinga and other conures, and also among other Arini members. Therefore, it is not clear if the mtDNA phylogenies can reliably define the species tree. The inconsistencies may result from the variable evolution rate of the markers used or their weak phylogenetic signal. To resolve these controversies and to assess to what extent the phylogenetic relationships in the tribe Arini can be inferred from mitochondrial genomes, we compared representative Arini mitogenomes as well as examined the usefulness of the individual mitochondrial markers and the efficiency of various phylogenetic methods.
RESULTS: Single molecular markers produced inconsistent tree topologies, while different methods offered various topologies even for the same marker. A significant disagreement in these tree topologies occurred for cytb, nd2 and nd6 genes, which are commonly used in parrot phylogenies. The strongest phylogenetic signal was found in the control region and RNA genes. However, these markers cannot be used alone in inferring Arini phylogenies because they do not provide fully resolved trees. The most reliable phylogeny of the parrots under study is obtained only on the concatenated set of all mitochondrial markers. The analyses established significantly resolved relationships within the former Aratinga representatives and the main genera of the tribe Arini. Such mtDNA phylogeny can be in agreement with the species tree, owing to its match with synapomorphic features in plumage colouration.
CONCLUSIONS: Phylogenetic relationships inferred from single mitochondrial markers can be incorrect and contradictory. Therefore, such phylogenies should be considered with caution. Reliable results can be produced by concatenated sets of all or at least the majority of mitochondrial genes and the control region. The results advance a new view on the relationships among the main genera of Arini and resolve the inconsistencies between the taxa that were previously classified as the broadly defined genus Aratinga. Although gene and species trees do not always have to be consistent, the mtDNA phylogenies for Arini can reflect the species tree.
Affiliation(s)
- Adam Dawid Urantowka
  - Department of Genetics, Wroclaw University of Environmental and Life Sciences, ul. Kożuchowska 7, 51-631, Wroclaw, Poland
- Aleksandra Kroczak
  - Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Fryderyka Joliot-Curie 14a, 50-383 Wrocław, Poland
- Paweł Mackiewicz
  - Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Fryderyka Joliot-Curie 14a, 50-383 Wrocław, Poland
|
36
|
Moon J, Eulenstein O. Synthesizing large-scale species trees using the strict consensus approach. J Bioinform Comput Biol 2017; 15:1740002. [PMID: 28513253 DOI: 10.1142/s0219720017400029] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Supertree problems are a standard tool for synthesizing large-scale species trees from a given collection of gene trees under some problem-specific objective. Unfortunately, these problems are typically NP-hard, and often remain so when their instances are restricted to rooted gene trees sampled from the same species. While a class of restricted supertree problems has been effectively addressed by the parameterized strict consensus approach, in practice most gene trees are unrooted and sampled from different species. Here, we overcome this stringent limitation by describing efficient algorithms that adopt the strict consensus approach to also handle unrestricted supertree problems. Finally, we demonstrate the performance of our algorithms in a comparative study with classic supertree heuristics using simulated and empirical data sets.
Affiliation(s)
- Jucheol Moon
- Department of Computer Science, Iowa State University, Ames, Iowa 50010, USA
| | - Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, Iowa 50010, USA
| |
|
37
|
Drinkwater B, Charleston MA. Towards sub-quadratic time and space complexity solutions for the dated tree reconciliation problem. Algorithms Mol Biol 2016; 11:15. [PMID: 27213010 PMCID: PMC4875752 DOI: 10.1186/s13015-016-0077-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/08/2015] [Accepted: 05/03/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Recent coevolutionary analysis has considered tree topology as a means to reduce the asymptotic complexity associated with inferring the complex coevolutionary interrelationships that arise between phylogenetic trees. Targeted algorithmic design for specific tree topologies has to date been highly successful, with one recent formulation providing a logarithmic space complexity reduction for the dated tree reconciliation problem. METHODS In this work we build on this prior analysis and obtain a further asymptotic space reduction by providing a new formulation for the dynamic programming table used by a number of popular coevolutionary analysis techniques. This model gives rise to a sub-quadratic running time solution for the dated tree reconciliation problem for selected tree topologies, and is shown to be, in practice, the fastest method for solving the dated tree reconciliation problem for expected evolutionary trees. This result is achieved by analysing not only the topology of the trees considered for coevolutionary analysis, but also the underlying structure of the dynamic programming algorithms that are traditionally applied to such analysis. CONCLUSION The newly inferred theoretical complexity bounds introduced herein are then validated using a combination of synthetic and biological data sets, where the proposed model is shown to provide an [Formula: see text] space saving, while it is observed to run in half the time compared to the fastest known algorithm for solving the dated tree reconciliation problem. Even more significantly, the algorithm derived herein is able to guarantee the optimality of its inferred solution, something that algorithms of comparable speed have to date been unable to achieve.
Affiliation(s)
- Benjamin Drinkwater
- School of Information Technologies, University of Sydney, 1 Cleveland St, Sydney, 2006 NSW Australia
| | - Michael A. Charleston
- School of Information Technologies, University of Sydney, 1 Cleveland St, Sydney, 2006 NSW Australia
- School of Physical Sciences, University of Tasmania, Hobart, 7005 Tasmania Australia
| |
|
38
|
Moon J, Lin HT, Eulenstein O. Consensus properties and their large-scale applications for the gene duplication problem. J Bioinform Comput Biol 2016; 14:1642005. [PMID: 27122201 DOI: 10.1142/s0219720016420051] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Solving the gene duplication problem is a classical approach for species tree inference from gene trees that are confounded by gene duplications. This problem takes a collection of gene trees and seeks a species tree that implies the minimum number of gene duplications. Wilkinson et al. posed the conjecture that the gene duplication problem satisfies the desirable Pareto property for clusters. That is, for every instance of the problem, all clusters that are commonly present in the input gene trees of this instance, called strict consensus, will also be found in every solution to this instance. We prove that this conjecture does not generally hold. Despite this negative result we show that the gene duplication problem satisfies a weaker version of the Pareto property where the strict consensus is found in at least one solution (rather than all solutions). This weaker property contributes to our design of an efficient scalable algorithm for the gene duplication problem. We demonstrate the performance of our algorithm in analyzing large-scale empirical datasets. Finally, we utilize the algorithm to evaluate the accuracy of standard heuristics for the gene duplication problem using simulated datasets.
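The strict consensus described above (the clusters shared by all input gene trees) can be illustrated with a minimal Python sketch. It assumes rooted trees over the same leaf set, encoded as nested tuples of leaf labels; the encoding and the function names are illustrative assumptions, not part of the authors' algorithm.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set (cluster) below every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def strict_consensus_clusters(trees: Iterable[Tree]) -> Set[FrozenSet[str]]:
    """Clusters that occur in every input tree."""
    return set.intersection(*(clusters(t) for t in trees))


# Three hypothetical gene trees over the species a, b, c, d.
gene_trees = [
    ((("a", "b"), "c"), "d"),
    ((("a", "b"), "d"), "c"),
    (("a", "b"), ("c", "d")),
]
shared = strict_consensus_clusters(gene_trees)
print(sorted(sorted(c) for c in shared))
```

Under the weaker Pareto property stated in the abstract, the shared clusters computed this way appear in at least one optimal species tree, which is why they can be used to constrain the search.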
Affiliation(s)
- Jucheol Moon
- Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, Iowa 50010, USA
| | - Harris T Lin
- Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, Iowa 50010, USA
| | - Oliver Eulenstein
- Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, Iowa 50010, USA
| |
|
39
|
Abstract
BACKGROUND Discovering the location of gene duplications and multiple gene duplication episodes is a fundamental issue in evolutionary molecular biology. The problem introduced by Guigó et al. in 1996 is to map gene duplication events from a collection of rooted, binary gene family trees onto their corresponding rooted binary species tree in such a way that the total number of multiple gene duplication episodes is minimized. There are several models in the literature that specify how gene duplications from gene families can be interpreted as one duplication episode. However, in all duplication episode problems gene trees are rooted. This restriction limits the applicability, since unrooted gene family trees are frequently inferred by phylogenetic methods. RESULTS In this article we show the first solution to the open problem of episode clustering where the input gene family trees are unrooted. In particular, by using theoretical properties of unrooted reconciliation, we show an efficient algorithm that reduces this problem to the episode clustering problems defined for rooted trees. We present theoretical properties of the reduction algorithm and an evaluation on empirical datasets. CONCLUSIONS We provide algorithms and tools that were successfully applied to several empirical datasets. In particular, our comparative study shows that we can improve known results on genomic duplication inference from real datasets.
Affiliation(s)
- Jarosław Paszek
- University of Warsaw, Institute of Informatics, Banacha 2, Warsaw, 02-097, Poland.
| | - Paweł Górecki
- University of Warsaw, Institute of Informatics, Banacha 2, Warsaw, 02-097, Poland.
| |
|
40
|
Impact of gene family evolutionary histories on phylogenetic species tree inference by gene tree parsimony. Mol Phylogenet Evol 2015; 96:9-16. [PMID: 26702957 DOI: 10.1016/j.ympev.2015.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/01/2015] [Revised: 10/11/2015] [Accepted: 12/03/2015] [Indexed: 11/21/2022]
Abstract
The complicated history of gene duplication and loss poses a challenge to molecular phylogenetic inference, especially in deep phylogenies. However, phylogenomic approaches such as gene tree parsimony (GTP) have an advantage over some other approaches in their ability to use gene families with duplications. GTP searches for the 'optimal' species tree by minimizing the total cost of biological events such as duplications, but the accuracy of GTP and the phylogenetic signal in gene families with distinct histories of duplication and loss are unclear. To evaluate how the evolutionary properties of different gene families affect species tree inference, 3900 gene families from seven angiosperms, encompassing a wide range of gene content, lineage-specific expansions and contractions, were analyzed. It was found that the gene content and total duplication number in a gene family strongly influence species tree inference accuracy, with the highest accuracy achieved at either very low or very high gene content (or duplication number) and the lowest accuracy centered on intermediate gene content (or duplication number), a relationship that can be fitted by a binomial regression. In addition, among gene families with a similar level of average gene content, those with relatively higher lineage-specific expansion or duplication rates tend to show lower accuracy. Additional correlation tests support the view that the high accuracy of gene families with large gene content may rely on abundant ancestral copies that provide many subtrees to resolve conflicts, whereas the high accuracy of single- or low-copy gene families depends only on sequence substitution per se. The very low accuracy reached by gene families of intermediate gene content or duplication number may be due to insufficient subtrees to resolve the conflicts arising from loss of alternative copies. 
As these evolutionary properties can significantly influence species tree accuracy, I discussed the potential weighting of the duplication cost by evolutionary properties of gene families in future GTP analyses.
|
41
|
Abstract
BACKGROUND Evolutionary studies are complicated by discordance between gene trees and the species tree in which they evolved. Dealing with discordant trees often relies on comparison costs between gene and species trees, including the well-established Robinson-Foulds, gene duplication, and deep coalescence costs. While these costs have provided credible results for binary rooted gene trees, corresponding cost definitions for non-binary unrooted gene trees, which are frequently occurring in practice, are challenged by biological realism. RESULT We propose a natural extension of the well-established costs for comparing unrooted and non-binary gene trees with rooted binary species trees using a binary refinement model. For the duplication cost we describe an efficient algorithm that is based on a linear time reduction and also computes an optimal rooted binary refinement of the given gene tree. Finally, we show that similar reductions lead to solutions for computing the deep coalescence and the Robinson-Foulds costs. CONCLUSION Our binary refinement of Robinson-Foulds, gene duplication, and deep coalescence costs for unrooted and non-binary gene trees together with the linear time reductions provided here for computing these costs significantly extends the range of trees that can be incorporated into approaches dealing with discordance.
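As a point of reference for the costs named above, the following hedged sketch computes the Robinson-Foulds cost for the simplest case only: two rooted trees on the same leaves, compared by their clusters. The nested-tuple encoding and the function names are assumptions made for illustration; the binary refinement model for unrooted, non-binary gene trees proposed in the paper is not implemented here.

```python
def clusters(tree):
    """Leaf sets below every internal node of a rooted tree.

    Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def rf_cost(tree_a, tree_b):
    """Rooted Robinson-Foulds cost: number of clusters found in exactly one tree."""
    return len(clusters(tree_a) ^ clusters(tree_b))


# Hypothetical example trees over the species a, b, c, d.
gene_tree = (("a", "b"), ("c", "d"))
species_tree = ((("a", "b"), "c"), "d")
cost = rf_cost(gene_tree, species_tree)
print(cost)
```

Here the two trees share the clusters {a, b} and {a, b, c, d} and differ in {c, d} versus {a, b, c}, so the cost counts those two unmatched clusters.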
Affiliation(s)
- Pawel Górecki
- Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
| | - Oliver Eulenstein
- Department of Computer Science, Iowa State University, Atanasoff Hall 212, 50011 Ames, USA
| |
|
42
|
Wang Z, Du S, Dayanandan S, Wang D, Zeng Y, Zhang J. Phylogeny reconstruction and hybrid analysis of Populus (Salicaceae) based on nucleotide sequences of multiple single-copy nuclear genes and plastid fragments. PLoS One 2014; 9:e103645. [PMID: 25116432 PMCID: PMC4130529 DOI: 10.1371/journal.pone.0103645] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 01/29/2014] [Accepted: 06/30/2014] [Indexed: 11/18/2022] Open
Abstract
Populus (Salicaceae) is one of the most economically and ecologically important genera of forest trees. The complex reticulate evolution and lack of highly variable orthologous single-copy DNA markers have posed difficulties in resolving the phylogeny of this genus. Based on a large data set of nuclear and plastid DNA sequences, we reconstructed a robust phylogeny of Populus using parsimony, maximum likelihood and Bayesian inference methods. The resulting phylogenetic trees showed better resolution at both inter- and intra-sectional levels than previous studies. The results revealed that (1) the plastid-based phylogenetic tree resulted in two main clades, suggesting an early divergence of the maternal progenitors of Populus; (2) three advanced sections (Populus, Aigeiros and Tacamahaca) are of hybrid origin; (3) species of the section Tacamahaca could be divided into two major groups based on plastid and nuclear DNA data, suggesting a polyphyletic nature of the section; and (4) many species proved to be of hybrid origin based on the incongruence between plastid and nuclear DNA trees. Reticulate evolution may have played a significant role in the evolutionary history of Populus by facilitating rapid adaptive radiations into different environments.
Affiliation(s)
- Zhaoshan Wang
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Silviculture of the State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, People's Republic of China
| | - Shuhui Du
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Silviculture of the State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, People's Republic of China
| | - Selvadurai Dayanandan
- Forest and Evolutionary Genomics Laboratory, and the Centre for Structural and Functional Genomics, Biology Department, Concordia University, Montreal, Quebec, Canada
| | - Dongsheng Wang
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Silviculture of the State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, People's Republic of China
| | - Yanfei Zeng
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Silviculture of the State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, People's Republic of China
| | - Jianguo Zhang
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Silviculture of the State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, People's Republic of China
| |
|
43
|
Chang WC, Górecki P, Eulenstein O. Exact solutions for species tree inference from discordant gene trees. J Bioinform Comput Biol 2013; 11:1342005. [PMID: 24131054 DOI: 10.1142/s0219720013420055] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
Phylogenetic analysis has to overcome the grand challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well-studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better assess the quality of the resulting species trees that best fit the given gene trees, we also compute the worst-case species trees, their numbers, and the optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.
|
44
|
Abstract
DrML is a software program for inferring evolutionary scenarios from a gene tree and a species tree with speciation time estimates, based on a general maximum likelihood model. The program implements novel algorithms that efficiently infer the most likely scenarios of gene duplication and loss events. Our comparative studies suggest that the general maximum likelihood model provides more credible estimates than standard parsimony reconciliation, especially when speciation times differ significantly. DrML is an open source project written in Python and is publicly available along with an online manual and sample data sets.
Affiliation(s)
- Paweł Górecki
- Department of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | | |
|
45
|
Hu JY, Zhang YP, Yu L. Summary of Laurasiatheria (Mammalia) phylogeny. DONG WU XUE YAN JIU = ZOOLOGICAL RESEARCH 2013; 33:E65-74. [PMID: 23266984 DOI: 10.3724/sp.j.1141.2012.e05-06e65] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/25/2022]
Abstract
Laurasiatheria is one of the richest and most diverse superorders of placental mammals. Because this group underwent a rapid evolutionary radiation, the phylogenetic relationships among the six orders of Laurasiatheria remain a subject of heated debate, and several issues related to its phylogeny remain open. Reconstructing the true phylogenetic relationships of Laurasiatheria is a significant case study in evolutionary biology due to the diversity of this superorder, and such research will have significant implications for biodiversity conservation. We review the higher-level (inter-ordinal) phylogenies of Laurasiatheria based on previous cytogenetic, morphological and molecular data, and discuss the controversies surrounding its phylogenetic relationships. This review aims to outline future research on Laurasiatheria phylogeny and adaptive evolution.
|
46
|
Starrett J, Hedin M, Ayoub N, Hayashi CY. Hemocyanin gene family evolution in spiders (Araneae), with implications for phylogenetic relationships and divergence times in the infraorder Mygalomorphae. Gene 2013; 524:175-86. [DOI: 10.1016/j.gene.2013.04.037] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/06/2012] [Revised: 03/18/2013] [Accepted: 04/15/2013] [Indexed: 10/26/2022]
|
47
|
Bansal MS, Eulenstein O. Algorithms for genome-scale phylogenetics using gene tree parsimony. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:939-956. [PMID: 24334388 DOI: 10.1109/tcbb.2013.103] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/03/2023]
Abstract
The use of genomic data sets for phylogenetics is complicated by the fact that evolutionary processes such as gene duplication and loss, or incomplete lineage sorting (deep coalescence) cause incongruence among gene trees. One well-known approach that deals with this complication is gene tree parsimony, which, given a collection of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, a lack of efficient algorithms has limited the use of this approach. Here, we present efficient algorithms for SPR and TBR-based local search heuristics for gene tree parsimony under the 1) duplication, 2) loss, 3) duplication-loss, and 4) deep coalescence reconciliation costs. These novel algorithms improve upon the time complexities of previous algorithms for these problems by a factor of n, where n is the number of species in the collection of gene trees. Our algorithms provide a substantial improvement in runtime and scalability compared to previous implementations and enable large-scale gene tree parsimony analyses using any of the four reconciliation costs. Our algorithms have been implemented in the software packages DupTree and iGTP, and have already been used to perform several compelling phylogenetic studies.
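To make the gene tree parsimony objective concrete, here is a minimal sketch of the duplication cost computed with the standard LCA mapping, followed by a toy search over candidate species trees. It assumes rooted binary trees encoded as nested tuples whose gene-tree leaves are labeled directly by species name; the encoding, the function names, and the exhaustive scan over three candidates are illustrative assumptions and stand in for, rather than reproduce, the SPR/TBR local search of the paper.

```python
def all_clusters(species_tree):
    """Leaf sets of every node of the species tree, single leaves included."""
    out = set()

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        out.add(leaves)
        return leaves

    visit(species_tree)
    return out


def lca_cluster(species_set, s_clusters):
    """LCA mapping: the smallest species-tree cluster containing species_set."""
    return min((c for c in s_clusters if species_set <= c), key=len)


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes mapped to the same species node as one of their children."""
    s_clusters = all_clusters(species_tree)
    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            return lca_cluster(frozenset([node]), s_clusters)
        child_maps = [visit(child) for child in node]
        here = lca_cluster(frozenset().union(*child_maps), s_clusters)
        if here in child_maps:
            dups += 1  # node and a child map to the same species node
        return here

    visit(gene_tree)
    return dups


# Hypothetical input: three gene trees and the three rooted topologies on a, b, c.
gene_trees = [(("a", "b"), "c"), (("a", "b"), ("a", "c")), (("a", "b"), "c")]
candidates = [(("a", "b"), "c"), (("a", "c"), "b"), (("b", "c"), "a")]


def total_cost(species_tree):
    return sum(duplication_cost(g, species_tree) for g in gene_trees)


best = min(candidates, key=total_cost)
print(best, total_cost(best))
```

Enumerating every candidate is only feasible for a handful of species, which is why heuristics that move between neighboring species trees are used at genome scale.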
|
48
|
Nguyen TH, Ranwez V, Pointet S, Chifolleau AMA, Doyon JP, Berry V. Reconciliation and local gene tree rearrangement can be of mutual profit. Algorithms Mol Biol 2013; 8:12. [PMID: 23566548 PMCID: PMC3871789 DOI: 10.1186/1748-7188-8-12] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 12/17/2012] [Accepted: 02/05/2013] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Reconciliation methods compare gene trees and species trees to recover evolutionary events such as duplications, transfers and losses explaining the history and composition of genomes. It is well known that gene trees inferred from molecular sequences can be partly erroneous due to incorrect sequence alignments as well as phylogenetic reconstruction artifacts such as long branch attraction. In practice, this leads reconciliation methods to overestimate the number of evolutionary events. Several methods have been proposed to circumvent this problem, by collapsing the unsupported edges and then resolving the obtained multifurcating nodes, or by directly rearranging the binary gene trees. Yet these methods have been defined for models of evolution accounting only for duplications and losses, i.e. they cannot be applied to prokaryotic gene families. RESULTS We propose a reconciliation method accounting for gene duplications, losses and horizontal transfers that specifically takes into account the uncertainties in gene trees by rearranging their weakly supported edges. Rearrangements are performed on edges having a low confidence value, and are accepted whenever they improve the reconciliation cost. We prove useful properties of the dynamic programming matrix used to compute reconciliations, which allow the tree space exploration to be sped up when rearrangements are generated by Nearest Neighbor Interchange (NNI) edit operations. Experiments on synthetic data show that gene trees modified by such NNI rearrangements are closer to the correct simulated trees and lead to better event predictions on average. Experiments on real data demonstrate that the proposed method leads to a decrease in the reconciliation cost and the number of inferred events. 
Finally, on a dataset of 30 k gene families, this reconciliation method shows a ranking of prokaryotic phyla by transfer rates identical to that proposed by a different approach dedicated to transfer detection [BMCBIOINF 11:324, 2010, PNAS 109(13):4962-4967, 2012]. CONCLUSIONS Prokaryotic gene trees can now be reconciled with their species phylogeny while accounting for the uncertainty of the gene tree. More accurate and more precise reconciliations are obtained with respect to previous parsimony algorithms not accounting for such uncertainties [LNCS 6398:93-108, 2010, BIOINF 28(12): i283-i291, 2012]. Software implementing the method is freely available at http://www.atgc-montpellier.fr/Mowgli/.
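The rearrangement step described above can be sketched as follows: enumerate the trees one NNI move away from a rooted binary gene tree and accept a move only if it lowers a cost. This is a simplified illustration under stated assumptions: trees are nested tuples, `cost` is any user-supplied callable standing in for the reconciliation cost, every edge is treated as rearrangeable (the paper restricts moves to weakly supported edges), and the function names are hypothetical.

```python
def nni_neighbors(tree):
    """Yield every rooted binary tree that is one NNI move away from `tree`.

    Assumed encoding: a leaf is a string, an internal node is a pair (left, right).
    """
    if isinstance(tree, str):
        return
    left, right = tree
    # NNI across the edge to an internal child: swap the sibling with a grandchild.
    if not isinstance(right, str):
        a, b = right
        yield (a, (left, b))
        yield (b, (a, left))
    if not isinstance(left, str):
        a, b = left
        yield ((right, b), a)
        yield ((a, right), b)
    # NNI moves located deeper inside either subtree.
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def improve_once(tree, cost):
    """Return the cheapest NNI neighbor if it beats `tree`, otherwise `tree` itself."""
    best = min(nni_neighbors(tree), key=cost, default=tree)
    return best if cost(best) < cost(tree) else tree


# Toy stand-in for a reconciliation cost: 0 for one preferred topology, 1 otherwise.
target = ("c", (("a", "b"), "d"))


def toy_cost(tree):
    return 0 if tree == target else 1


start = (("a", "b"), ("c", "d"))
neighbors = list(nni_neighbors(start))
improved = improve_once(start, toy_cost)
print(len(neighbors), improved)
```

Repeating `improve_once` until the tree stops changing gives a plain hill climb; a real implementation would plug in a duplication-transfer-loss reconciliation cost in place of `toy_cost`.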
Affiliation(s)
- Thi Hau Nguyen
- LIRMM, UMR 5506 CNRS - Université Montpellier 2, Montpellier Cédex 5, France
- Montpellier SupAgro (UMR AGAP), Montpellier, France
- Institut de Biologie Computationnelle, 95 rue de la Galéra, 34095 Montpellier cédex, France
| | - Vincent Ranwez
- Montpellier SupAgro (UMR AGAP), Montpellier, France
- Institut de Biologie Computationnelle, 95 rue de la Galéra, 34095 Montpellier cédex, France
| | - Stéphanie Pointet
- LIRMM, UMR 5506 CNRS - Université Montpellier 2, Montpellier Cédex 5, France
- Institut de Biologie Computationnelle, 95 rue de la Galéra, 34095 Montpellier cédex, France
| | - Anne-Muriel Arigon Chifolleau
- LIRMM, UMR 5506 CNRS - Université Montpellier 2, Montpellier Cédex 5, France
- Institut de Biologie Computationnelle, 95 rue de la Galéra, 34095 Montpellier cédex, France
| | - Jean-Philippe Doyon
- LIRMM, UMR 5506 CNRS - Université Montpellier 2, Montpellier Cédex 5, France
- Institut de Biologie Computationnelle, 95 rue de la Galéra, 34095 Montpellier cédex, France
| | - Vincent Berry
- LIRMM, UMR 5506 CNRS - Université Montpellier 2, Montpellier Cédex 5, France
- Institut de Biologie Computationnelle, 95 rue de la Galéra, 34095 Montpellier cédex, France
| |
|
49
|
Wang JF, Zhang YP, Yu L. [Summary of phylogeny in family Felidae of Carnivora]. YI CHUAN = HEREDITAS 2012. [PMID: 23208134 DOI: 10.3724/sp.j.1005.2012.01365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Felidae (cats) is one of the strictly carnivorous groups in the order Carnivora and includes many of the most familiar and spectacular species. They are the top predators in the world. Thirty-six of the 37 living cat species are considered either "endangered" or "threatened". The relationships among species of the family Felidae, which evolved recently and rapidly, are difficult to resolve and have been the subject of debate. Construction of a reliable Felidae phylogeny will be of evolutionary significance and conservation value. In this paper, we summarize the phylogeny of Felidae, including cytological, morphological and molecular evidence, and point out the existing phylogenetic problems. This review is expected to guide future research on Felidae phylogeny and to lay a theoretical foundation for the protection of this animal group.
Affiliation(s)
- Jin-Feng Wang
- Laboratory for Conservation and Utilization of Bio-resource, Yunnan University, Kunming, China.
| | | | | |
|
50
|
Wu YC, Rasmussen MD, Bansal MS, Kellis M. TreeFix: statistically informed gene tree error correction using species trees. Syst Biol 2012; 62:110-20. [PMID: 22949484 PMCID: PMC3526801 DOI: 10.1093/sysbio/sys076] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022] Open
Abstract
Accurate gene tree reconstruction is a fundamental problem in phylogenetics, with many important applications. However, sequence data alone often lack enough information to confidently support one gene tree topology over many competing alternatives. Here, we present a novel framework for combining sequence data and species tree information, and we describe an implementation of this framework in TreeFix, a new phylogenetic program for improving gene tree reconstructions. Given a gene tree (preferably computed using a maximum-likelihood phylogenetic program), TreeFix finds a “statistically equivalent” gene tree that minimizes a species tree-based cost function. We have applied TreeFix to 2 clades of 12 Drosophila and 16 fungal genomes, as well as to simulated phylogenies and show that it dramatically improves reconstructions compared with current state-of-the-art programs. Given its accuracy, speed, and simplicity, TreeFix should be applicable to a wide range of analyses and have many important implications for future investigations of gene evolution. The source code and a sample data set are available at http://compbio.mit.edu/treefix.
Affiliation(s)
- Yi-Chieh Wu
- Department of Electrical Engineering and Computer Science, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | | | | | | |
|