1
Sharma S, Kumar S. Discovering Fragile Clades and Causal Sequences in Phylogenomics by Evolutionary Sparse Learning. Mol Biol Evol 2024; 41:msae131. [PMID: 38916040 PMCID: PMC11247346 DOI: 10.1093/molbev/msae131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 05/30/2024] [Accepted: 06/20/2024] [Indexed: 06/26/2024] Open
Abstract
Phylogenomic analyses of long sequences, consisting of many genes and genomic segments, reconstruct organismal relationships with high statistical confidence. However, inferred relationships can be sensitive to the exclusion of just a few sequences. Currently, there is no direct way to identify fragile relationships and the associated individual gene sequences in species. Here, we introduce novel metrics for gene-species sequence concordance and clade probability derived from evolutionary sparse learning models. We validated these metrics using fungal, plant, and animal phylogenomic datasets, highlighting the ability of the new metrics to pinpoint fragile clades and the sequences responsible. The new approach does not require the investigation of alternative phylogenetic hypotheses, substitution models, or repeated data subset analyses. Our methodology offers a streamlined approach to evaluating major inferred clades and identifying sequences that may distort phylogenies reconstructed from large datasets.
Affiliation(s)
- Sudip Sharma
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
2
Pontarp M, Lundberg P, Ripa J. The succession of ecological divergence and reproductive isolation in adaptive radiations. J Theor Biol 2024; 587:111819. [PMID: 38589008 DOI: 10.1016/j.jtbi.2024.111819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Revised: 03/28/2024] [Accepted: 04/03/2024] [Indexed: 04/10/2024]
Abstract
Adaptive radiation is a major source of biodiversity but the way in which known components of ecological opportunity, ecological differentiation, and reproductive isolation underpin such biodiversity patterns remains elusive. Much is known about the evolution of ecological differentiation and reproductive isolation during single speciation events, but exactly how those processes scale up to complete adaptive radiations is less understood. Do we expect complete reproductive barriers between newly formed species before the ecological differentiation continues, or does proper species formation occur much later, long after the ecological diversification? Our goal is to improve our mechanistic understanding of adaptive radiations by analyzing an individual-based model that includes a suite of mechanisms that are known to contribute to biodiversity. The model includes variable biogeographic settings, ecological opportunities, and types of mate choice, which makes several different scenarios of an adaptive radiation possible. We find that evolving clades tend to exploit ecological opportunities early whereas reproductive barriers evolve later, demonstrating a decoupling of ecological differentiation and species formation. In many cases, we also find a long-term trend where assortative mating associated with ecological traits is replaced by sexual selection of neutral display traits as the primary mechanism for reproductive isolation. Our results propose that reticulate phylogenies are likely common and stem from initially low reproductive barriers, rather than the previously suggested idea of repeated hybridization events between well-separated species.
Affiliation(s)
- Mikael Pontarp
  - Department of Biology, Lund University, Sölvegatan 37, SE-223 62 Lund, Sweden
- Per Lundberg
  - Department of Biology, Lund University, Sölvegatan 37, SE-223 62 Lund, Sweden
- Jörgen Ripa
  - Department of Biology, Lund University, Sölvegatan 37, SE-223 62 Lund, Sweden
3
Zhang Z, Liu G, Li M. Incomplete lineage sorting and gene flow within Allium (Amayllidaceae). Mol Phylogenet Evol 2024; 195:108054. [PMID: 38471599 DOI: 10.1016/j.ympev.2024.108054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/01/2024] [Accepted: 03/07/2024] [Indexed: 03/14/2024]
Abstract
The phylogeny and systematics of the genus Allium have been studied with a variety of data types, including an increasing amount of molecular data. However, strong phylogenetic discordance and high levels of uncertainty have prevented the identification of a consistent phylogeny. The difficulty in establishing phylogenetic consensus and the evidence for genealogical discordance make Allium a compelling test case to assess the relative contributions of incomplete lineage sorting (ILS), gene flow and gene tree estimation error to phylogenetic reconstruction. In this study, we obtained 75 transcriptomes of 38 Allium species across 10 subgenera. Whole plastid genomes, single-copy genes and consensus CDS were generated to estimate phylogenetic trees using both coalescence and concatenation methods. Multiple approaches, including coalescence simulation, quartet sampling, reticulate network inference, sequence simulation, theta of ILS and reticulation index, were carried out across the CDS gene trees to investigate the degrees of ILS, gene flow and gene tree estimation error. Afterward, a regression analysis was used to test the relative contributions of each of these forms of uncertainty to the final phylogeny. Despite extensive topological discordance among gene trees, we found a fully supported species tree that agrees with most well-accepted relationships and establishes monophyly of the genus Allium. We present clear evidence for substantial ILS across the phylogeny of Allium. Further, we identified two ancient hybridization events for the formation of the second evolutionary line and subg. Butomissa as well as several introgression events between recently diverged species.
Our regression analysis revealed that gene tree inference error and gene flow were the two most dominant factors explaining the overall gene tree variation, although the effects of ILS and gene tree estimation error were difficult to disentangle because of a positive correlation between them. Based on our efforts to mitigate methodological errors in reconstructing trees, we believe that ILS and gene flow are the two principal reasons for the oft-reported phylogenetic heterogeneity of Allium. This study presents a strongly supported and well-resolved phylogenetic backbone for the sampled Allium species, and exemplifies how to untangle heterogeneity in phylogenetic signal and reconstruct the true evolutionary history of the target taxa.
Affiliation(s)
- ZengZhu Zhang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, People's Republic of China
- Gang Liu
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, People's Republic of China
- Minjie Li
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, People's Republic of China
4
Assis R, Conant G, Holland B, Liberles DA, O'Reilly MM, Wilson AE. Models for the retention of duplicate genes and their biological underpinnings. F1000Res 2024; 12:1400. [PMID: 38173826 PMCID: PMC10762295 DOI: 10.12688/f1000research.141786.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 02/08/2024] [Indexed: 01/05/2024] Open
Abstract
Gene content in genomes changes through several different processes, with gene duplication being an important contributor to such changes. Gene duplication occurs over a range of scales from individual genes to whole genomes, and the dynamics of this process can be context dependent. Still, there are rules by which genes are retained or lost from genomes after duplication, and probabilistic modeling has enabled characterization of these rules, including their context-dependence. Here, we describe the biology and corresponding mathematical models that are used to understand duplicate gene retention and its contribution to the set of biochemical functions encoded in a genome.
Affiliation(s)
- Raquel Assis
  - Florida Atlantic University, Boca Raton, Florida, USA
- Gavin Conant
  - North Carolina State University, Raleigh, North Carolina, USA
5
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
  - Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute Hospital das Clínicas - Faculdade de Medicina da Universidade de São Paulo São Paulo, São Paulo, SP, Brazil
- Joaquim Martins
  - Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
6
Hill M, Legried B, Roch S. Species tree estimation under joint modeling of coalescence and duplication: Sample complexity of quartet methods. Ann Appl Probab 2022. [DOI: 10.1214/22-aap1799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Affiliation(s)
- Max Hill
  - Department of Mathematics, University of Wisconsin–Madison
- Sebastien Roch
  - Department of Mathematics, University of Wisconsin–Madison
7
Menet H, Daubin V, Tannier E. Phylogenetic reconciliation. PLoS Comput Biol 2022; 18:e1010621. [PMID: 36327227 PMCID: PMC9632901 DOI: 10.1371/journal.pcbi.1010621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Affiliation(s)
- Hugo Menet
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Vincent Daubin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - * E-mail: (VD); (ET)
- Eric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - Inria, centre de recherche de Lyon, Villeurbanne, France
  - * E-mail: (VD); (ET)
8
Shi F, Li H, Rong G, Zhang Z, Wang J. Improved Fixed-Parameter Algorithm for the Tree Containment Problem on Unrooted Phylogenetic Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3539-3552. [PMID: 34506290 DOI: 10.1109/tcbb.2021.3111660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Phylogenetic trees are unable to represent the evolutionary process for a collection of species if reticulation events happened, and a generalized model named phylogenetic network was introduced consequently. However, the representation of the evolutionary process for one gene is actually a phylogenetic tree that is "contained" in the phylogenetic network for the considered species containing the gene. Thus a fundamental computational problem named Tree Containment problem arises, which asks whether a phylogenetic tree is contained in a phylogenetic network. The previous research on the problem mainly focused on its rooted version of which the considered tree and network are rooted, and several algorithms were proposed when the considered network is binary or structure-restricted. There is almost no algorithm for its unrooted version except the recent fixed-parameter algorithm with runtime $O(4^k n^2)$, where $k$ and $n$ are the reticulation number and size of the considered unrooted binary phylogenetic network $N$, respectively. As the runtime is a little expensive when considering big values of $k$, we aim to improve it and successfully propose a fixed-parameter algorithm with runtime $O(2.594^k n^2)$ in the paper. Additionally, we experimentally show its effectiveness on biological data and simulated data.
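To put the two runtime bounds quoted in this abstract side by side, the sketch below computes the ratio of their exponential factors for a few reticulation numbers k. It is plain arithmetic on the stated bounds and not part of the cited algorithm: the function name `speedup_factor` is hypothetical, and the constants hidden by the O-notation are ignored, so the ratio says nothing certain about measured running times.

```python
def speedup_factor(k: int, old_base: float = 4.0, new_base: float = 2.594) -> float:
    """Ratio of the exponential factors old_base**k / new_base**k.

    Both bounds share the same n**2 term, so it cancels and only the
    reticulation number k matters. Hidden constants are ignored.
    """
    return (old_base / new_base) ** k


if __name__ == "__main__":
    # Illustrative values of k only; they are not taken from the paper.
    for k in (5, 10, 20):
        print(f"k={k}: bound ratio ~ {speedup_factor(k):.1f}")
```

Because the ratio is (4 / 2.594)^k, the gap between the two bounds itself grows exponentially in k, which is why the improvement matters most for networks with many reticulations.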
9
Berbel-Filho WM, Pacheco G, Tatarenkov A, Lira MG, Garcia de Leaniz C, Rodríguez López CM, Lima SMQ, Consuegra S. Phylogenomics reveals extensive introgression and a case of mito-nuclear discordance in the killifish genus Kryptolebias. Mol Phylogenet Evol 2022; 177:107617. [PMID: 36038055 DOI: 10.1016/j.ympev.2022.107617] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/26/2022] [Revised: 08/03/2022] [Accepted: 08/17/2022] [Indexed: 10/15/2022]
Abstract
Introgression is a widespread evolutionary process leading to phylogenetic inconsistencies among distinct parts of the genome, particularly between mitochondrial and nuclear-based phylogenetic reconstructions (i.e., mito-nuclear discordances). Here, we used mtDNA and genome-wide nuclear sites to provide the first phylogenomic-based hypothesis on the evolutionary relationships within the killifish genus Kryptolebias. In addition, we tested for evidence of past introgression in the genus given the multiple reports of ongoing hybridization between its members. Our mtDNA phylogeny generally agreed with the relationships previously proposed for the genus. However, our reconstruction based on nuclear DNA revealed an unknown lineage, Kryptolebias sp. 'ESP', as the sister group of the self-fertilizing mangrove killifishes, K. marmoratus and K. hermaphroditus. All sequenced individuals of Kryptolebias sp. 'ESP' had the same mtDNA haplotype commonly observed in K. hermaphroditus, demonstrating a clear case of mito-nuclear discordance. Our analysis further confirmed an extensive history of introgression between Kryptolebias sp. 'ESP' and K. hermaphroditus. Population genomics analyses indicate no current gene flow between the two lineages, despite their current sympatry and history of introgression. We also confirmed introgression between other species pairs in the genus that have been recently reported to form hybrid zones. Overall, our study provides a phylogenomic reconstruction covering most of the Kryptolebias species, reveals a new lineage hidden in a case of mito-nuclear discordance, and provides evidence of multiple events of ancestral introgression in the genus. These findings underscore the importance of investigating different genomic information in a phylogenetic framework, particularly in taxa where introgression is common, as in the sexually diverse mangrove killifishes.
Affiliation(s)
- Waldir M Berbel-Filho
  - Department of Biology, University of Oklahoma, Norman, OK, USA; Department of Biosciences, College of Science, Swansea University, Swansea, UK
- George Pacheco
  - Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Vejlsøvej 39, 8600 Silkeborg, Denmark
- Andrey Tatarenkov
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, USA
- Mateus G Lira
  - Laboratório de Ictiologia Sistemática e Evolutiva, Departamento de Botânica e Zoologia, Universidade Federal do Rio Grande, Natal, Brazil
- Carlos M Rodríguez López
  - Environmental Epigenetics and Genetics Group, Department of Horticulture, College of Agriculture, Food and Environment, University of Kentucky, Lexington, KY, USA
- Sergio M Q Lima
  - Laboratório de Ictiologia Sistemática e Evolutiva, Departamento de Botânica e Zoologia, Universidade Federal do Rio Grande, Natal, Brazil
- Sofia Consuegra
  - Department of Biosciences, College of Science, Swansea University, Swansea, UK
10
Yang X, Sun G, Xia T, Cha M, Zhang L, Pang B, Tang Q, Dou H, Zhang H. Transcriptome analysis provides new insights into cold adaptation of corsac fox (Vulpes Corsac). Ecol Evol 2022; 12:e8866. [PMID: 35462974 PMCID: PMC9019142 DOI: 10.1002/ece3.8866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Revised: 12/10/2021] [Accepted: 04/06/2022] [Indexed: 11/11/2022] Open
Abstract
Vulpes are widely distributed throughout the world and have undergone drastic physiological and phenotypic changes in response to their environment. However, little is known about the underlying genetic causes of these traits, especially in Vulpes corsac. In this study, RNA‐Seq was used to obtain a comprehensive dataset for multiple pooled tissues of corsac fox, and selection analysis of orthologous genes was performed to identify the genes that may be influenced by the low‐temperature environment. More than 6.32 Gb of clean reads were obtained and assembled into a total of 173,353 unigenes with an average length of 557 bp for corsac fox. Selective pressure analysis identified 16 positively selected genes (PSGs) in corsac fox, red fox, and arctic fox. Enrichment analysis of PSGs showed that the LRP11 gene was enriched in several pathways related to the low‐temperature response and might play a key role in the response of foxes to environmental stimuli. In addition, several positively selected genes were related to DNA damage repair (ELP2 and CHAF1A), innate immunity (ARRDC4 and S100A12), and the respiratory chain (NDUFA5), and these positively selected genes might play a role in the adaptation of wild foxes to harsh environments. The results of common orthologous gene analysis showed that gene flow or convergent evolution might be an important factor in promoting regional differentiation of foxes. Our study provides a valuable transcriptomic resource for studying the evolutionary history of the corsac fox and its adaptations to extreme environments.
Affiliation(s)
- Xiufeng Yang
  - College of Life Science Qufu Normal University Qufu China
- Guolei Sun
  - College of Life Science Qufu Normal University Qufu China
- Tian Xia
  - College of Life Science Qufu Normal University Qufu China
- Muha Cha
  - Hulunbuir Academy of Inland Lakes in Northern Cold & Arid Areas Hulunbuir China
- Lei Zhang
  - College of Life Science Qufu Normal University Qufu China
- Bo Pang
  - Hulunbuir Academy of Inland Lakes in Northern Cold & Arid Areas Hulunbuir China
- Qingming Tang
  - Hulun Buir Forestry and Grassland Business Development Center Hulunbuir China
- Huashan Dou
  - Hulunbuir Academy of Inland Lakes in Northern Cold & Arid Areas Hulunbuir China
- Honghai Zhang
  - College of Life Science Qufu Normal University Qufu China
11
A stochastic Farris transform for genetic data under the multispecies coalescent with applications to data requirements. J Math Biol 2022; 84:36. [PMID: 35394192 DOI: 10.1007/s00285-022-01731-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/04/2020] [Revised: 02/15/2022] [Accepted: 02/17/2022] [Indexed: 10/18/2022]
Abstract
Species tree estimation faces many significant hurdles. Chief among them is that the trees describing the ancestral lineages of each individual gene (the gene trees) often differ from the species tree. The multispecies coalescent is commonly used to model this gene tree discordance, at least when it is believed to arise from incomplete lineage sorting, a population-genetic effect. Another significant challenge in this area is that molecular sequences associated with each gene typically provide limited information about the gene trees themselves. While the modeling of sequence evolution by single-site substitutions is well studied, few species tree reconstruction methods with theoretical guarantees actually address this latter issue. Instead, a standard (but unsatisfactory) assumption is that gene trees are perfectly reconstructed before being fed into a so-called summary method. Hence much remains to be done in the development of inference methodologies that rigorously account for gene tree estimation error, or completely avoid gene tree estimation in the first place. In previous work, a data requirement trade-off was derived between the number of loci m needed for an accurate reconstruction and the length of the locus sequences k. It was shown that to reconstruct an internal branch of length f, one needs m to be of the order of [Formula: see text]. That previous result was obtained under the restrictive assumption that mutation rates as well as population sizes are constant across the species phylogeny. Here we further generalize this result beyond this assumption. Our main contribution is a novel reduction to the molecular clock case under the multispecies coalescent, which we refer to as a stochastic Farris transform.
As a corollary, we also obtain a new identifiability result of independent interest: for any species tree with [Formula: see text] species, the rooted topology of the species tree can be identified from the distribution of its unrooted weighted gene trees even in the absence of a molecular clock.
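The gene tree discordance that the multispecies coalescent models can be illustrated with the simplest textbook case: a rooted three-species tree ((A,B),C) with one sampled lineage per species. This is a minimal sketch of that standard result and not the stochastic Farris transform described in the abstract. It assumes the internal branch length `t` is measured in coalescent units, and the function names are hypothetical. Under these assumptions, the probability that a gene tree matches the species tree is 1 - (2/3)e^(-t).

```python
import math
import random


def concordance_probability(t: float) -> float:
    """Probability that a gene tree matches the species tree ((A,B),C).

    t is the internal branch length in coalescent units (an assumption
    of this sketch); the standard three-taxon result is 1 - (2/3)e^(-t).
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def simulate_concordance(t: float, n_loci: int, seed: int = 0) -> float:
    """Estimate the same probability by simulating independent loci."""
    rng = random.Random(seed)
    concordant = 0
    for _ in range(n_loci):
        # The A and B lineages coalesce at rate 1 along the internal branch.
        if rng.expovariate(1.0) < t:
            concordant += 1
        # Otherwise all three lineages reach the root population, where
        # each of the three pairs is equally likely to coalesce first.
        elif rng.randrange(3) == 0:
            concordant += 1
    return concordant / n_loci


if __name__ == "__main__":
    for t in (0.1, 1.0, 3.0):
        print(t, concordance_probability(t), simulate_concordance(t, 20000))
```

Short internal branches push the concordance probability toward 1/3, which is the regime where many loci are needed to recover the branch.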
12
Cornuault J, Sanmartín I. A road map for phylogenetic models of species trees. Mol Phylogenet Evol 2022; 173:107483. [DOI: 10.1016/j.ympev.2022.107483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Revised: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 10/18/2022]
13
Sanderson MJ, Búrquez A, Copetti D, McMahon MM, Zeng Y, Wojciechowski MF. Origin and diversification of the saguaro cactus (Carnegiea gigantea): a within-species phylogenomic analysis. Syst Biol 2022; 71:1178-1194. [PMID: 35244183 DOI: 10.1093/sysbio/syac017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2020] [Revised: 02/18/2022] [Accepted: 02/25/2022] [Indexed: 11/14/2022] Open
Abstract
Reconstructing accurate historical relationships within a species poses numerous challenges, not least in many plant groups in which gene flow is high enough to extend well beyond species boundaries. Nonetheless, the extent of tree-like history within a species is an empirical question on which it is now possible to bring large amounts of genome sequence to bear. We assess phylogenetic structure across the geographic range of the saguaro cactus, an emblematic member of Cactaceae, a clade known for extensive hybridization and porous species boundaries. Using 200 Gb of whole genome resequencing data from 20 individuals sampled from 10 localities, we assembled two data sets comprising 150,000 biallelic single nucleotide polymorphisms (SNPs) from protein coding sequences. From these we inferred within-species trees and evaluated their significance and robustness using five qualitatively different inference methods. Despite the low sequence diversity, large census population sizes, and presence of wide-ranging pollen and seed dispersal agents, phylogenetic trees were well resolved and highly consistent across both data sets and all methods. We inferred that the most likely root, based on marginal likelihood comparisons, is to the east and south of the region of highest genetic diversity, which lies along the coast of the Gulf of California in Sonora, Mexico. Together with striking decreases in marginal likelihood found to the north, this supports hypotheses that saguaro's current range reflects post-glacial expansion from the refugia in the south of its range. We conclude with observations about practical and theoretical issues raised by phylogenomic data sets within species, in which SNP-based methods must be used rather than gene tree methods that are widely used when sequence divergence is higher. These include computational scalability, inference of gene flow, and proper assessment of statistical support in the presence of linkage effects.
Affiliation(s)
- Michael J Sanderson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Alberto Búrquez
  - Instituto de Ecología, Unidad Hermosillo, Universidad Nacional Autónoma de México, Hermosillo, Sonora, Mexico
- Dario Copetti
  - Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Yichao Zeng
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
14
McLean BS, Bell KC, Cook JA. SNP-based Phylogenomic Inference in Holarctic Ground Squirrels (Urocitellus). Mol Phylogenet Evol 2022; 169:107396. [PMID: 35031463 DOI: 10.1016/j.ympev.2022.107396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Revised: 12/02/2021] [Accepted: 12/08/2021] [Indexed: 11/24/2022]
Abstract
Resolution of rapid evolutionary radiations requires harvesting maximal signal from phylogenomic datasets. However, studies of non-model clades often target conserved loci that are characterized by reduced information content, which can negatively affect gene tree precision and species tree accuracy. Single nucleotide polymorphism (SNP)-based methods are an underutilized but potentially valuable tool for estimating phylogeny and divergence times because they do not rely on resolved gene trees, allowing information from many or all variant loci to be leveraged in species tree reconstruction. We evaluated the utility of SNP-based methods in resolving phylogeny of Holarctic ground squirrels (Urocitellus), a radiation that has been difficult to disentangle, even in prior phylogenomic studies. We inferred phylogeny from a dataset of >3,000 ultraconserved element loci (UCEs) using two methods (SNAPP, SVDquartets) and compared our results with a new mitogenome phylogeny. We also systematically evaluated how phasing of UCEs improves per-locus information content, and inference of topology and other parameters within each of these SNP-based methods. Phasing improved topological resolution and branch length estimation at shallow levels (within species complexes), but less so at deeper levels, likely reflecting true uncertainty due to ancestral polymorphisms segregating in these rapidly diverging lineages. We resolved several key clades in Urocitellus and present targeted opportunities for future phylogenomic inquiry. Our results extend the roadmap for use of SNPs to address vertebrate radiations and support comparative analyses at multiple temporal scales.
Affiliation(s)
- Bryan S McLean
  - University of North Carolina Greensboro, Department of Biology, Greensboro, NC 27402 USA
- Kayce C Bell
  - Natural History Museum of Los Angeles County, Department of Mammalogy, Los Angeles, CA 90007 USA
- Joseph A Cook
  - University of New Mexico, Department of Biology and Museum of Southwestern Biology, Albuquerque, NM 87131 USA
15
Watkins A. Multi-model approaches to phylogenetics: Implications for idealization. Studies in History and Philosophy of Science 2021; 90:285-297. [PMID: 34768089 DOI: 10.1016/j.shpsa.2021.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2021] [Revised: 09/17/2021] [Accepted: 10/08/2021] [Indexed: 06/13/2023]
Abstract
Phylogenetic models traditionally represent the history of life as having a strictly-branching tree structure. However, it is becoming increasingly clear that the history of life is often not strictly-branching; lateral gene transfer, endosymbiosis, and hybridization, for example, can all produce lateral branching events. There is thus motivation to allow phylogenetic models to have a reticulate structure. One proposal involves the reconciliation of genealogical discordance. Briefly, this method uses patterns of disagreement - discordance - between trees of different genes to add lateral branching events to phylogenetic trees of taxa, and to estimate the most likely cause of these events. I use this practice to argue for: (1) a need for expanded accounts of multiple-models idealization, (2) a distinction between automatic and manual de-idealization, and (3) recognition that idealization may serve the meso-level aims of science in a different way than hitherto acknowledged.
Affiliation(s)
- Aja Watkins
  - Boston University Department of Philosophy, 745 Commonwealth Ave, Boston 02215, Massachusetts, USA. http://www.ajawatkins.org
16
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. Annual Review of Ecology, Evolution, and Systematics 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
Affiliation(s)
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
| |
|
17
|
Wang Y, Cao Z, Ogilvie HA, Nakhleh L. Phylogenomic assessment of the role of hybridization and introgression in trait evolution. PLoS Genet 2021; 17:e1009701. [PMID: 34407067 PMCID: PMC8405015 DOI: 10.1371/journal.pgen.1009701] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/27/2021] [Revised: 08/30/2021] [Accepted: 07/07/2021] [Indexed: 11/30/2022] Open
Abstract
Trait evolution among a set of species, a central theme in evolutionary biology, has long been understood and analyzed with respect to a species tree. However, the field of phylogenomics, which has been propelled by advances in sequencing technologies, has ushered in the era of species/gene tree incongruence and, consequently, a more nuanced understanding of trait evolution. For a trait whose states are incongruent with the branching patterns in the species tree, the same state could have arisen independently in different species (homoplasy) or followed the branching patterns of gene trees, incongruent with the species tree (hemiplasy). Another evolutionary process whose extent and significance are better revealed by phylogenomic studies is gene flow between different species. In this work, we present a phylogenomic method for assessing the role of hybridization and introgression in the evolution of polymorphic or monomorphic binary traits. We apply the method to simulated evolutionary scenarios to demonstrate the interplay between the parameters of the evolutionary history and the role of introgression in a binary trait's evolution (which we call xenoplasy). Importantly, we demonstrate, including on a biological data set, that inferring a species tree and using it for trait evolution analysis in the presence of gene flow could lead to misleading hypotheses about trait evolution.
Affiliation(s)
- Yaxuan Wang
- Department of Computer Science, Rice University, Houston, Texas, United States of America
| | - Zhen Cao
- Department of Computer Science, Rice University, Houston, Texas, United States of America
| | - Huw A. Ogilvie
- Department of Computer Science, Rice University, Houston, Texas, United States of America
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas, United States of America
- Department of BioSciences, Rice University, Houston, Texas, United States of America
| |
|
18
|
Buckley SJ, Brauer C, Unmack PJ, Hammer MP, Beheregaray LB. The roles of aridification and sea level changes in the diversification and persistence of freshwater fish lineages. Mol Ecol 2021; 30:4866-4883. [PMID: 34265125 DOI: 10.1111/mec.16082] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/05/2020] [Revised: 07/05/2021] [Accepted: 07/06/2021] [Indexed: 11/29/2022]
Abstract
While the influence of Pleistocene climatic changes on divergence and speciation has been well-documented across the globe, complex spatial interactions between hydrology and eustatics over longer timeframes may also determine species evolutionary trajectories. Within the Australian continent, glacial cycles were not associated with changes in ice cover and instead largely resulted in fluctuations from moist to arid conditions across the landscape. We investigated the role of hydrological and coastal topographic changes brought about by Plio-Pleistocene climatic changes on the biogeographic history of a small Australian freshwater fish, the southern pygmy perch Nannoperca australis. Using 7958 ddRAD-seq (double digest restriction-site associated DNA) loci and 45,104 filtered SNPs, we combined phylogenetic, coalescent and species distribution analyses to assess the various roles of aridification, sea level and tectonics and associated biogeographic changes across southeast Australia. Sea-level changes since the Pliocene and reduction or disappearance of large waterbodies throughout the Pleistocene were determining factors in strong divergence across the clade, including the initial formation and maintenance of a cryptic species, N. 'flindersi'. Isolated climatic refugia and fragmentation due to lack of connected waterways maintained the identity and divergence of inter- and intraspecific lineages. Our historical findings suggest that predicted increases in aridification and sea level due to anthropogenic climate change might result in markedly different demographic impacts, both spatially and across different landscape types.
Affiliation(s)
- Sean James Buckley
- Molecular Ecology Laboratory, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
| | - Chris Brauer
- Molecular Ecology Laboratory, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
| | - Peter J Unmack
- Centre for Applied Water Science, Institute for Applied Ecology, University of Canberra, ACT, Australia
| | - Michael P Hammer
- Natural Sciences, Museum and Art Gallery of the Northern Territory, Darwin, NT, Australia
| | - Luciano B Beheregaray
- Molecular Ecology Laboratory, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
| |
|
19
|
Yan Z, Smith ML, Du P, Hahn MW, Nakhleh L. Species Tree Inference Methods Intended to Deal with Incomplete Lineage Sorting Are Robust to the Presence of Paralogs. Syst Biol 2021; 71:367-381. [PMID: 34245291 PMCID: PMC8978208 DOI: 10.1093/sysbio/syab056] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/24/2020] [Revised: 06/23/2021] [Accepted: 06/30/2021] [Indexed: 11/24/2022] Open
Abstract
Many recent phylogenetic methods have focused on accurately inferring species trees when there is gene tree discordance due to incomplete lineage sorting (ILS). For almost all of these methods, and for phylogenetic methods in general, the data for each locus are assumed to consist of orthologous, single-copy sequences. Loci that are present in more than a single copy in any of the studied genomes are excluded from the data. These steps greatly reduce the number of loci available for analysis. The question we seek to answer in this study is: what happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two large biological data sets, we show that running such methods on data with paralogs can still provide accurate results. We use multiple different methods, some of which are based directly on the multispecies coalescent model, and some of which have been proven to be statistically consistent under it. We also treat the paralogous loci in multiple ways: from explicitly denoting them as paralogs, to randomly selecting one copy per species. In all cases, the inferred species trees are as accurate as equivalent analyses using single-copy orthologs. Our results have significant implications for the use of ILS-aware phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci. This will greatly increase the amount of data that can be used for phylogenetic inference. [Gene duplication and loss; incomplete lineage sorting; multispecies coalescent; orthology; paralogy.]
Affiliation(s)
- Zhi Yan
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
| | - Megan L Smith
- Department of Biology and Department of Computer Science, Indiana University, 1001 East Third Street, Bloomington, IN 47405, USA
| | - Peng Du
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
| | - Matthew W Hahn
- Department of Biology and Department of Computer Science, Indiana University, 1001 East Third Street, Bloomington, IN 47405, USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Department of BioSciences, Rice University, 6100 Main Street, Houston, TX 77005, USA
| |
|
20
|
Song Y, Jiang C, Li KH, Li J, Qiu H, Price M, Fan ZX, Li J. Genome-wide analysis reveals signatures of complex introgressive gene flow in macaques (genus Macaca). Zool Res 2021; 42:433-449. [PMID: 34114757 PMCID: PMC8317189 DOI: 10.24272/j.issn.2095-8137.2021.038] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022] Open
Abstract
The genus Macaca serves as an ideal research model for speciation and introgressive gene flow due to its short period of diversification (about five million years ago) and rapid radiation of constituent species. To understand evolutionary gene flow in macaques, we sequenced four whole genomes (two M. arctoides and two M. thibetana) and combined them with publicly available macaque genome data for genome-wide analyses. We analyzed 14 individuals from nine Macaca species covering all Asian macaque species groups and detected extensive gene flow signals, with the strongest signals between the fascicularis and silenus species groups. Notably, we detected bidirectional gene flow between M. fascicularis and M. nemestrina. The estimated proportion of the genome inherited via gene flow between the two species was 6.19%. However, the introgression signals found among studied island species, such as Sulawesi macaques and M. fuscata, and other species were largely attributed to the genomic similarity of closely related species or ancestral introgression. Furthermore, gene flow signals varied in individuals of the same species (M. arctoides, M. fascicularis, M. mulatta, M. nemestrina and M. thibetana), suggesting very recent gene flow after the populations split. Pairwise sequentially Markovian coalescence (PSMC) analysis showed all macaques experienced a bottleneck five million years ago, after which different species exhibited different fluctuations in demographic history trajectories, implying they have experienced complicated environmental variation and climate change. These results should help improve our understanding of the complicated evolutionary history of macaques, particularly introgressive gene flow.
Affiliation(s)
- Yang Song
- Key Laboratory of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
| | - Cong Jiang
- Key Laboratory of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
| | - Kun-Hua Li
- Key Laboratory of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
| | - Jing Li
- Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan 611130, China
| | - Hong Qiu
- Key Laboratory of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
| | - Megan Price
- Key Laboratory of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
| | - Zhen-Xin Fan
- Key Laboratory of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
| | - Jing Li
- Key Laboratory of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
| |
|
21
|
Qin X, Zhang Z, Lou Q, Xia L, Li J, Li M, Zhou J, Zhao X, Xu Y, Li Q, Yang S, Yu X, Cheng C, Huang S, Chen J. Chromosome-scale genome assembly of Cucumis hystrix - a wild species interspecifically cross-compatible with cultivated cucumber. Horticulture Research 2021; 8:40. [PMID: 33642577 PMCID: PMC7917098 DOI: 10.1038/s41438-021-00475-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/29/2020] [Revised: 12/30/2020] [Accepted: 01/07/2021] [Indexed: 05/06/2023]
Abstract
Cucumis hystrix Chakr. (2n = 2x = 24) is a wild species that can hybridize with cultivated cucumber (C. sativus L., 2n = 2x = 14), a globally important vegetable crop. However, cucumber breeding is hindered by its narrow genetic base. Therefore, introgression from C. hystrix has been anticipated to bring a breakthrough in cucumber improvement. Here, we report the chromosome-scale assembly of the C. hystrix genome (289 Mb). Scaffold N50 reached 14.1 Mb. Over 90% of the sequences were anchored onto 12 chromosomes. A total of 23,864 genes were annotated using a hybrid method. Further, we conducted a comprehensive comparative genomic analysis of cucumber, C. hystrix, and melon (C. melo L., 2n = 2x = 24). Whole-genome comparisons revealed that C. hystrix is phylogenetically closer to cucumber than to melon, providing a molecular basis for the success of its hybridization with cucumber. Moreover, expanded gene families of C. hystrix were significantly enriched in "defense response," and C. hystrix harbored 104 nucleotide-binding site-encoding disease resistance gene analogs. Furthermore, 121 genes were positively selected, and 12 (9.9%) of these were involved in responses to biotic stimuli, which might explain the high disease resistance of C. hystrix. Alignment of the whole C. hystrix genome with the cucumber genome, together with self-alignment, revealed 45,417 chromosome-specific sequences evenly distributed on C. hystrix chromosomes. Finally, we developed four cucumber-C. hystrix alien addition lines and identified the exact introgressed chromosome using molecular and cytological methods. The assembled C. hystrix genome can serve as a valuable resource for studies on Cucumis evolution and interspecific introgression breeding of cucumber.
Affiliation(s)
- Xiaodong Qin
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China
| | - Zhonghua Zhang
- College of Horticulture, Qingdao Agricultural University, 266109, Qingdao, China
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, 100081, Beijing, China
| | - Qunfeng Lou
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China
| | - Lei Xia
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China
| | - Ji Li
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China
| | - Mengxue Li
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China
| | - Junguo Zhou
- College of Horticulture and Landscape, Henan Institute of Science and Technology, 453003, Xinxiang, China
| | - Xiaokun Zhao
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China
| | - Yuanchao Xu
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, 100081, Beijing, China
| | - Qing Li
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, 100081, Beijing, China
| | - Shuqiong Yang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China
| | - Xiaqing Yu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China
| | - Chunyan Cheng
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China
| | - Sanwen Huang
- Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, 518120, Shenzhen, China.
| | - Jinfeng Chen
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, 210095, Nanjing, China.
| |
|
22
|
Shen XX, Steenwyk JL, Rokas A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst Biol 2021; 70:997-1014. [PMID: 33616672 DOI: 10.1093/sysbio/syab011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/08/2020] [Revised: 02/10/2021] [Accepted: 02/17/2021] [Indexed: 12/12/2022] Open
Abstract
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30%-36% of genes in each data matrix are inconsistent, that is, each of these genes has a higher log likelihood score for T1 versus T2 (i.e., ΔGLS > 0) whereas its T1 topology has a lower quartet score than its T2 topology (i.e., ΔGQS < 0) or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
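The inconsistency criterion stated in this abstract amounts to a sign comparison between ΔGLS and ΔGQS. The following is a minimal sketch of that rule, not the authors' code: the function names `classify_gene` and `inconsistent_fraction` are hypothetical, the example values are made up, and setting aside genes with a difference of exactly zero is an assumption, since the abstract does not say how ties are handled.

```python
from typing import Iterable, Tuple


def classify_gene(delta_gls: float, delta_gqs: float) -> str:
    """Label one gene by whether its two signals favor the same topology.

    delta_gls: gene-wise log likelihood score for T1 minus that for T2.
    delta_gqs: gene-wise quartet score for T1 minus that for T2.
    """
    # Assumption: a difference of exactly zero favors neither topology.
    if delta_gls == 0 or delta_gqs == 0:
        return "tie"
    # Opposite signs mean the likelihood signal and the quartet signal
    # prefer different topologies, which the abstract calls inconsistent.
    if (delta_gls > 0) != (delta_gqs > 0):
        return "inconsistent"
    return "consistent"


def inconsistent_fraction(genes: Iterable[Tuple[float, float]]) -> float:
    """Fraction of non-tied genes whose two signals disagree in sign."""
    labels = [classify_gene(gls, gqs) for gls, gqs in genes]
    informative = [label for label in labels if label != "tie"]
    if not informative:
        return 0.0
    return informative.count("inconsistent") / len(informative)


# Hypothetical (delta_gls, delta_gqs) pairs for four genes.
example_genes = [(2.0, 5.0), (1.5, -3.0), (-0.4, 2.0), (-1.0, -1.0)]
fraction = inconsistent_fraction(example_genes)
```

In this toy input, the second and third genes have opposite signs, so half of the genes are labeled inconsistent.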
Affiliation(s)
- Xing-Xing Shen
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
- Institute of Insect Sciences, Zhejiang University, Hangzhou, China
| | - Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| |
|
23
|
Han X, Guo J, Pang E, Song H, Lin K. Ab Initio Construction and Evolutionary Analysis of Protein-Coding Gene Families with Partially Homologous Relationships: Closely Related Drosophila Genomes as a Case Study. Genome Biol Evol 2021; 12:185-202. [PMID: 32108239 PMCID: PMC7144356 DOI: 10.1093/gbe/evaa041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 02/18/2020] [Indexed: 01/05/2023] Open
Abstract
How have genes evolved within a well-known genome phylogeny? Many protein-coding genes should have evolved as a whole at the gene level, and some should have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, here, with de novo-identified modules, the subgene units which could consecutively cover proteins within a set of closely related species, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. Compared with two other popular methods for gene family construction, our approach improved practical gene family classifications with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications and mainly generated single-module genes through partial gene duplication, suggesting that there might be pervasive subgene rearrangement in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide us with a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.
Affiliation(s)
- Xia Han
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
| | - Jindan Guo
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
| | - Erli Pang
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
| | - Hongtao Song
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
| | - Kui Lin
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
| |
|
24
|
Ashrafi K, Sharifdini M, Darjani A, Brant SV. Migratory routes, domesticated birds and cercarial dermatitis: the distribution of Trichobilharzia franki in Northern Iran. Parasite 2021; 28:4. [PMID: 33433322 PMCID: PMC7802520 DOI: 10.1051/parasite/2020073] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/11/2020] [Accepted: 12/11/2020] [Indexed: 11/21/2022]
Abstract
Background: One of the major migration routes for birds going between Europe and Asia is the Black Sea-Mediterranean route that converges on the Volga Delta, continuing into the area of the Caspian Sea. Cercarial dermatitis is a disorder in humans caused by schistosome trematodes that use aquatic birds and snails as hosts and is prevalent in areas of aquaculture in Northern Iran. Before the disorder can be addressed, it is necessary to determine the etiological agents and their host species. This study aimed to document whether domestic mallards are reservoir hosts and if so, to characterize the species of schistosomes. Previous work has shown that domestic mallards are reservoir hosts for a nasal schistosome. Results: In 32 of 45 domestic mallards (Anas platyrhynchos domesticus) (71.1%), the schistosome Trichobilharzia franki, previously reported only from Europe, was found in visceral veins. Morphological and molecular phylogenetic analysis confirmed the species designation. These findings extend the range of T. franki from Europe to Eurasia. Conclusion: The occurrence of cercarial dermatitis in Iran is high in areas of aquaculture. Previous studies in the area have shown that domestic mallards are reservoir hosts of T. regenti, a nasal schistosome and T. franki, as shown in this study. The genetic results support the conclusion that populations of T. franki from Iran are not differentiated from populations in Europe. Therefore, the schistosomes are distributed with their migratory duck hosts, maintaining the gene flow across populations with compatible snail hosts in Iran.
Affiliation(s)
- Keyhan Ashrafi
- Department of Medical Parasitology and Mycology, School of Medicine, Guilan University of Medical Sciences, Rasht 41996-13776, Iran
| | - Meysam Sharifdini
- Department of Medical Parasitology and Mycology, School of Medicine, Guilan University of Medical Sciences, Rasht 41996-13776, Iran
| | - Abbas Darjani
- Skin Research Center, Department of Dermatology, Razi Hospital, Guilan University of Medical Sciences, Rasht 41996-13776, Iran
| | - Sara V Brant
- Museum of Southwestern Biology, Division of Parasites, Department of Biology, University of New Mexico, 1 University of New Mexico MSC03 2020, Albuquerque, New Mexico 87131, USA
| |
|
25
|
Lopes F, Oliveira LR, Kessler A, Beux Y, Crespo E, Cárdenas-Alayza S, Majluf P, Sepúlveda M, Brownell RL, Franco-Trecu V, Páez-Rosas D, Chaves J, Loch C, Robertson BC, Acevedo-Whitehouse K, Elorriaga-Verplancken FR, Kirkman SP, Peart CR, Wolf JBW, Bonatto SL. Phylogenomic Discordance in the Eared Seals is best explained by Incomplete Lineage Sorting following Explosive Radiation in the Southern Hemisphere. Syst Biol 2020; 70:786-802. [PMID: 33367817 DOI: 10.1093/sysbio/syaa099] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 07/29/2020] [Revised: 12/02/2020] [Accepted: 12/08/2020] [Indexed: 12/25/2022] Open
Abstract
The phylogeny and systematics of fur seals and sea lions (Otariidae) have long been studied with diverse data types, including an increasing amount of molecular data. However, only a few phylogenetic relationships have reached acceptance because of strong gene tree-species tree discordance. Divergence time estimates in the group also vary largely between studies. These uncertainties impeded the understanding of the biogeographical history of the group, such as when and how trans-equatorial dispersal and subsequent speciation events occurred. Here, we used high-coverage genome-wide sequencing for 14 of the 15 species of Otariidae to elucidate the phylogeny of the family and its bearing on the taxonomy and biogeographical history. Despite extreme topological discordance among gene trees, we found a fully supported species tree that agrees with the few well-accepted relationships and establishes monophyly of the genus Arctocephalus. Our data support a relatively recent trans-hemispheric dispersal at the base of a southern clade, which rapidly diversified into six major lineages between 3 and 2.5 Ma. Otaria diverged first, followed by Phocarctos and then four major lineages within Arctocephalus. However, we found Zalophus to be nonmonophyletic, with California (Zalophus californianus) and Steller sea lions (Eumetopias jubatus) grouping closer than the Galapagos sea lion (Zalophus wollebaeki) with evidence for introgression between the two genera. Overall, the high degree of genealogical discordance was best explained by incomplete lineage sorting resulting from quasi-simultaneous speciation within the southern clade, with introgression playing a subordinate role in explaining the incongruence among and within prior phylogenetic studies of the family. [Hybridization; ILS; phylogenomics; Pleistocene; Pliocene; monophyly.]
Affiliation(s)
- Fernando Lopes
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, 90619-900 Porto Alegre, RS, Brazil
- Laboratório de Ecologia de Mamíferos, Universidade do Vale do Rio dos Sinos, São Leopoldo, RS, Brazil
| | - Larissa R Oliveira
- Laboratório de Ecologia de Mamíferos, Universidade do Vale do Rio dos Sinos, São Leopoldo, RS, Brazil
- GEMARS, Grupo de Estudos de Mamíferos Aquáticos do Rio Grande do Sul, 95560-000 Torres, RS, Brazil
| | - Amanda Kessler
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, 90619-900 Porto Alegre, RS, Brazil
| | - Yago Beux
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, 90619-900 Porto Alegre, RS, Brazil
| | - Enrique Crespo
- Centro Nacional Patagónico - CENPAT, CONICET, Puerto Madryn, Argentina
| | - Susana Cárdenas-Alayza
- Centro para la Sostenibilidad Ambiental, Universidad Peruana Cayetano Heredia, Lima, Peru
| | - Patricia Majluf
- Centro para la Sostenibilidad Ambiental, Universidad Peruana Cayetano Heredia, Lima, Peru
| | - Maritza Sepúlveda
- Centro de Investigación y Gestión de Recursos Naturales (CIGREN), Facultad de Ciencias, Universidad de Valparaíso, Valparaíso, Chile
| | - Robert L Brownell
- Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration, NOAA, La Jolla, USA
| | - Valentina Franco-Trecu
- Departamento de Ecología y Evolución, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
| | - Diego Páez-Rosas
- Colegio de Ciencias Biológicas y Ambientales, COCIBA, Universidad San Francisco de Quito, Quito, Ecuador
| | - Jaime Chaves
- Colegio de Ciencias Biológicas y Ambientales, COCIBA, Universidad San Francisco de Quito, Quito, Ecuador
- Department of Biology, San Francisco State University, 1800 Holloway Ave, San Francisco, CA, USA
| | - Carolina Loch
- Sir John Walsh Research Institute, Faculty of Dentistry, University of Otago, Dunedin, New Zealand
| | | | - Karina Acevedo-Whitehouse
- Unit for Basic and Applied Microbiology, School of Natural Sciences, Universidad Autónoma de Querétaro, Querétaro, Mexico
| | | | - Stephen P Kirkman
- Department of Environmental Affairs, Oceans and Coasts, Cape Town, South Africa
| | - Claire R Peart
- Department Biologie II, Division of Evolutionary Biology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Jochen B W Wolf
- Department Biologie II, Division of Evolutionary Biology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Sandro L Bonatto
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, 90619-900 Porto Alegre, RS, Brazil
| |
|
26
|
Koch H, DeGiorgio M. Maximum Likelihood Estimation of Species Trees from Gene Trees in the Presence of Ancestral Population Structure. Genome Biol Evol 2020; 12:3977-3995. [PMID: 32022857 PMCID: PMC7061232 DOI: 10.1093/gbe/evaa022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 01/23/2020] [Indexed: 11/12/2022] Open
Abstract
Though large multilocus genomic data sets have led to overall improvements in phylogenetic inference, they have posed the new challenge of addressing conflicting signals across the genome. In particular, ancestral population structure, which has been uncovered in a number of diverse species, can skew gene tree frequencies, thereby hindering the performance of species tree estimators. Here we develop a novel maximum likelihood method, termed TASTI (Taxa with Ancestral structure Species Tree Inference), that can infer phylogenies under such scenarios, and find that it has increasing accuracy with increasing numbers of input gene trees, contrasting with the relatively poor performances of methods not tailored for ancestral structure. Moreover, we propose a supertree approach that allows TASTI to scale computationally with increasing numbers of input taxa. We use genetic simulations to assess TASTI's performance in the three- and four-taxon settings and demonstrate the application of TASTI on a six-species Afrotropical mosquito data set. Finally, we have implemented TASTI in an open-source software package for ease of use by the scientific community.
Affiliation(s)
- Hillary Koch
- Department of Statistics, Pennsylvania State University
| | - Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University
|
27
|
Cai L, Xi Z, Lemmon EM, Lemmon AR, Mast A, Buddenhagen CE, Liu L, Davis CC. The Perfect Storm: Gene Tree Estimation Error, Incomplete Lineage Sorting, and Ancient Gene Flow Explain the Most Recalcitrant Ancient Angiosperm Clade, Malpighiales. Syst Biol 2020; 70:491-507. [PMID: 33169797 DOI: 10.1093/sysbio/syaa083] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 03/19/2019] [Revised: 10/20/2020] [Accepted: 10/28/2020] [Indexed: 12/20/2022] Open
Abstract
The genomic revolution offers renewed hope of resolving rapid radiations in the Tree of Life. The development of the multispecies coalescent model and improved gene tree estimation methods can better accommodate gene tree heterogeneity caused by incomplete lineage sorting (ILS) and gene tree estimation error stemming from the short internal branches. However, the relative influence of these factors in species tree inference is not well understood. Using anchored hybrid enrichment, we generated a data set including 423 single-copy loci from 64 taxa representing 39 families to infer the species tree of the flowering plant order Malpighiales. This order includes 9 of the top 10 most unstable nodes in angiosperms, which have been hypothesized to arise from the rapid radiation during the Cretaceous. Here, we show that coalescent-based methods do not resolve the backbone of Malpighiales and concatenation methods yield inconsistent estimations, providing evidence that gene tree heterogeneity is high in this clade. Despite high levels of ILS and gene tree estimation error, our simulations demonstrate that these two factors alone are insufficient to explain the lack of resolution in this order. To explore this further, we examined triplet frequencies among empirical gene trees and discovered some of them deviated significantly from those attributed to ILS and estimation error, suggesting gene flow as an additional and previously unappreciated phenomenon promoting gene tree variation in Malpighiales. Finally, we applied a novel method to quantify the relative contribution of these three primary sources of gene tree heterogeneity and demonstrated that ILS, gene tree estimation error, and gene flow contributed to 10.0%, 34.8%, and 21.4% of the variation, respectively. Together, our results suggest that a perfect storm of factors likely influences this lack of resolution, and further indicate that recalcitrant phylogenetic relationships like the backbone of Malpighiales may be better represented as phylogenetic networks. Thus, reducing such groups solely to existing models that adhere strictly to bifurcating trees greatly oversimplifies reality, and obscures our ability to more clearly discern the process of evolution. [Coalescent; concatenation; flanking region; hybrid enrichment, introgression; phylogenomics; rapid radiation, triplet frequency.].
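The triplet-frequency idea in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure (they compared empirical frequencies against simulations that also account for estimation error); it only applies the standard multispecies-coalescent expectation that, under ILS alone, the two minor topologies of a species triplet occur equally often. The function name and the input format (three gene-tree counts per triplet) are assumptions made for illustration.

```python
from math import erfc, sqrt


def minor_triplet_asymmetry(counts):
    """Test whether the two minor topologies of one triplet are equally frequent.

    counts: number of gene trees supporting each of the three possible rooted
    topologies for a species triplet (hypothetical input format).
    Returns (chi-square statistic with 1 d.f., p-value).
    """
    if len(counts) != 3:
        raise ValueError("expected counts for exactly three topologies")
    # The most frequent topology is treated as the major one; the test only
    # compares the remaining two.
    _, minor1, minor2 = sorted(counts, reverse=True)
    total = minor1 + minor2
    if total == 0:
        return 0.0, 1.0  # no discordant gene trees, nothing to test
    expected = total / 2
    chi2 = ((minor1 - expected) ** 2 + (minor2 - expected) ** 2) / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

A strongly unbalanced pair of minor counts, such as `minor_triplet_asymmetry([100, 60, 20])`, yields a small p-value and is the kind of signal that would prompt a closer look at gene flow; a real analysis would also need to correct for testing many triplets.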
Affiliation(s)
- Liming Cai
- Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
| | - Zhenxiang Xi
- Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
| | - Emily Moriarty Lemmon
- Department of Biological Sciences, Florida State University, Tallahassee, FL 32306, USA
| | - Alan R Lemmon
- Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA
| | - Austin Mast
- Department of Biological Sciences, Florida State University, Tallahassee, FL 32306, USA
| | - Christopher E Buddenhagen
- Department of Biological Sciences, Florida State University, Tallahassee, FL 32306, USA
- AgResearch, 10 Bisley Road, Hamilton 3214, New Zealand
| | - Liang Liu
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| | - Charles C Davis
- Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
|
28
|
Simon C. An Evolving View of Phylogenetic Support. Syst Biol 2020; 71:921-928. [PMID: 32915964 DOI: 10.1093/sysbio/syaa068] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/14/2020] [Revised: 08/04/2020] [Accepted: 08/15/2020] [Indexed: 01/09/2023] Open
Abstract
If all nucleotide sites evolved at the same rate within molecules and throughout the history of lineages, if all nucleotides were in equal proportion, if any nucleotide or amino acid evolved to any other with equal probability, if all taxa could be sampled, if diversification happened at well-spaced intervals, and if all gene segments had the same history, then tree building would be easy. But of course none of those conditions are true. Hence the need for evaluating the information content and accuracy of phylogenetic trees. The symposium for which this historical essay and presentation were developed focused on the importance of phylogenetic support, specifically branch support for individual clades. Here I present a timeline and review significant events in the history of systematics that set the stage for the development of the sophisticated measures of branch support and examinations of the information content of data highlighted in this symposium.
Affiliation(s)
- Chris Simon
- Department of Ecology and Evolutionary Biology, 75 N. Eagleville Road, University of Connecticut, Storrs, CT
|
29
|
Forsythe ES, Nelson ADL, Beilstein MA. Biased Gene Retention in the Face of Introgression Obscures Species Relationships. Genome Biol Evol 2020; 12:1646-1663. [PMID: 33011798 PMCID: PMC7533067 DOI: 10.1093/gbe/evaa149] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Accepted: 07/10/2020] [Indexed: 12/13/2022] Open
Abstract
Phylogenomic analyses are recovering previously hidden histories of hybridization, revealing the genomic consequences of these events on the architecture of extant genomes. We applied phylogenomic techniques and several complementary statistical tests to show that introgressive hybridization appears to have occurred between close relatives of Arabidopsis, resulting in cytonuclear discordance and impacting our understanding of species relationships in the group. The composition of introgressed and retained genes indicates that selection against incompatible cytonuclear and nuclear-nuclear interactions likely acted during introgression, whereas linkage also contributed to genome composition through the retention of ancient haplotype blocks. We also applied divergence-based tests to determine the species branching order and distinguish donor from recipient lineages. Surprisingly, these analyses suggest that cytonuclear discordance arose via extensive nuclear, rather than cytoplasmic, introgression. If true, this would mean that most of the nuclear genome was displaced during introgression whereas only a small proportion of native alleles were retained.
|
30
|
Abstract
MOTIVATION Consider a simple computational problem. The inputs are (i) the set of mixed reads generated from a sample that combines two organisms and (ii) separate sets of reads for several reference genomes of known origins. The goal is to find the two organisms that constitute the mixed sample. When constituents are absent from the reference set, we seek to phylogenetically position them with respect to the underlying tree of the reference species. This simple yet fundamental problem (which we call phylogenetic double-placement) has enjoyed surprisingly little attention in the literature. As genome skimming (low-pass sequencing of genomes at low coverage, precluding assembly) becomes more prevalent, this problem finds wide-ranging applications in areas as varied as biodiversity research, food production and provenance, and evolutionary reconstruction. RESULTS We introduce a model that relates distances between a mixed sample and reference species to the distances between constituents and reference species. Our model is based on Jaccard indices computed between each sample represented as k-mer sets. The model, built on several assumptions and approximations, allows us to formalize the phylogenetic double-placement problem as a non-convex optimization problem that decomposes mixture distances and performs phylogenetic placement simultaneously. Using a variety of techniques, we are able to solve this optimization problem numerically. We test the resulting method, called MIxed Sample Analysis tool (MISA), on a varied set of simulated and biological datasets. Despite all the assumptions used, the method performs remarkably well in practice. AVAILABILITY AND IMPLEMENTATION The software and data are available at https://github.com/balabanmetin/misa and https://github.com/balabanmetin/misa-data. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
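The building block named in this abstract, a Jaccard index between samples represented as k-mer sets, can be sketched as follows. This is not the MISA implementation: the function names are hypothetical, k-mers are not canonicalized against their reverse complements, and the mixed sample is modeled simply as the union of its two constituents' k-mer sets.

```python
def kmer_set(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def mixture_jaccard(constituent_1, constituent_2, reference, k):
    """Jaccard index between a two-organism mixture and one reference.

    Assumption: the mixture's k-mer set is the union of the k-mer sets of its
    two constituents (no sequencing error, full coverage).
    """
    mixed = kmer_set(constituent_1, k) | kmer_set(constituent_2, k)
    return jaccard(mixed, kmer_set(reference, k))
```

For example, `mixture_jaccard("ACGT", "TTTT", "ACGT", 2)` gives 0.75: the mixture contains every 2-mer of the reference plus one extra k-mer contributed by the second constituent, which is why mixture-to-reference similarities have to be decomposed before placement.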
Affiliation(s)
- Metin Balaban
- Bioinformatics and Systems Biology Department, University of California San Diego, San Diego, CA 92093, USA
| | - Siavash Mirarab
- Electrical and Computer Engineering Department, University of California San Diego, San Diego, CA 92093, USA
|
31
|
Degnan JH. Meng and Kubatko (2009): Modeling hybridization with coalescence. Theor Popul Biol 2020; 133:36-37. [DOI: 10.1016/j.tpb.2019.07.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/23/2019] [Revised: 07/06/2019] [Accepted: 07/08/2019] [Indexed: 11/16/2022]
|
32
|
Nagy LG, Merényi Z, Hegedüs B, Bálint B. Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing. Nucleic Acids Res 2020; 48:2209-2219. [PMID: 31943056 PMCID: PMC7049691 DOI: 10.1093/nar/gkz1241] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/23/2019] [Revised: 12/15/2019] [Accepted: 12/31/2019] [Indexed: 12/21/2022] Open
Abstract
Ongoing large-scale genome sequencing projects are forecasting a data deluge that will almost certainly overwhelm current analytical capabilities of evolutionary genomics. In contrast to population genomics, there are no standardized methods in evolutionary genomics for extracting evolutionary and functional (e.g. gene-trait association) signal from genomic data. Here, we examine how current practices of multi-species comparative genomics perform in this aspect and point out that many genomic datasets are under-utilized due to the lack of powerful methodologies. As a result, many current analyses emphasize gene families for which some functional data is already available, resulting in a growing gap between functionally well-characterized genes/organisms and the universe of unknowns. This leaves unknown genes on the 'dark side' of genomes, a problem that will not be mitigated by sequencing more and more genomes, unless we develop tools to infer functional hypotheses for unknown genes in a systematic manner. We provide an inventory of recently developed methods capable of predicting gene-gene and gene-trait associations based on comparative data, then argue that realizing the full potential of whole genome datasets requires the integration of phylogenetic comparative methods into genomics, a rich but underutilized toolbox for looking into the past.
Affiliation(s)
- László G Nagy
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Zsolt Merényi
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Botond Hegedüs
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Balázs Bálint
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
|
33
|
Perea S, Sousa‐Santos C, Robalo J, Doadrio I. Multilocus phylogeny and systematics of Iberian endemic Squalius (Actinopterygii, Leuciscidae). Zool Scr 2020. [DOI: 10.1111/zsc.12420] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Affiliation(s)
- Silvia Perea
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales - CSIC, Madrid, Spain
| | - Carla Sousa‐Santos
- MARE – Marine and Environmental Sciences Centre, ISPA‐Instituto Universitário, Lisbon, Portugal
| | - Joana Robalo
- MARE – Marine and Environmental Sciences Centre, ISPA‐Instituto Universitário, Lisbon, Portugal
| | - Ignacio Doadrio
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales - CSIC, Madrid, Spain
|
34
|
Abstract
Background Phylogeny estimation is an important part of much biological research, but large-scale tree estimation is infeasible using standard methods due to computational issues. Recently, an approach to large-scale phylogeny has been proposed that divides a set of species into disjoint subsets, computes trees on the subsets, and then merges the trees together using a computed matrix of pairwise distances between the species. The novel component of these approaches is the last step: Disjoint Tree Merger (DTM) methods. Results We present GTM (Guide Tree Merger), a polynomial time DTM method that adds edges to connect the subset trees, so as to provably minimize the topological distance to a computed guide tree. Thus, GTM performs unblended mergers, unlike the previous DTM methods. Yet, despite the potential limitation, our study shows that GTM has excellent accuracy, generally matching or improving on two previous DTMs, and is much faster than both. Conclusions The proposed GTM approach to the DTM problem is a useful new tool for large-scale phylogenomic analysis, and shows the surprising potential for unblended DTM methods.
Affiliation(s)
- Vladimir Smirnov
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, 61801, IL, US
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, 61801, IL, US.
|
35
|
van Hooff JJE, Tromer E, van Dam TJP, Kops GJPL, Snel B. Inferring the Evolutionary History of Your Favorite Protein: A Guide for Molecular Biologists. Bioessays 2020; 41:e1900006. [PMID: 31026339 DOI: 10.1002/bies.201900006] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/08/2019] [Revised: 02/17/2019] [Indexed: 01/01/2023]
Abstract
Comparative genomics has proven a fruitful approach to acquire many functional and evolutionary insights into core cellular processes. Here it is argued that in order to perform accurate and interesting comparative genomics, one first and foremost has to be able to recognize, postulate, and revise different evolutionary scenarios. After all, these studies lack a simple protocol, due to different proteins having different evolutionary dynamics and demanding different approaches. The authors here discuss this challenge from a practical (what are the observations?) and conceptual (how do these indicate a specific evolutionary scenario?) viewpoint, with the aim to guide investigators who want to analyze the evolution of their protein(s) of interest. By sharing how the authors draft, test, and update such a scenario and how it directs their investigations, the authors hope to illuminate how to execute molecular evolution studies and how to interpret them. Also see the video abstract here https://youtu.be/VCt3l2pbdbQ.
Affiliation(s)
- Jolien J E van Hooff
- Theoretical Biology and Bioinformatics, Biology, Science Faculty, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands.,Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands
| | - Eelco Tromer
- Theoretical Biology and Bioinformatics, Biology, Science Faculty, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands.,Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands.,Department of Biochemistry, University of Cambridge, Hopkins Building, Tennis Court Road, Cambridge, CB2 1QW, UK
| | - Teunis J P van Dam
- Theoretical Biology and Bioinformatics, Biology, Science Faculty, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
| | - Geert J P L Kops
- Oncode Institute, Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands.,Molecular Cancer Research, University Medical Centre Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
| | - Berend Snel
- Theoretical Biology and Bioinformatics, Biology, Science Faculty, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
|
36
|
Pinto D, da Fonseca RR. Evolution of the extracytoplasmic function σ factor protein family. NAR Genom Bioinform 2020; 2:lqz026. [PMID: 33575573 PMCID: PMC7671368 DOI: 10.1093/nargab/lqz026] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/16/2019] [Revised: 11/04/2019] [Accepted: 12/19/2019] [Indexed: 12/18/2022] Open
Abstract
Understanding transcription has been a central goal of the scientific community for decades. However, much is still unknown, especially concerning how it is regulated. In bacteria, a single DNA-directed RNA-polymerase performs the whole of transcription. It contains multiple subunits, among them the σ factor, which confers promoter specificity. Besides the housekeeping σ factor, bacteria encode several alternative σ factors. The most abundant and diverse family of alternative σ factors, the extracytoplasmic function (ECF) family, regulates transcription of genes associated with stressful scenarios, making them key elements of adaptation to specific environmental changes. Despite this, the evolutionary history of ECF σ factors has never been investigated. Here, we report on our analysis of thousands of members of this family. We show that single events are at the origin of alternative modes of regulation of ECF σ factor activity that require partner proteins, but that multiple events resulted in acquisition of regulatory extensions. Moreover, in Bacteroidetes there is a recent duplication of an ecologically relevant gene cluster that includes an ECF σ factor, whereas in Planctomycetes duplication generates distinct C-terminal extensions after fortuitous insertion of the duplicated σ factor. Finally, we also demonstrate horizontal transfer of ECF σ factors between soil bacteria.
Affiliation(s)
- Daniela Pinto
- Technische Universität Dresden, Institute of Microbiology, Zellescher Weg 20b, 01217 Dresden, Germany
| | - Rute R da Fonseca
- Center for Macroecology, Evolution and Climate (CMEC), GLOBE Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
|
37
|
Kuhn A, Darras H, Paknia O, Aron S. Repeated evolution of queen parthenogenesis and social hybridogenesis in Cataglyphis desert ants. Mol Ecol 2019; 29:549-564. [DOI: 10.1111/mec.15283] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/07/2019] [Revised: 10/07/2019] [Accepted: 10/21/2019] [Indexed: 01/18/2023]
Affiliation(s)
- Alexandre Kuhn
- Evolutionary Biology and Ecology, Université Libre de Bruxelles, Brussels, Belgium
| | - Hugo Darras
- Evolutionary Biology and Ecology, Université Libre de Bruxelles, Brussels, Belgium
- Department of Ecology and Evolution, Biophore, UNIL Sorge, University of Lausanne, Lausanne, Switzerland
| | - Omid Paknia
- ITZ, Ecology and Evolution, TiHo Hannover, Hannover, Germany
| | - Serge Aron
- Evolutionary Biology and Ecology, Université Libre de Bruxelles, Brussels, Belgium
|
38
|
Breyta R, Atkinson SD, Bartholomew JL. Evolutionary dynamics of Ceratonova species (Cnidaria: Myxozoa) reveal different host adaptation strategies. Infect Genet Evol 2019; 78:104081. [PMID: 31676446 DOI: 10.1016/j.meegid.2019.104081] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/25/2019] [Revised: 10/16/2019] [Accepted: 10/22/2019] [Indexed: 10/25/2022]
Abstract
The myxozoan parasite Ceratonova shasta is an important pathogen that infects multiple species of Pacific salmonids. Ongoing genetic surveillance has revealed stable host-parasite relationships throughout the parasite's endemic range. We applied Bayesian phylogenetics to test specific hypotheses about the evolution of these host-parasite relationships within the well-studied Klamath River watershed in Oregon and California, USA. The results provide statistical support that different genotypes of C. shasta are distinct lineages of one species, which is related to two other Ceratonova species in the same ecosystems; Ceratonova X in speckled dace and C. gasterostea in threespine stickleback. Furthermore, we found strong support for the hypothesis that C. shasta type 0 in native steelhead trout and type I in Chinook salmon each evolved with a specialist host adaptation strategy, while C. shasta type II in coho salmon resulted from a generalist host adaptation strategy. Inferred date and host species of the most recent common ancestor of extant Klamath basin types indicate that it occurred between 14,000 and 21,000 years ago, and most likely infected a native steelhead or rainbow trout host.
Affiliation(s)
- Rachel Breyta
- Department of Microbiology, Oregon State University, Corvallis, OR, USA; US Geological Survey, Western Fisheries Research Center, Seattle, WA, USA.
|
39
|
Sloutsky R, Naegle KM. ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models. eLife 2019; 8:e47676. [PMID: 31621582 PMCID: PMC6797483 DOI: 10.7554/elife.47676] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/13/2019] [Accepted: 09/19/2019] [Indexed: 12/27/2022] Open
Abstract
Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used to better identify accurate evolutionary models. We subsampled from available ortholog sequences and measured the distribution of observed relationships between paralogs produced across hundreds of models inferred from the subsamples. We observed two important phenomena. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, topologies that are most consistent with the observations made in the ensemble are more accurate and we present a meta algorithm that exploits this property to improve model accuracy.
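The notion of reproducibility across subsample reconstructions can be sketched in simplified form. The code below is not ASPEN: it assumes that every topology has already been reduced to the same set of leaves and is represented as a set of clades, and it scores reproducibility as the average fraction of subsample topologies that recover each clade of the all-sequence reference. All names are hypothetical.

```python
def topology(*clades):
    """Represent a tree topology as a frozenset of clades (frozensets of leaf names)."""
    return frozenset(frozenset(clade) for clade in clades)


def clade_support(reference, replicates):
    """For each reference clade, the fraction of replicate topologies containing it."""
    if not replicates:
        raise ValueError("at least one replicate topology is required")
    return {
        clade: sum(clade in replicate for replicate in replicates) / len(replicates)
        for clade in reference
    }


def reproducibility(reference, replicates):
    """Mean clade support of the reference topology across the replicates."""
    support = clade_support(reference, replicates)
    if not support:
        raise ValueError("reference topology has no clades")
    return sum(support.values()) / len(support)
```

With a reference `topology(["A", "B"], ["C", "D"])` and two subsample topologies, one identical and one `topology(["A", "C"], ["B", "D"])`, the score is 0.5. The abstract's observation is that a score of this general kind correlates with the accuracy of the single-alignment reconstruction.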
Affiliation(s)
- Roman Sloutsky
- Program in Computational and Systems Biology, Washington University, St. Louis, United States
- Department for Biomedical Engineering, Washington University, St. Louis, United States
- Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, United States
- Center for Biological Systems Engineering, Washington University, St. Louis, United States
| | - Kristen M Naegle
- Department for Biomedical Engineering, Washington University, St. Louis, United States
- Center for Biological Systems Engineering, Washington University, St. Louis, United States
- Department of Biomedical Engineering, University of Virginia, Charlottesville, United States
- Center for Public Health Genomics, University of Virginia, Charlottesville, United States
|
40
|
Duchemin W, Gence G, Arigon Chifolleau AM, Arvestad L, Bansal MS, Berry V, Boussau B, Chevenet F, Comte N, Davín AA, Dessimoz C, Dylus D, Hasic D, Mallo D, Planel R, Posada D, Scornavacca C, Szöllosi G, Zhang L, Tannier É, Daubin V. RecPhyloXML: a format for reconciled gene trees. Bioinformatics 2019; 34:3646-3652. [PMID: 29762653 PMCID: PMC6198865 DOI: 10.1093/bioinformatics/bty389] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/20/2017] [Accepted: 05/09/2018] [Indexed: 12/21/2022] Open
Abstract
Motivation A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, loss, etc.) along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but often in different reconciliation formats that vary in the type of events considered or in whether the species tree is dated. This complicates the comparison and communication between different programs. Results Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation http://phylariane.univ-lyon1.fr/recphyloxml/.
Affiliation(s)
- Wandrille Duchemin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France.,MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Guillaume Gence
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| | - Anne-Muriel Arigon Chifolleau
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France
| | - Lars Arvestad
- Department of Mathematics, Stockholm University, Stockholm, Sweden.,Swedish e-Science Research Centre (SeRC), Stockholm, Sweden
| | - Mukul S Bansal
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.,Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| | - Vincent Berry
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France.,ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
| | - Bastien Boussau
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| | - François Chevenet
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,MIVEGEC, CNRS 5290, IRD 224, Université de Montpellier, Montpellier, France
| | - Nicolas Comte
- INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Adrián A Davín
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution and Environment, University College London, London, UK.,Department of Computer Science, University College London, London, UK.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - David Dylus
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Damir Hasic
- Department of Mathematics, Faculty of Science, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Diego Mallo
- Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Rémi Planel
- Laboratoire d'Analyse Bio-informatique en Génomique et Métabolisme CNRS-UMR 8030, Commissariat à l'Énergie Atomique (CEA), Institut de Génomique, Genoscope, Evry, France
| | - David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
| | - Celine Scornavacca
- Institut de Biologie Computationnelle (IBC), Montpellier, France.,ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
| | - Gergely Szöllosi
- MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Louxin Zhang
- Department of Mathematics, National University of Singapore, Singapore, Singapore
| | - Éric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Vincent Daubin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| |
|
41
|
Wang Y, Nakhleh L. Towards an accurate and efficient heuristic for species/gene tree co-estimation. Bioinformatics 2019; 34:i697-i705. [PMID: 30423064 DOI: 10.1093/bioinformatics/bty599] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022] Open
Abstract
Motivation: Species and gene trees represent how species and individual loci within their genomes evolve from their most recent common ancestors. These trees are central to addressing several questions in biology relating to, among other issues, species conservation, trait evolution and gene function. Consequently, their accurate inference from genomic data is a major endeavor. One approach to their inference is to co-estimate species and gene trees from genome-wide data. Indeed, Bayesian methods based on this approach already exist. However, these methods are very slow, limiting their applicability to datasets with small numbers of taxa. The more commonly used approach is to first infer gene trees individually, and then use the gene tree estimates to infer the species tree. Methods in this category rely significantly on the accuracy of the gene trees, which is often not high when the dataset includes closely related species. Results: In this work, we introduce a simple, yet effective, iterative method for co-estimating gene and species trees from sequence data of multiple, unlinked loci. In every iteration, the method estimates a species tree, uses it as a generative process to simulate a collection of gene trees, and then selects gene trees for the individual loci from among the simulated gene trees by making use of the sequence data. We demonstrate the accuracy and efficiency of our method on simulated as well as biological data, and compare them to those of existing competing methods. Availability and implementation: The method has been implemented in PhyloNet, which is publicly available at http://bioinfocs.rice.edu/phylonet.
Affiliation(s)
- Yaxuan Wang
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA; Department of BioSciences, Rice University, Houston, TX, USA
| |
|
42
|
Kao TT, Pryer KM, Freund FD, Windham MD, Rothfels CJ. Low-copy nuclear sequence data confirm complex patterns of farina evolution in notholaenid ferns (Pteridaceae). Mol Phylogenet Evol 2019; 138:139-155. [DOI: 10.1016/j.ympev.2019.05.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/14/2018] [Revised: 03/15/2019] [Accepted: 05/17/2019] [Indexed: 11/24/2022]
|
43
|
Comparative Phylogenomics, a Stepping Stone for Bird Biodiversity Studies. DIVERSITY-BASEL 2019. [DOI: 10.3390/d11070115] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 01/02/2023]
Abstract
Birds are a group with immense availability of genomic resources, and hundreds of forthcoming genomes at the doorstep. We review recent developments in whole genome sequencing, phylogenomics, and comparative genomics of birds. Short read based genome assemblies are common, largely due to efforts of the Bird 10K genome project (B10K). Chromosome-level assemblies are expected to increase due to improved long-read sequencing. The available genomic data has enabled the reconstruction of the bird tree of life with increasing confidence and resolution, but challenges remain in the early splits of Neoaves due to their explosive diversification after the Cretaceous-Paleogene (K-Pg) event. Continued genomic sampling of the bird tree of life will not just better reflect their evolutionary history but also shine new light onto the organization of phylogenetic signal and conflict across the genome. The comparatively simple architecture of avian genomes makes them a powerful system to study the molecular foundation of bird specific traits. Birds are on the verge of becoming an extremely resourceful system to study biodiversity from the nucleotide up.
|
44
|
Molloy EK, Warnow T. TreeMerge: a new method for improving the scalability of species tree estimation methods. Bioinformatics 2019; 35:i417-i426. [PMID: 31510668 PMCID: PMC6612878 DOI: 10.1093/bioinformatics/btz344] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022] Open
Abstract
MOTIVATION: At RECOMB-CG 2018, we presented NJMerge and showed that it could be used within a divide-and-conquer framework to scale computationally intensive methods for species tree estimation to larger datasets. However, NJMerge has two significant limitations: it can fail to return a tree and, when used within the proposed divide-and-conquer framework, has O(n^5) running time for datasets with n species. RESULTS: Here we present a new method called 'TreeMerge' that improves on NJMerge in two ways: it is guaranteed to return a tree and it has dramatically faster running time within the same divide-and-conquer framework, only O(n^2) time. We use a simulation study to evaluate TreeMerge in the context of multi-locus species tree estimation with two leading methods, ASTRAL-III and RAxML. We find that the divide-and-conquer framework using TreeMerge has a minor impact on species tree accuracy, dramatically reduces running time, and enables both ASTRAL-III and RAxML to complete on datasets (that they would otherwise fail on), when given 64 GB of memory and 48 h maximum running time. Thus, TreeMerge is a step toward a larger vision of enabling researchers with limited computational resources to perform large-scale species tree estimation, which we call Phylogenomics for All. AVAILABILITY AND IMPLEMENTATION: TreeMerge is publicly available on GitHub (http://github.com/ekmolloy/treemerge). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| |
|
45
|
Chan YB, Robin C. Reconciliation of a gene network and species tree. J Theor Biol 2019; 472:54-66. [DOI: 10.1016/j.jtbi.2019.04.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/02/2018] [Revised: 03/29/2019] [Accepted: 04/02/2019] [Indexed: 12/26/2022]
|
46
|
Glémin S, Scornavacca C, Dainat J, Burgarella C, Viader V, Ardisson M, Sarah G, Santoni S, David J, Ranwez V. Pervasive hybridizations in the history of wheat relatives. SCIENCE ADVANCES 2019; 5:eaav9188. [PMID: 31049399 PMCID: PMC6494498 DOI: 10.1126/sciadv.aav9188] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 10/31/2018] [Accepted: 03/20/2019] [Indexed: 05/18/2023]
Abstract
Cultivated wheats are derived from an intricate history of three genomes, A, B, and D, present in both diploid and polyploid species. It was recently proposed that the D genome originated from an ancient hybridization between the A and B lineages. However, this result has been questioned, and a robust phylogeny of wheat relatives is still lacking. Using transcriptome data from all diploid species and a new methodological approach, our comprehensive phylogenomic analysis revealed that more than half of the species descend from an ancient hybridization event but with a more complex scenario involving a different parent than previously thought (Aegilops mutica, an overlooked wild species, instead of the B genome). We also detected other extensive gene flow events that could explain long-standing controversies in the classification of wheat relatives.
Affiliation(s)
- Sylvain Glémin
- CNRS, Univ Rennes, ECOBIO (Ecosystèmes, biodiversité, évolution)–UMR 6553, F-35042 Rennes, France
- Department of Ecology and Genetics, Evolutionary Biology Center, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
| | - Celine Scornavacca
- Institut des Sciences de l’Evolution Université de Montpellier, CNRS, IRD, EPHE CC 064, Place Eugène Bataillon, 34095 Montpellier, cedex 05, France
| | - Jacques Dainat
- National Bioinformatics Infrastructure Sweden (NBIS), SciLifeLab, Uppsala Biomedicinska Centrum (BMC), Husargatan 3, S-751 23 Uppsala, Sweden
- IMBIM–Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala Biomedicinska Centrum (BMC), Husargatan 3, Box 582, S-751 23 Uppsala, Sweden
| | - Concetta Burgarella
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- CIRAD, UMR AGAP, F-34398 Montpellier, France
| | - Véronique Viader
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
| | - Morgane Ardisson
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
| | - Gautier Sarah
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- South Green Bioinformatics Platform, BIOVERSITY, CIRAD, INRA, IRD, Montpellier SupAgro, Montpellier, France
| | - Sylvain Santoni
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
| | - Jacques David
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
| | - Vincent Ranwez
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
| |
|
47
|
|
48
|
Nicola MV, Johnson LA, Pozner R. Unraveling patterns and processes of diversification in the South Andean-Patagonian Nassauvia subgenus Strongyloma (Asteraceae, Nassauvieae). Mol Phylogenet Evol 2019; 136:164-182. [PMID: 30858079 DOI: 10.1016/j.ympev.2019.03.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/01/2018] [Revised: 02/11/2019] [Accepted: 03/07/2019] [Indexed: 12/26/2022]
Abstract
Congruence among different sources of data is highly desirable in phylogenetic analyses. However, plastid and nuclear DNA may record different evolutionary processes such that incongruence among results from these sources can help unravel complex evolutionary histories. That is the case of Nassauvia subgenus Strongyloma (Asteraceae), a taxon with five putative species distributed in the southern Andes and Patagonian steppe. Morphometric and phylogeographic information cast doubt on the integrity of its species, and previous molecular data even questioned the monophyly of the subgenus. We tested those questions using plastid and nuclear DNA sequences by the application of different methods such as phylogenetic trees, networks, a test of genealogical sorting, an analysis of population structure, calibration of the trees, and a hybridization test, assembling non-synchronous incongruent results at subgenus and species levels in a single reconstruction. The integration of our molecular analyses and previous taxonomic, morphological, and molecular studies supports subgenus Strongyloma as a monophyletic group. However, the topology of the nuclear trees and the evidence of polyploids within subgenus Nassauvia suggest a hypothetical origin and initial radiation of Nassauvia related to an ancient hybridization event that occurred around 17-6.3 Myr ago near the Andes in west-central Patagonia. Plastid data suggest a recent diversification within subgenus Strongyloma, at most 9.8 Myr ago, towards the Patagonian steppe east of the Andes. These processes cause phylogenies to deviate from the species tree since each putative species lacks exclusive ancestry. The non-monophyly of its species using both plastid and nuclear data is caused mainly by incomplete lineage sorting that has occurred since the Miocene. 
The final uplift of the Andes and the Pliocene-Pleistocene glacial-interglacial cycles, and their consequences on the landscape and climate, structured the genetic composition of this group of plants in the Patagonian steppe. The molecular data presented here agree with previous morphological studies, in that the five putative species typically accepted in this subgenus are not independent taxa. This study emphasizes that adding more than one sequence per species, not combining data with dissimilar inheritance patterns without first performing incongruence tests, exploring data through different methodologies, considering the timing of events, and searching for the causes of poorly resolved and/or incongruent phylogenies help to reveal complex underlying biological processes, which might otherwise remain hidden.
Affiliation(s)
- Marcela V Nicola
- Instituto de Botánica Darwinion (CONICET-ANCEFN), Labardén 200, C.C. 22, B1642HYD, San Isidro, Provincia de Buenos Aires, Argentina
| | - Leigh A Johnson
- Department of Biology and Bean Life Science Museum, 4102 LSB, Brigham Young University, Provo, UT 84602, USA
| | - Raúl Pozner
- Instituto de Botánica Darwinion (CONICET-ANCEFN), Labardén 200, C.C. 22, B1642HYD, San Isidro, Provincia de Buenos Aires, Argentina
| |
|
49
|
Gene tree species tree reconciliation with gene conversion. J Math Biol 2019; 78:1981-2014. [PMID: 30767052 DOI: 10.1007/s00285-019-01331-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/04/2017] [Revised: 10/03/2018] [Indexed: 01/19/2023]
Abstract
Gene tree/species tree reconciliation is a recent and decisive advance in phylogenetic methods, accounting for the possible differences between gene histories and species histories. Reconciliation consists of explaining these differences by gene-scale events such as duplication, loss, and transfer, which translates mathematically into a mapping between gene tree nodes and species tree nodes or branches. Gene conversion is a frequent and important evolutionary event, which results in the replacement of a gene by a copy of another from the same species and in the same gene tree. Including this event in reconciliation models has never been attempted because it introduces a dependency between lineages, and standard algorithms based on dynamic programming become ineffective. We propose here a novel mathematical framework including gene conversion as an evolutionary event in gene tree/species tree reconciliation. We describe a randomized algorithm that finds, in polynomial running time, a reconciliation minimizing the number of duplications, losses and conversions in the case when their weights are equal. We show that the space of optimal reconciliations includes an analog of the last common ancestor reconciliation, but is not limited to it. Our algorithm outputs any optimal reconciliation with a nonzero probability. We argue that this study opens a research avenue on including gene conversion in reconciliation, and discuss its possible importance in biology.
|
50
|
Savinova OS, Moiseenko KV, Vavilova EA, Chulkin AM, Fedorova TV, Tyazhelova TV, Vasina DV. Evolutionary Relationships Between the Laccase Genes of Polyporales: Orthology-Based Classification of Laccase Isozymes and Functional Insight From Trametes hirsuta. Front Microbiol 2019; 10:152. [PMID: 30792703 PMCID: PMC6374638 DOI: 10.3389/fmicb.2019.00152] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/30/2018] [Accepted: 01/22/2019] [Indexed: 01/06/2023] Open
Abstract
Laccase is one of the oldest known and most intensively studied fungal enzymes, capable of oxidizing recalcitrant lignin-resembling phenolic compounds. It is currently well established that fungal genomes almost always contain several non-allelic copies of laccase genes (laccase multigene families); nevertheless, many aspects of laccase multigenicity, for example, their precise biological functions or evolutionary relationships, are mostly unknown. Here, we present a detailed evolutionary analysis of the sensu stricto laccase genes (CAZy - AA1_1) from fungi of the Polyporales order. The analysis provides a better understanding of the Polyporales laccase multigenicity and allows for the systematization of the individual features of different laccase isozymes. In addition, we provide a comparison of the biochemical and catalytic properties of the four laccase isozymes from Trametes hirsuta and suggest their functional diversification within the multigene family.
Affiliation(s)
- Olga S Savinova
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Konstantin V Moiseenko
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Ekaterina A Vavilova
- Laboratory of Gene Expression Optimization, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Andrey M Chulkin
- Laboratory of Gene Expression Optimization, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Tatiana V Fedorova
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Tatiana V Tyazhelova
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| | - Daria V Vasina
- Laboratory of Molecular Aspects of Biotransformations, A. N. Bach Institute of Biochemistry, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russia
| |
|