1
Mello B, Schrago CG. Modeling Substitution Rate Evolution across Lineages and Relaxing the Molecular Clock. Genome Biol Evol 2024; 16:evae199. [PMID: 39332907 PMCID: PMC11430275 DOI: 10.1093/gbe/evae199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/08/2024] [Indexed: 09/29/2024]
Abstract
Relaxing the molecular clock using models of how substitution rates change across lineages has become essential for addressing evolutionary problems. The diversity of rate evolution models and their implementations is substantial, and studies have demonstrated that their impact on divergence time estimates can be as significant as that of calibration information. In this review, we trace the development of rate evolution models from the proposal of the molecular clock concept to the development of sophisticated Bayesian and non-Bayesian methods that handle rate variation in phylogenies. We discuss the various approaches to modeling rate evolution, provide a comprehensive list of available software, and examine the challenges and advancements of the prevalent Bayesian framework, contrasting them with faster non-Bayesian methods. Lastly, we offer insights into potential advancements in the field in the era of big data.
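Relaxed-clock models of the kind surveyed in this review let the substitution rate differ among branches. A minimal sketch, assuming an uncorrelated lognormal (UCLN) model with the mean rate fixed at 1 substitution per site per time unit: each branch draws its own rate independently, and sequence data inform only the product of rate and duration (the branch length). The function name `ucln_branch_lengths` and its mean-one parameterization are illustrative assumptions, not the interface of any program listed in the review.

```python
import random


def ucln_branch_lengths(durations, sigma, rng=None):
    """Branch lengths (expected substitutions per site) under a sketch of an
    uncorrelated lognormal relaxed clock (hypothetical helper).

    Each branch draws its own rate independently from a lognormal distribution
    with mean 1 (log-scale mean mu = -sigma**2 / 2), so sigma controls the
    departure from a strict clock. `durations` are branch durations in time units.
    """
    if rng is None:
        rng = random.Random()
    mu = -sigma ** 2 / 2.0  # keeps the expected rate equal to 1
    # one independent rate per branch: nothing is inherited from parent branches
    rates = [rng.lognormvariate(mu, sigma) for _ in durations]
    # sequences inform only the product rate * time, not each factor separately
    return [rate * t for rate, t in zip(rates, durations)]


# usage: a strict clock (sigma = 0) versus a relaxed clock (sigma = 0.5)
strict = ucln_branch_lengths([1.0, 2.5, 0.5], sigma=0.0)  # [1.0, 2.5, 0.5]
relaxed = ucln_branch_lengths([1.0, 2.5, 0.5], sigma=0.5, rng=random.Random(42))
```

With `sigma = 0` every rate equals 1 and the strict clock is recovered; larger values spread the branch rates around 1. Autocorrelated models, in which a branch's rate depends on its parent's rate, would additionally need the tree topology, which this sketch does not use.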
Affiliation(s)
- Beatriz Mello
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21941-617, Brazil
- Carlos G Schrago
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21941-617, Brazil
2
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute, Hospital das Clínicas, Faculdade de Medicina da Universidade de São Paulo, São Paulo, SP, Brazil
- Joaquim Martins
- Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
3
Frankel LE, Ané C. Summary Tests of Introgression Are Highly Sensitive to Rate Variation Across Lineages. Syst Biol 2023; 72:1357-1369. [PMID: 37698548 DOI: 10.1093/sysbio/syad056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 07/07/2023] [Accepted: 08/29/2023] [Indexed: 09/13/2023]
Abstract
The evolutionary implications and frequency of hybridization and introgression are increasingly being recognized across the tree of life. To detect hybridization from multi-locus and genome-wide sequence data, a popular class of methods is based on summary statistics from subsets of 3 or 4 taxa. However, these methods often carry the assumption of a constant substitution rate across lineages and genes, which is commonly violated in many groups. In this work, we quantify the effects of rate variation on the D test (also known as the ABBA-BABA test), the D3 test, and HyDe. All 3 tests are used widely across a range of taxonomic groups, in part because they are very fast to compute. We consider rate variation across species lineages, across genes, their lineage-by-gene interaction, and rate variation across gene-tree edges. We simulated species networks according to a birth-death-hybridization process, so as to capture a range of realistic species phylogenies. For all 3 methods tested, we found a marked increase in the false discovery of reticulation (type-1 error rate) when there is rate variation across species lineages. The D3 test was the most sensitive, with around 80% type-1 error, such that D3 appears to be more sensitive to a departure from the clock than to the presence of reticulation. For all 3 tests, the power to detect hybridization events decreased as the number of hybridization events increased, indicating that multiple hybridization events can obscure one another if they occur within a small subset of taxa. Our study highlights the need to consider rate variation when using site-based summary statistics, and points to the advantages of methods that do not require assumptions on evolutionary rates across lineages or across genes.
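The D (ABBA-BABA) test evaluated in this study counts two discordant biallelic site patterns across four taxa arranged as (((P1, P2), P3), O): ABBA, where P2 and P3 share the derived allele, and BABA, where P1 and P3 share it. Under the test's null model the two patterns are expected in equal numbers, so D = (ABBA - BABA) / (ABBA + BABA) should be near zero; the abstract reports that rate variation across lineages inflates false detections even without reticulation. A minimal sketch, assuming one haploid sequence per taxon and treating the outgroup allele as ancestral; significance testing (commonly a block jackknife) is omitted, and the function `d_statistic` is a hypothetical illustration, not the authors' implementation.

```python
def d_statistic(sites):
    """Patterson's D from aligned biallelic sites (hypothetical helper).

    Each site is a tuple (p1, p2, p3, outgroup) of single-character alleles.
    The outgroup allele is treated as ancestral ("A"); the other allele as derived ("B").
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if len({p1, p2, p3, out}) != 2:
            continue  # skip invariant and multiallelic sites
        # polarize each allele against the outgroup
        pattern = "".join("A" if x == out else "B" for x in (p1, p2, p3, out))
        if pattern == "ABBA":
            abba += 1
        elif pattern == "BABA":
            baba += 1
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / (abba + baba)


sites = (
    [("A", "G", "G", "A")] * 6     # ABBA: P2 and P3 share the derived G
    + [("G", "A", "G", "A")] * 2   # BABA: P1 and P3 share the derived G
    + [("A", "A", "A", "A")] * 10  # invariant, ignored
    + [("A", "A", "G", "A")] * 3   # AABA, ignored
)
d = d_statistic(sites)  # (6 - 2) / (6 + 2) = 0.5
```

A positive D indicates an excess of ABBA sites. In the setting studied by Frankel and Ané, such an excess could reflect unequal substitution rates among P1, P2, and P3 rather than introgression, which is why a clock-like assumption matters for this statistic.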
Affiliation(s)
- Lauren E Frankel
- Department of Botany, University of Wisconsin-Madison, Madison, WI 53706, USA
- Cécile Ané
- Department of Botany, University of Wisconsin-Madison, Madison, WI 53706, USA
- Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
4
Steenwyk JL, Li Y, Zhou X, Shen XX, Rokas A. Incongruence in the phylogenomics era. Nat Rev Genet 2023; 24:834-850. [PMID: 37369847 PMCID: PMC11499941 DOI: 10.1038/s41576-023-00620-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Accepted: 05/19/2023] [Indexed: 06/29/2023]
Abstract
Genome-scale data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence - the inference of conflicting evolutionary histories - remains pervasive in phylogenomic data, hampering our ability to reconstruct and interpret the tree of life. Biological factors, such as incomplete lineage sorting, horizontal gene transfer, hybridization, introgression, recombination and convergent molecular evolution, can lead to gene phylogenies that differ from the species tree. In addition, analytical factors, including stochastic, systematic and treatment errors, can drive incongruence. Here, we review these factors, discuss methodological advances to identify and handle incongruence, and highlight avenues for future research.
Affiliation(s)
- Jacob L Steenwyk
- Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
- Yuanning Li
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Xiaofan Zhou
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
- Xing-Xing Shen
- Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA.
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA.
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
5
Tabatabaee Y, Roch S, Warnow T. QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent. J Comput Biol 2023; 30:1146-1181. [PMID: 37902986 DOI: 10.1089/cmb.2023.0185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on "short quintets" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on github.
Affiliation(s)
- Yasamin Tabatabaee
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Sebastien Roch
- Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
6
Liu B, Warnow T. Weighted ASTRID: fast and accurate species trees from weighted internode distances. Algorithms Mol Biol 2023; 18:6. [PMID: 37468904 PMCID: PMC10355063 DOI: 10.1186/s13015-023-00230-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 06/10/2023] [Indexed: 07/21/2023]
Abstract
BACKGROUND: Species tree estimation is a basic step in many biological research projects, but it is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., "gene tree heterogeneity"). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing "gene trees") and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low-degree polynomial time), and have had high accuracy under many conditions, which makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work of weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance.
RESULTS: Our experimental study evaluating weighted ASTRID typically shows improvements in accuracy compared to the original (unweighted) ASTRID, and shows competitive accuracy against weighted ASTRAL, the state of the art. Our re-implementation of ASTRID also improves the runtime, with marked improvements on large datasets.
CONCLUSIONS: Weighted ASTRID is a new and very fast method for species tree estimation that typically improves upon ASTRID and has comparable accuracy to weighted ASTRAL, while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode.
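The unweighted internode-distance step described in this abstract can be sketched as follows: for each gene tree, count the internal nodes on the path between every pair of leaves, then average each pair over the gene trees that contain both species. This is a minimal sketch, assuming gene trees are given as plain adjacency dictionaries (a hypothetical representation; real tools read Newick) and ignoring the branch-support weighting that distinguishes weighted ASTRID. The neighbor-joining step that would follow is not shown, and the helper names are illustrative.

```python
from collections import deque


def internode_distances(tree):
    """Internode distance for every leaf pair of one unrooted gene tree.

    `tree` is an adjacency dict {node: [neighbours]}; leaves have degree 1.
    The distance is the number of internal nodes on the path between two
    leaves, i.e. the number of edges on that path minus one.
    """
    leaves = [v for v, nbrs in tree.items() if len(nbrs) == 1]
    dist = {}
    for src in leaves:
        # breadth-first search gives edge counts from src to every node
        hops = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in tree[v]:
                if w not in hops:
                    hops[w] = hops[v] + 1
                    queue.append(w)
        for dst in leaves:
            if dst != src:
                dist[frozenset((src, dst))] = hops[dst] - 1
    return dist


def average_internode_matrix(gene_trees):
    """Average internode distance per species pair over trees containing both."""
    totals, counts = {}, {}
    for tree in gene_trees:
        for pair, d in internode_distances(tree).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# two gene trees supporting ab|cd and one supporting ac|bd (u, v are internal nodes)
t1 = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
      "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
t2 = {"a": ["u"], "c": ["u"], "b": ["v"], "d": ["v"],
      "u": ["a", "c", "v"], "v": ["b", "d", "u"]}
matrix = average_internode_matrix([t1, t1, t2])
```

Here the pair {a, b} averages (1 + 1 + 2) / 3 = 4/3 while {a, c} averages 5/3, so the majority topology ab|cd leaves a and b closer in the matrix that a distance method such as neighbor joining would then receive.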
Affiliation(s)
- Baqiao Liu
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL USA
7
Zhang BL, Chen W, Wang Z, Pang W, Luo MT, Wang S, Shao Y, He WQ, Deng Y, Zhou L, Chen J, Yang MM, Wu Y, Wang L, Fernández-Bellon H, Molloy S, Meunier H, Wanert F, Kuderna L, Marques-Bonet T, Roos C, Qi XG, Li M, Liu Z, Schierup MH, Cooper DN, Liu J, Zheng YT, Zhang G, Wu DD. Comparative genomics reveals the hybrid origin of a macaque group. Sci Adv 2023; 9:eadd3580. [PMID: 37262187 PMCID: PMC10413639 DOI: 10.1126/sciadv.add3580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 01/25/2023] [Indexed: 06/03/2023]
Abstract
Although species can arise through hybridization, compelling evidence for hybrid speciation has been reported only rarely in animals. Here, we present phylogenomic analyses on genomes from 12 macaque species and show that the fascicularis group originated from an ancient hybridization between the sinica and silenus groups ~3.45 to 3.56 million years ago. The X chromosomes and low-recombination regions exhibited equal contributions from each parental lineage, suggesting that they were less affected by subsequent backcrossing and hence could have played an important role in maintaining hybrid integrity. We identified many reproduction-associated genes that could have contributed to the development of the mixed sexual phenotypes characteristic of the fascicularis group. The phylogeny within the silenus group was also resolved, and functional experimentation confirmed that all extant Western silenus species are susceptible to HIV-1 infection. Our study provides novel insights into macaque evolution and reveals a hybrid speciation event that has occurred only very rarely in primates.
Affiliation(s)
- Bao-Lin Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Wu Chen
- Guangzhou Zoo and Guangzhou Wildlife Research Center, Guangzhou 510070, China
- Zefu Wang
- Key Laboratory for Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Co-Innovation Center for Sustainable Forestry in Southern China, College of Biology and the Environment, Nanjing Forestry University, Nanjing 210037, China
- Wei Pang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences, KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Meng-Ting Luo
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences, KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Sheng Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Wen-Qiang He
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences, KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Yuan Deng
- BGI-Shenzhen, Shenzhen 518083, China
- Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen DK-2100, Denmark
- Long Zhou
- Center for Evolutionary and Organismal Biology and Women’s Hospital at Zhejiang University School of Medicine, Hangzhou 310058, China
- Min-Min Yang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Yajiang Wu
- Guangzhou Zoo and Guangzhou Wildlife Research Center, Guangzhou 510070, China
- Lu Wang
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi’an, China
- Hélène Meunier
- Centre de Primatologie, de l'Université de Strasbourg, Niederhausbergen, France
- Laboratoire de Neurosciences Cognitives et Adaptatives, UMR 7364, Université de Strasbourg, Strasbourg, France
- Fanélie Wanert
- Plateforme SILABE, Université de Strasbourg, Niederhausbergen, France
- Lukas Kuderna
- Genome Interpretation Department, Illumina Inc., Foster City, CA, USA
- Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, Barcelona 08003, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, Barcelona 08010, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, Barcelona 08028, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
- Christian Roos
- Primate Genetics Laboratory, German Primate Center, Göttingen, Germany
- Gene Bank of Primates, German Primate Center, Göttingen, Germany
- Xiao-Guang Qi
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi’an, China
- Ming Li
- CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Zhijin Liu
- College of Life Sciences, Capital Normal University, Beijing 100048, China
- David N. Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff CF14 4XN, UK
- Jianquan Liu
- Key Laboratory for Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- State Key Laboratory of Grassland Agro-ecosystem, Institute of Innovation Ecology and College of Life Sciences, Lanzhou University, Lanzhou 730000, China
- Yong-Tang Zheng
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences, KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center and National Research Facility for Phenotypic and Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650107, China
- Guojie Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Center for Evolutionary and Organismal Biology and Women’s Hospital at Zhejiang University School of Medicine, Hangzhou 310058, China
- Liangzhu Laboratory, Zhejiang University Medical Center, 1369 West Wenyi Road, Hangzhou 311121, China
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen 2100, Denmark
- Dong-Dong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center and National Research Facility for Phenotypic and Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650107, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Kunming Natural History Museum of Zoology, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
8
Mossop KD, Lemmon AR, Moriarty Lemmon E, Eytan R, Adams M, Unmack PJ, Smith Date K, Morales HE, Hammer MP, Wong BBM, Chapple DG. Phylogenomics and biogeography of arid-adapted Chlamydogobius goby fishes. Mol Phylogenet Evol 2023; 182:107757. [PMID: 36925090 DOI: 10.1016/j.ympev.2023.107757] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/04/2022] [Revised: 02/01/2023] [Accepted: 03/07/2023] [Indexed: 03/17/2023]
Abstract
The progressive aridification of the Australian continent from ∼20 million years ago posed severe challenges for the persistence of its resident biota. A key question involves the role of refugial habitats, specifically their ability to mediate the effects of habitat loss and fragmentation, and their potential to shape opportunities for allopatric speciation. With freshwater species, for example, the patchiness, or absence, of water will constrain distributions. However, aridity may not necessarily isolate populations if disjunct refugia experience frequent hydrological connections. To investigate this potential dichotomy, we explored the evolutionary history of the Chlamydogobius gobies (Gobiiformes: Gobiidae), an arid-adapted genus of six small, benthic fish species that exploit all types of waterbodies (i.e. desert springs, waterholes and bore-fed wetlands, coastal estuarine creeks and mangroves) across parts of central and northern Australia. We used Anchored Phylogenomics to generate a highly resolved phylogeny of the group from sequence data for 260 nuclear loci. Buttressed by companion allozyme and mtDNA datasets, our molecular findings infer the diversification of Chlamydogobius in arid Australia, and provide a phylogenetic structure that cannot be simply explained by invoking allopatric speciation events reflecting current geographic proximity. Our findings are generally consistent with the existing morphological delimitation of species, with one exception: at the shallowest nodes of phylogenetic reconstruction, the molecular data do not fully support the current dichotomous delineation of C. japalpa from C. eremius in Kati Thanda-Lake Eyre-associated waterbodies. Together these findings illustrate the ability of structural (hydrological) connections to generate patterns of connectivity and isolation for an ecologically moderate disperser in response to ongoing habitat aridification. Finally, we explore the implications of these results for the immediate management of threatened (C. gloveri) and critically endangered (C. micropterus, C. squamigenus) congeners.
Affiliation(s)
- Krystina D Mossop
- School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
- Alan R Lemmon
- Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, FL, USA
- Ron Eytan
- Marine Biology Department, Texas A&M University at Galveston, Galveston, TX 77554, USA; Peabody Museum of Natural History, Yale University, New Haven, CT, USA
- Mark Adams
- Evolutionary Biology Unit, South Australian Museum, North Terrace, Adelaide, SA 5000, Australia; School of Biological Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Peter J Unmack
- School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia; Centre for Applied Water Science, Institute for Applied Ecology, University of Canberra, ACT 2617, Australia
- Katie Smith Date
- School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia; Museum Victoria, Sciences Department, GPO Box 666, Melbourne, VIC 3001, Australia
- Hernán E Morales
- School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia; Section for Evolutionary Genomics, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Michael P Hammer
- Natural Sciences, Museum and Art Gallery of the Northern Territory, Darwin, NT 0801, Australia
- Bob B M Wong
- School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
- David G Chapple
- School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia.
9
Martins AB, Valença-Montenegro MM, Lima MGM, Lynch JW, Svoboda WK, Silva-Júnior JDSE, Röhe F, Boubli JP, Fiore AD. A New Assessment of Robust Capuchin Monkey (Sapajus) Evolutionary History Using Genome-Wide SNP Marker Data and a Bayesian Approach to Species Delimitation. Genes (Basel) 2023; 14:genes14050970. [PMID: 37239330 DOI: 10.3390/genes14050970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 04/11/2023] [Accepted: 04/12/2023] [Indexed: 05/28/2023]
Abstract
Robust capuchin monkeys, Sapajus genus, are among the most phenotypically diverse and widespread groups of primates in South America, with one of the most confusing and often shifting taxonomies. We used a ddRADseq approach to generate genome-wide SNP markers for 171 individuals from all putative extant species of Sapajus to assess their evolutionary history. Using maximum likelihood, multispecies coalescent phylogenetic inference, and a Bayes Factor method to test for alternative hypotheses of species delimitation, we inferred the phylogenetic history of the Sapajus radiation, evaluating the number of discrete species supported. Our results support the recognition of three species from the Atlantic Forest south of the São Francisco River, with these species being the first splits in the robust capuchin radiation. Our results were congruent in recovering the Pantanal and Amazonian Sapajus as structured into three monophyletic clades, though new morphological assessments are necessary, as the Amazonian clades do not agree with previous morphology-based taxonomic distributions. Phylogenetic reconstructions for Sapajus occurring in the Cerrado, Caatinga, and northeastern Atlantic Forest were less congruent with morphology-based phylogenetic reconstructions, as the bearded capuchin was recovered as a paraphyletic clade, with samples from the Caatinga biome being either a monophyletic clade or nested with the blond capuchin monkey.
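Bayes factor comparisons of delimitation hypotheses, as mentioned in this abstract, rest on the difference in log marginal likelihood between two models. The abstract does not say how the marginal likelihoods were estimated or which decision threshold was applied, so the sketch below only shows the generic comparison. It assumes the log marginal likelihoods are already available (for example from path sampling or stepping-stone sampling) and uses the Kass and Raftery (1995) scale for 2 ln BF, which is one common convention and not necessarily the one used in the study. Both helper names are hypothetical.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """2 ln BF for model A over model B from their log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)


def kass_raftery_category(two_ln_bf):
    """Strength of evidence for model A on the Kass and Raftery (1995) scale."""
    if two_ln_bf < 0:
        return "favors model B"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


# illustrative values only: model A (e.g. a split delimitation) vs model B (lumped)
bf = two_ln_bayes_factor(-1000.0, -1006.0)  # 12.0
verdict = kass_raftery_category(bf)         # "very strong"
```

Because the scale is applied to differences of estimated log marginal likelihoods, estimator error can matter when 2 ln BF falls near a category boundary.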
Affiliation(s)
- Amely Branquinho Martins
- Centro Nacional de Pesquisa e Conservação de Primatas Brasileiros, Instituto Chico Mendes de Conservação da Biodiversidade, Cabedelo 58310-000, PB, Brazil
- Primate Molecular Ecology and Evolution Laboratory, Department of Anthropology, The University of Texas at Austin, Austin, TX 78712, USA
- Mônica Mafra Valença-Montenegro
- Centro Nacional de Pesquisa e Conservação de Primatas Brasileiros, Instituto Chico Mendes de Conservação da Biodiversidade, Cabedelo 58310-000, PB, Brazil
- Marcela Guimarães Moreira Lima
- Laboratório de Biogeografia da Conservação e Macroecologia, Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém 66077-530, PA, Brazil
- Jessica W Lynch
- Institute for Society and Genetics, Department of Anthropology, University of California-Los Angeles, Los Angeles, CA 90095, USA
- Walfrido Kühl Svoboda
- Instituto Latino-Americano de Ciências da Vida e da Natureza, Centro Interdisciplinar de Ciências da Vida, Universidade Federal da Integração Latino-Americana, Foz do Iguaçu 85870-650, PR, Brazil
- José de Sousa E Silva-Júnior
- Museu Paraense Emílio Goeldi, Ministério da Ciência, Tecnologia, Inovações e Comunicações, Coordenação de Zoologia, Campus de Pesquisa, Setor de Mastozoologia, Belém 66077-830, PA, Brazil
- Fábio Röhe
- Laboratório de Evolução e Genética Animal, Universidade Federal do Amazonas, Manaus 69067-005, AM, Brazil
- Jean Philippe Boubli
- School of Science, Engineering and the Environment, University of Salford, Salford M5 4WT, UK
- Anthony Di Fiore
- Primate Molecular Ecology and Evolution Laboratory, Department of Anthropology, The University of Texas at Austin, Austin, TX 78712, USA
- Tiputini Biodiversity Station, Universidad San Francisco de Quito, Quito 170901, Ecuador
10
Timilsena PR, Barrett CF, Piñeyro-Nelson A, Wafula EK, Ayyampalayam S, McNeal JR, Yukawa T, Givnish TJ, Graham SW, Pires JC, Davis JI, Ané C, Stevenson DW, Leebens-Mack J, Martínez-Salas E, Álvarez-Buylla ER, dePamphilis CW. Phylotranscriptomic Analyses of Mycoheterotrophic Monocots Show a Continuum of Convergent Evolutionary Changes in Expressed Nuclear Genes From Three Independent Nonphotosynthetic Lineages. Genome Biol Evol 2023; 15:evac183. [PMID: 36582124 PMCID: PMC9887272 DOI: 10.1093/gbe/evac183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Revised: 12/13/2022] [Accepted: 12/18/2022] [Indexed: 12/31/2022]
Abstract
Mycoheterotrophy is an alternative nutritional strategy whereby plants obtain sugars and other nutrients from soil fungi. Mycoheterotrophy and associated loss of photosynthesis have evolved repeatedly in plants, particularly in monocots. Although reductive evolution of plastomes in mycoheterotrophs is well documented, the dynamics of nuclear genome evolution remain largely unknown. Transcriptome datasets were generated from four mycoheterotrophs in three families (Orchidaceae, Burmanniaceae, Triuridaceae) and related green plants and used for phylogenomic analyses to resolve relationships among the mycoheterotrophs, their relatives, and representatives across the monocots. Phylogenetic trees based on 602 genes were mostly congruent with plastome phylogenies, except for an Asparagales + Liliales clade inferred in the nuclear trees. Reduction and loss of chlorophyll synthesis and photosynthetic gene expression, and relaxation of purifying selection on retained genes, were progressive, with greater loss in older nonphotosynthetic lineages. One hundred seventy-four of 1375 plant benchmark universally conserved orthologous genes were undetected in any mycoheterotroph transcriptome or in the genome of the mycoheterotrophic orchid Gastrodia but were expressed in green relatives, providing evidence for massively convergent gene loss in nonphotosynthetic lineages. We designate this set of deleted or undetected genes Missing in Mycoheterotrophs (MIM). MIM genes encode mainly photosynthetic or plastid membrane proteins, but also proteins involved in a diverse set of plastid, mitochondrial, and other cellular processes, as well as proteins of unknown function. Transcription of a photosystem II gene (psb29) in all lineages implies a nonphotosynthetic function for this and other genes retained in mycoheterotrophs. Nonphotosynthetic plants enable novel insights into gene function as well as gene expression shifts, gene loss, and convergence in nuclear genomes.
Affiliation(s)
- Prakash Raj Timilsena
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania
- Craig F Barrett
- Department of Biology, West Virginia University, Morgantown, West Virginia
- Alma Piñeyro-Nelson
- Departamento de Producción Agrícola y Animal, Universidad Autónoma Metropolitana-Xochimilco, Mexico City, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Eric K Wafula
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania
- Joel R McNeal
- Department of Ecology, Evolution, and Organismal Biology, Kennesaw State University, Georgia
- Tomohisa Yukawa
- Tsukuba Botanical Garden, National Museum of Nature and Science, 1-1, Amakubo 4, Tsukuba, 305-0005, Japan
- Thomas J Givnish
- Department of Botany, University of Wisconsin-Madison, Madison, Wisconsin
- Sean W Graham
- Department of Botany, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- J Chris Pires
- Division of Biological Sciences, University of Missouri–Columbia, Columbia, Missouri
- Jerrold I Davis
- School of Integrative Plant Sciences and L.H. Bailey Hortorium, Cornell University, Ithaca, New York, 1485
- Cécile Ané
- Department of Botany, University of Wisconsin-Madison, Madison, Wisconsin
- Department of Statistics, University of Wisconsin–Madison, Madison, Wisconsin
- Jim Leebens-Mack
- Department of Plant Biology, University of Georgia, Athens, Georgia, 3060
- Esteban Martínez-Salas
- Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Elena R Álvarez-Buylla
- Departamento de Ecología Funcional, Instituto de Ecología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Claude W dePamphilis
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania
11
Xu J, Ané C. Identifiability of local and global features of phylogenetic networks from average distances. J Math Biol 2022; 86:12. [PMID: 36481927 DOI: 10.1007/s00285-022-01847-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/20/2021] [Revised: 11/17/2022] [Accepted: 11/22/2022] [Indexed: 12/12/2022]
Abstract
Phylogenetic networks extend phylogenetic trees to model non-vertical inheritance, by which a lineage inherits material from multiple parents. The computational complexity of estimating phylogenetic networks from genome-wide data with likelihood-based methods limits the size of networks that can be handled. Methods based on pairwise distances could offer faster alternatives. We study here the information that average pairwise distances contain on the underlying phylogenetic network, by characterizing local and global features that can or cannot be identified. For general networks, we clarify that the root and edge lengths adjacent to reticulations are not identifiable, and then focus on the class of zipped-up semidirected networks. We provide a criterion to swap subgraphs locally, such as 3-cycles, resulting in indistinguishable networks. We propose the "distance split tree", which can be constructed from pairwise distances, and prove that it is a refinement of the network's tree of blobs, capturing the tree-like features of the network. For level-1 networks, this distance split tree is equal to the tree of blobs refined to separate polytomies from blobs, and we prove that the mixed representation of the network is identifiable. The information loss is localized around 4-cycles, for which the placement of the reticulation is unidentifiable. The mixed representation combines split edges for 4-cycles, regular tree and hybrid edges from the semidirected network, and edge parameters that encode all information identifiable from average pairwise distances.
Affiliation(s)
- Jingcheng Xu
- Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA.
- Cécile Ané
- Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
- Department of Botany, University of Wisconsin - Madison, Madison, WI, 53706, USA
12
Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022; 39:msac215. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022] [Indexed: 01/07/2023] Open
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has fueled much debate on the merits of concatenation versus summary methods and has created practical obstacles to using summary methods more widely, to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low-support branches in the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for quartet-based species tree inference, the optimization criterion used by the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
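To make the idea of down-weighting low-support quartets concrete, here is a minimal Python sketch. It is not the authors' weighting scheme or ASTRAL's code: each gene tree is encoded in a hypothetical input format (its taxon set plus a list of bipartitions with branch-support values), and each tree votes for the topology it induces on one quartet with a weight equal to the support of the split that resolves it. The schemes in the paper also down-weight quartets with long terminal branches, and ASTRAL optimizes over all quartets jointly, so this only shows why a poorly supported gene tree counts for less in a single quartet's tally.

```python
from collections import defaultdict


def induced_topology(taxa, splits, quartet):
    """Return (label, support) for the topology a gene tree induces on `quartet`, or None."""
    if not set(quartet) <= taxa:
        return None  # at least one quartet taxon is missing from this gene tree
    a, b, c, d = quartet
    pairings = (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c)))
    best = None
    for side, support in splits:
        other = taxa - side
        for left, right in pairings:
            # The split side|other resolves the quartet as left|right if it separates the two pairs.
            if (set(left) <= side and set(right) <= other) or (
                set(left) <= other and set(right) <= side
            ):
                label = "|".join(sorted(",".join(sorted(p)) for p in (left, right)))
                # Nested splits can induce the same topology; keep the highest support seen.
                if best is None or support > best[1]:
                    best = (label, support)
    return best  # None also covers an unresolved quartet (e.g. a polytomy)


def weighted_quartet_scores(gene_trees, quartet):
    """Sum support-based weights for each topology of `quartet` over all gene trees."""
    scores = defaultdict(float)
    for taxa, splits in gene_trees:
        hit = induced_topology(taxa, splits, quartet)
        if hit is not None:
            label, support = hit
            scores[label] += support
    return dict(scores)


# Hypothetical gene trees: (taxon set, [(one side of a bipartition, branch support), ...])
trees = [
    ({"A", "B", "C", "D", "E"}, [({"A", "B"}, 0.75), ({"D", "E"}, 0.5)]),
    ({"A", "B", "C", "D", "E"}, [({"A", "C"}, 0.25)]),
    ({"A", "B", "C", "E"}, [({"A", "B"}, 1.0)]),  # D is missing, so no vote
    ({"A", "B", "C", "D"}, [({"A", "B"}, 0.5)]),
]
print(weighted_quartet_scores(trees, ("A", "B", "C", "D")))
# {'A,B|C,D': 1.25, 'A,C|B,D': 0.25}
```

With unweighted counting the poorly supported second tree would count as much as the others; here it contributes only 0.25 to the conflicting topology.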
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA
13
Mahbub S, Sawmya S, Saha A, Reaz R, Rahman MS, Bayzid MS. Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data. J Comput Biol 2022; 29:1156-1172. [PMID: 36048555 DOI: 10.1089/cmb.2022.0212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, for a combination of reasons (ranging from sampling biases to biological causes such as gene birth and loss), gene trees are often incomplete, meaning that not all species of interest share a common set of genes. Incomplete gene trees can potentially affect the accuracy of phylogenomic inference. We introduce, for the first time, the problem of imputing the quartet distribution induced by a set of incomplete gene trees, which involves adding the missing quartets back to the quartet distribution. We present Quartet based Gene tree Imputation using Deep Learning (QT-GILD), an automated and specially tailored unsupervised deep learning technique, informed by cues from natural language processing, which learns the quartet distribution in a given set of incomplete gene trees and generates a complete set of quartets accordingly. QT-GILD is a general-purpose technique that requires no explicit modeling of the subject system or of the reasons for missing data or gene tree heterogeneity. Experimental studies on a collection of simulated and empirical datasets suggest that QT-GILD can effectively impute the quartet distribution, which results in a dramatic improvement in species tree accuracy. Remarkably, QT-GILD not only imputes the missing quartets but can also account for gene tree estimation error. Therefore, QT-GILD advances the state of the art in species tree estimation from gene trees in the face of missing data.
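QT-GILD itself is a deep-learning model and is not reproduced here. The quick calculation below is only an illustration, assuming fully resolved gene trees, of why missing taxa matter for a quartet distribution: a gene tree on m of n species can induce at most C(m, 4) of the C(n, 4) possible quartets, so modest taxon dropout already removes most quartets. The function name `quartet_coverage` is hypothetical.

```python
from math import comb


def quartet_coverage(n_taxa, taxa_per_gene_tree):
    """Fraction of all C(n, 4) quartets that each (possibly incomplete) gene tree can induce."""
    total = comb(n_taxa, 4)
    return [comb(m, 4) / total for m in taxa_per_gene_tree]


# Hypothetical example: 10 species, gene trees sampling 10, 8, and 5 of them
for m, frac in zip([10, 8, 5], quartet_coverage(10, [10, 8, 5])):
    print(f"{m} taxa -> {frac:.3f} of quartets present, {1 - frac:.3f} missing")
```

With 10 species there are 210 quartets; a gene tree missing only two species can induce just 70 of them, so two-thirds of that tree's share of the distribution is absent before any imputation.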
Affiliation(s)
- Sazan Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Shashata Sawmya
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Arpita Saha
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Rezwana Reaz
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- M Sohel Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
14
Gatesy J, Springer MS. Phylogenomic Coalescent Analyses of Avian Retroelements Infer Zero-Length Branches at the Base of Neoaves, Emergent Support for Controversial Clades, and Ancient Introgressive Hybridization in Afroaves. Genes (Basel) 2022; 13:1167. [PMID: 35885951 PMCID: PMC9324441 DOI: 10.3390/genes13071167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Revised: 06/20/2022] [Accepted: 06/21/2022] [Indexed: 01/25/2023] Open
Abstract
Retroelement insertions (RIs) are low-homoplasy characters that are ideal data for addressing deep evolutionary radiations, where gene tree reconstruction errors can severely hinder phylogenetic inference with DNA and protein sequence data. Phylogenomic studies of Neoaves, a large clade of birds (>9000 species) that first diversified near the Cretaceous–Paleogene boundary, have yielded an array of robustly supported, contradictory relationships among deep lineages. Here, we reanalyzed a large RI matrix for birds using recently proposed quartet-based coalescent methods that enable inference of large species trees, including branch lengths in coalescent units, clade support, statistical tests for gene flow, and combined analysis with DNA-sequence-based gene trees. Genome-scale coalescent analyses revealed extremely short branches at the base of Neoaves, meager branch support, and limited congruence with previous work at the most challenging nodes. Despite widespread topological conflicts with DNA-sequence-based trees, combined analyses of RIs with thousands of gene trees show emergent support for multiple higher-level clades (Columbea, Passerea, Columbimorphae, Otidimorphae, Phaethoquornithes). RIs show asymmetrical support for deep relationships within the subclade Afroaves, which hints at ancient gene flow involving the owl lineage (Strigiformes). Because DNA-sequence data are challenged by gene tree reconstruction error, analysis of RIs represents one approach for improving gene tree-based methods when divergences are deep, internodes are short, terminal branches are long, and introgressive hybridization further confounds species tree inference.
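The abstract mentions branch lengths in coalescent units estimated by quartet-based methods. As a hedged illustration of the standard multispecies-coalescent relation such methods rely on (not the authors' exact pipeline), the quartet topology matching the species tree is expected in a fraction p = 1 − (2/3)e^(−t) of gene trees when the internal branch has length t, and the sketch below simply inverts that relation. The function name is hypothetical.

```python
import math


def coalescent_branch_length(p_major):
    """Internal branch length in coalescent units from the dominant quartet frequency.

    Under the multispecies coalescent, the topology matching the species tree is
    expected in a fraction p = 1 - (2/3) * exp(-t) of gene trees, so
    t = -ln(1.5 * (1 - p)).
    """
    if p_major <= 1 / 3:
        return 0.0  # no excess concordance: consistent with a zero-length branch
    if p_major >= 1:
        return math.inf  # every gene tree agrees: the point estimate is unbounded
    return -math.log(1.5 * (1 - p_major))


for p in (1 / 3, 0.4, 0.5, 0.9):
    print(f"p = {p:.3f} -> t = {coalescent_branch_length(p):.3f}")
```

Quartet frequencies close to 1/3 translate into branch lengths near zero, which is the sense in which near-even support among the three topologies at the base of Neoaves points to zero-length branches.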
Affiliation(s)
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S. Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
15
Sanín MJ, Borchsenius F, Paris M, Carvalho-Madrigal S, Gómez Hoyos AC, Cardona A, Arcila Marín N, Ospina Y, Hoyos-Gómez SE, Manrique HF, Bernal R. The Tracking of Moist Habitats Allowed Aiphanes (Arecaceae) to Cover the Elevation Gradient of the Northern Andes. Front Plant Sci 2022; 13:881879. [PMID: 35832227 PMCID: PMC9272002 DOI: 10.3389/fpls.2022.881879] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2022] [Accepted: 05/20/2022] [Indexed: 06/15/2023]
Abstract
The topographic gradients of the Tropical Andes may have triggered species divergence through different mechanisms. Topography separates species' geographical ranges and offers climatic heterogeneity, which could foster local adaptation to specific climatic conditions and result in narrowly distributed endemic species. Such a pattern is found in the Andean-centered palm genus Aiphanes. To test the extent to which geographic barriers and climatic heterogeneity can explain distribution patterns in Aiphanes, we sampled 34 of the 36 currently recognized species in the genus and sequenced them using Sanger sequencing and/or target sequence capture. We generated Bayesian, likelihood, and species-tree phylogenies, which we used to explore climatic trait evolution based on current climatic occupation. We also estimated species distribution models to test the relative roles of geographical and climatic divergence in their evolution. We found that Aiphanes originated in the Miocene in Andean environments, possibly in mid-elevation habitats. Diversification is related to the occupation of the adjacent high- and low-elevation habitats, tracking high annual precipitation and low precipitation seasonality (moist habitats). Species in different clades have repeatedly occupied the full range of temperatures offered by the elevation gradient from 0 to 3,000 m in different geographically isolated areas. A pattern of conserved adaptation to moist environments is consistent among the clades. Our results stress the evolutionary roles of niche truncation of wide thermal tolerance by physical range fragmentation, coupled with water-related niche conservatism, in colonizing the topographic gradient.
Affiliation(s)
- María José Sanín
- Facultad de Ciencias y Biotecnología, Universidad CES, Medellín, Colombia
- School of Mathematical and Natural Sciences, Arizona State University, Tempe, AZ, United States
- Departamento de Procesos y Energía, Universidad Nacional de Colombia, Medellín, Colombia
- Finn Borchsenius
- Faculty of Technical Sciences, Aarhus University, Aarhus, Denmark
- Margot Paris
- Unit of Ecology and Evolution, Department of Biology, University of Fribourg, Fribourg, Switzerland
- Agustín Cardona
- Departamento de Procesos y Energía, Universidad Nacional de Colombia, Medellín, Colombia
- Yerson Ospina
- Facultad de Ciencias y Biotecnología, Universidad CES, Medellín, Colombia
16
Xiong H, Wang D, Shao C, Yang X, Yang J, Ma T, Davis CC, Liu L, Xi Z. Species Tree Estimation and the Impact of Gene Loss Following Whole-Genome Duplication. Syst Biol 2022; 71:1348-1361. [PMID: 35689633 PMCID: PMC9558847 DOI: 10.1093/sysbio/syac040] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/08/2021] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 12/02/2022] Open
Abstract
Whole-genome duplication (WGD) occurs broadly and repeatedly across the history of eukaryotes and is recognized as a prominent evolutionary force, especially in plants. Immediately following WGD, most genes are present in two copies as paralogs. Because of this redundancy, one copy of a paralog pair commonly undergoes pseudogenization and is eventually lost. When speciation occurs shortly after WGD, however, differential loss of paralogs may lead to spurious phylogenetic inference resulting from the inclusion of pseudoorthologs (paralogous genes mistakenly identified as orthologs because they are present in single copies within each sampled species). The impact of including pseudoorthologs rather than true orthologs as a result of gene extinction (or incomplete laboratory sampling) has only recently gained empirical attention in the phylogenomics community, and few studies have investigated this phenomenon in an explicit coalescent framework. Here, using mathematical models, numerous simulated data sets, and two newly assembled empirical data sets, we assess the effect of pseudoorthologs on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and differential gene loss scenarios following WGD. When gene loss occurs along the terminal branches of the species tree, alignment-based (BPP) and gene-tree-based (ASTRAL, MP-EST, and STAR) coalescent methods are adversely affected as the degree of ILS increases, although their accuracy can be greatly improved by sampling a sufficiently large number of genes. Under the same circumstances, however, concatenation methods consistently estimate incorrect species trees as the number of genes increases. Additionally, pseudoorthologs can greatly mislead species tree inference when gene loss occurs along the internal branches of the species tree; in this case, both coalescent and concatenation methods yield inconsistent results. These results underscore the importance of understanding the influence of pseudoorthologs in the phylogenomics era. [Coalescent method; concatenation method; incomplete lineage sorting; pseudoorthologs; single-copy gene; whole-genome duplication.]
Affiliation(s)
- Haifeng Xiong
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Danying Wang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Chen Shao
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Xuchen Yang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Jialin Yang
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Tao Ma
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Charles C Davis
- Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
- Liang Liu
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Zhenxiang Xi
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
17
Guerra G, Nielsen R. Covariance of pairwise differences on a multi-species coalescent tree and implications for FST. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200415. [PMID: 35430886 PMCID: PMC9014196 DOI: 10.1098/rstb.2020.0415] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022] Open
Abstract
The multi-species coalescent (MSC) provides a theoretical foundation for modern phylogenetics and comparative population genetics. Its theoretical properties have been heavily studied, but some aspects of the MSC remain largely unknown, including the covariances in pairwise coalescence times, which are fundamental for understanding the properties of statistics that combine data from multiple species, such as the fixation index (FST). The major contribution of this study is the derivation and implementation of exact expressions for the covariances of pairwise coalescence times under phylogenetic models with piecewise constant changes in population size, assuming no gene flow after species divergence. We use these expressions to derive the variance in average pairwise differences within and between populations. We then derive approximations for the expectation and bias of a sequence-based estimator of FST, a commonly used genetic measure of population differentiation, when it is applied to a non-recombining region of the genome. We show that the estimator of FST is generally biased downward. A freely available software package, STCov, is provided to calculate the means, variances, and covariances in coalescence times presented here under user-defined piecewise-constant species trees. This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.
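The abstract does not spell out which sequence-based FST estimator is studied. Assuming a Hudson-style form, FST = 1 − π_within / π_between, where π_within is the mean number of pairwise differences between sequences from the same population and π_between is the mean between populations, a toy calculation looks like the sketch below. It uses made-up sequences and is not STCov's code.

```python
from itertools import combinations, product


def n_diffs(s1, s2):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(s1, s2))


def mean_within(pop):
    """Average pairwise differences among sequences sampled from one population."""
    pairs = list(combinations(pop, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Average pairwise differences between sequences from two populations."""
    pairs = list(product(pop1, pop2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """Hudson-style estimator: 1 - (mean within-population diffs) / (between-population diffs)."""
    pi_within = (mean_within(pop1) + mean_within(pop2)) / 2
    return 1 - pi_within / mean_between(pop1, pop2)


pop1 = ["AAAA", "AAAT"]  # toy aligned haplotypes, population 1
pop2 = ["TTAA", "TTAT"]  # toy aligned haplotypes, population 2
print(round(hudson_fst(pop1, pop2), 3))  # 0.6
```

Here π_within = 1 and π_between = 2.5, giving FST = 1 − 1/2.5 = 0.6. The paper's covariance results concern the sampling behavior of these two averages when both are computed from a single non-recombining region.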
Affiliation(s)
- Geno Guerra
- Department of Statistics, University of California, Berkeley, CA 94720, USA
- Department of Neurological Surgery, University of California, San Francisco, CA 94158, USA
- Rasmus Nielsen
- Department of Statistics, University of California, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
- Lundbeck Foundations Centre for GeoGenetics, University of Copenhagen, Kobenhavn, Denmark
18
Hutter CR, Cobb KA, Portik DM, Travers SL, Wood PL, Brown RM. FrogCap: A modular sequence capture probe-set for phylogenomics and population genetics for all frogs, assessed across multiple phylogenetic scales. Mol Ecol Resour 2022; 22:1100-1119. [PMID: 34569723 DOI: 10.1111/1755-0998.13517] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/09/2021] [Revised: 09/08/2021] [Accepted: 09/14/2021] [Indexed: 12/01/2022]
Abstract
Despite the prevalence of high-throughput sequencing in phylogenetics, many relationships remain difficult to resolve because of conflicting signal among genomic regions. Selecting different types of molecular markers from different genomic regions is required to overcome these challenges. For evolutionary studies in frogs, we introduce the publicly available FrogCap suite of genomic resources, a large collection of ~15,000 markers that unifies previous genetic sequencing efforts. FrogCap is designed to be modular, such that subsets of markers and SNPs can be selected based on the desired phylogenetic scale. FrogCap uses a variety of marker types, including exons and introns, ultraconserved elements, and previously sequenced Sanger markers, with alignment lengths of up to 10,000 bp; in addition, we demonstrate its potential for SNP-based analyses. We tested FrogCap using 121 samples distributed across five phylogenetic scales, comparing probes designed using a consensus-based or an exemplar genome-based approach. The consensus design is more resilient to issues with sensitivity, specificity, and missing data than picking an exemplar genome sequence. We also tested the impact of different bait kit sizes (20,020 vs. 40,040 baits) on depth of coverage and found triple the depth for the 20,020 bait kit. We evaluated sequence capture success (i.e., missing data, sequenced markers/bases, marker length, and informative sites) across phylogenetic scales. The incorporation of different marker types is effective for both deep phylogenetic relationships and shallow population genetics studies. Having demonstrated FrogCap's utility and modularity, we conclude that these new resources are efficacious for high-throughput sequencing projects across variable timescales.
Affiliation(s)
- Carl R Hutter
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
- Kerry A Cobb
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
- Daniel M Portik
- California Academy of Sciences, San Francisco, California, USA
- Scott L Travers
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
- Department of Biological Sciences, Rutgers University-Newark, Newark, New Jersey, USA
- Perry L Wood
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
- Rafe M Brown
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
19
Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022; 2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]
Abstract
Phylogenomics is the inference of phylogenetic trees based on multiple marker genes sampled from the genomes of interest. An important challenge in phylogenomics is the potential incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms due to the prevalence of horizontal gene transfer. This protocol describes the procedures for building a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. The protocol highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for topological conflicts among individual gene trees. The pipeline described in this protocol scales to tens of thousands of genomes while retaining high accuracy. We discuss multiple software tools, libraries, and scripts to enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
Affiliation(s)
- Qiyun Zhu
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA.
- School of Life Sciences, Arizona State University, Tempe, AZ, USA.
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
20
Welt RS, Raxworthy CJ. Dispersal, not vicariance, explains the biogeographic origin of iguanas on Madagascar. Mol Phylogenet Evol 2021; 167:107345. [PMID: 34748875 DOI: 10.1016/j.ympev.2021.107345] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/01/2020] [Revised: 09/23/2021] [Accepted: 11/02/2021] [Indexed: 11/19/2022]
Abstract
Lizards of the clade Iguanidae (sensu lato) are primarily a New World group. Thus, the remarkable presence of an endemic lineage of iguanas (family Opluridae) on the isolated Indian Ocean island of Madagascar has long been considered a biogeographic anomaly. Previous work attributed this disjunct extant distribution to: (1) vicariance at about 140–165 Ma, caused by the breakup of Gondwana and the separation of South America, Africa, and Madagascar (with subsequent extinction of iguanas in Africa, and potentially on other Gondwanan landmasses), (2) vicariance at about 80–90 Ma, caused by the sundering of hypothesized land-bridge connections between South America, Antarctica, India, and Madagascar, or (3) long-distance overwater dispersal from South America to Madagascar. Each hypothesis has been supported by molecular divergence dating analyses, and thus the biogeographic origin of the Opluridae is not yet well resolved. Here we use genetic sequences of ultraconserved elements for all Iguania families and the majority of Iguanidae (s.l.) genera, together with morphological data for extant and fossil taxa (used for divergence dating analyses), to produce the most comprehensive dataset applied to date to test these origin hypotheses. We find strong support for a sister relationship between the Opluridae (Madagascar) and Leiosauridae (South America). Divergence of the Opluridae from Leiosauridae is dated to between the late Cretaceous and mid-Paleogene, at a time when Madagascar was already an island and was isolated from all other Gondwanan landmasses. Consequently, our results support a hypothesis of long-distance overwater dispersal of the Opluridae lineage, either directly from South America to Madagascar or potentially via Antarctica or Africa, leading to this radiation of iguanas in the Indian Ocean.
Affiliation(s)
- Rachel S Welt
- Department of Herpetology, American Museum of Natural History, USA.
21
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. Annu Rev Ecol Evol Syst 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
Affiliation(s)
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
22
23
Xie DF, Cheng RY, Fu X, Zhang XY, Price M, Lan YL, Wang CB, He XJ. A Combined Morphological and Molecular Evolutionary Analysis of Karst-Environment Adaptation for the Genus Urophysa (Ranunculaceae). Front Plant Sci 2021; 12:667988. [PMID: 34177982 PMCID: PMC8223000 DOI: 10.3389/fpls.2021.667988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Accepted: 05/12/2021] [Indexed: 06/13/2023]
Abstract
The karst environment is characterized by low soil water content, periodic water deficiency, and poor nutrient availability, which makes it an ideal natural laboratory for studying the adaptive evolution of its inhabitants. However, how species adapt to such a special karst environment remains poorly understood. Here, transcriptome sequences were collected from two Urophysa species (Urophysa rockii and Urophysa henryi), which are Chinese endemics with a karst-specific distribution, and from allied species in Semiaquilegia and Aquilegia (living in non-karst habitats). Single-copy genes (SCGs) were extracted to perform phylogenetic analyses using concatenation and coalescent methods. Positively selected genes (PSGs) and clusters of paralogous genes (Mul_genes) were detected and subsequently used for gene function annotation. We obtained 2,271 SCGs after filtering, and the coalescent analysis revealed that 1,930 SCGs shared the same tree topology, which was consistent with the topology of the concatenated tree. A total of 335 PSGs and 243 Mul_genes were detected, and many were enriched in stress and stimulus resistance, transmembrane transport, cellular ion homeostasis, calcium ion transport, calcium signaling regulation, and water retention. Both molecular and morphological evidence indicated that Urophysa species evolved complex strategies for adapting to hostile karst environments. Our findings will contribute to a new understanding of the genetic and phenotypic mechanisms of karst adaptation in plants.
Affiliation(s)
- Deng-Feng Xie
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Rui-Yu Cheng
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Xiao Fu
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Xiang-Yi Zhang
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Megan Price
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Yan-Ling Lan
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Xing-Jin He
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
24
Mahbub M, Wahab Z, Reaz R, Rahman MS, Bayzid MS. wQFM: Highly Accurate Genome-scale Species Tree Estimation from Weighted Quartets. Bioinformatics 2021; 37:3734-3743. [PMID: 34086858 DOI: 10.1093/bioinformatics/btab428] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 06/03/2021] [Indexed: 02/01/2023] [Open access]
Abstract
MOTIVATION Species tree estimation from genes sampled from throughout the whole genome is complicated by gene tree-species tree discordance. Incomplete lineage sorting (ILS) is one of the most frequent causes of this discordance, where alleles can coexist in populations for periods that may span several speciation events. Quartet-based summary methods for estimating species trees from a collection of gene trees are becoming popular due to their high accuracy and statistical guarantees under ILS. Generating quartets with appropriate weights, where weights correspond to the relative importance of quartets, and subsequently amalgamating the weighted quartets to infer a single coherent species tree can provide a statistically consistent way of estimating species trees. However, handling weighted quartets is challenging. RESULTS We propose wQFM, a highly accurate method for species tree estimation from multi-locus data, by extending the quartet FM (QFM) algorithm to a weighted setting. wQFM was assessed on a collection of simulated and real biological datasets, including the avian phylogenomic dataset, which is one of the largest phylogenomic datasets to date. We compared wQFM with wQMC, which is the best alternative method for weighted quartet amalgamation, and with ASTRAL, which is one of the most accurate and widely used coalescent-based species tree estimation methods. Our results suggest that wQFM matches or improves upon the accuracy of wQMC and ASTRAL. AVAILABILITY wQFM is available in open source form at https://github.com/Mahim1997/wQFM-2020. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
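To make the idea of weighted quartets concrete, here is a minimal Python sketch. It assumes each tree (gene or species) is encoded as a list of its nontrivial bipartitions, each written as one side of the split; this input format is hypothetical and chosen for brevity. Weights are taken to be the number of gene trees displaying each quartet topology, which is one simple weighting assumed here, and a candidate species tree is scored by the total weight of the quartets it agrees with. The sketch only evaluates this objective by brute force over all four-taxon subsets; it is not wQFM's amalgamation procedure, which the abstract describes as an extension of the quartet FM (QFM) algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input format: a tree is a list of its nontrivial bipartitions,
# each written as ONE side of the split (a set of taxon names).

def quartet_topology(splits, quartet):
    """Return the resolution of `quartet` displayed by the tree, or None if unresolved."""
    q = frozenset(quartet)
    for side in splits:
        shared = q & frozenset(side)
        if len(shared) == 2:  # this split separates the quartet 2|2
            return frozenset({shared, q - shared})
    return None  # no split resolves the quartet (e.g. a polytomy)

def weighted_quartets(gene_trees, taxa):
    """Weight each quartet topology by the number of gene trees that display it."""
    weights = Counter()
    for splits in gene_trees:
        for quartet in combinations(sorted(taxa), 4):
            topo = quartet_topology(splits, quartet)
            if topo is not None:
                weights[topo] += 1
    return weights

def quartet_score(species_splits, weights):
    """Total weight of the quartets that a candidate species tree agrees with."""
    score = 0
    for topo, w in weights.items():
        quartet = frozenset().union(*topo)  # the four taxa of this quartet
        if quartet_topology(species_splits, quartet) == topo:
            score += w
    return score

# Gene trees ((A,B),C,(D,E)) twice and ((A,C),B,(D,E)) once, as split lists.
g1 = [{"A", "B"}, {"A", "B", "C"}]
g2 = [{"A", "C"}, {"A", "B", "C"}]
weights = weighted_quartets([g1, g1, g2], "ABCDE")
print(quartet_score(g1, weights), quartet_score(g2, weights))  # 13 11
```

Of the 15 weighted quartets in this toy input, the topology ((A,B),C,(D,E)) agrees with 13 and ((A,C),B,(D,E)) with 11, so maximizing the weighted quartet score prefers the first. Enumerating all quartets is only practical for small taxon sets, which is why dedicated amalgamation heuristics exist.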
Affiliation(s)
- Mahim Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Zahin Wahab
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Rezwana Reaz
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- M Saifur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
25
Markin A, Eulenstein O. Quartet-Based Inference is Statistically Consistent Under the Unified Duplication-Loss-Coalescence Model. Bioinformatics 2021; 37:4064-4074. [PMID: 34048529 PMCID: PMC9113308 DOI: 10.1093/bioinformatics/btab414] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/09/2020] [Revised: 05/19/2021] [Accepted: 05/27/2021] [Indexed: 12/19/2022] [Open access]
Abstract
Motivation The classic multispecies coalescent (MSC) model provides the means for theoretical justification of incomplete lineage sorting-aware species tree inference methods. This has motivated an extensive body of work on phylogenetic methods that are statistically consistent under MSC. One particularly popular such method is ASTRAL, a quartet-based species tree inference method. Recent studies suggest that ASTRAL also performs well when given multi-locus gene trees in simulation studies. Further, Legried et al. recently demonstrated that ASTRAL is statistically consistent under the gene duplication and loss model (GDL). GDL is prevalent in evolutionary histories and is the first core process in the powerful duplication-loss-coalescence evolutionary model (DLCoal) by Rasmussen and Kellis. Results In this work, we prove that ASTRAL is statistically consistent under the general DLCoal model. Therefore, our result supports the empirical evidence from the simulation-based studies. More broadly, we prove that the quartet-based inference approach is statistically consistent under DLCoal. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alexey Markin
- Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, 50010, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
26
Alanzi AAR, Degnan JH. Statistical inconsistency of the unrooted minimize deep coalescence criterion. PLoS One 2021; 16:e0251107. [PMID: 33970931 PMCID: PMC8109837 DOI: 10.1371/journal.pone.0251107] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/20/2020] [Accepted: 04/20/2021] [Indexed: 11/24/2022] [Open access]
Abstract
Species trees, which describe the evolutionary relationships between species, are often inferred from gene trees, which describe the ancestral relationships between sequences sampled at different loci from the species of interest. A common approach to inferring species trees from gene trees is motivated by supposing that gene tree variation is due to incomplete lineage sorting, also known as deep coalescence. One of the earliest methods motivated by deep coalescence is to find the species tree that minimizes the number of deep coalescent events needed to explain discrepancies between the species tree and input gene trees. This minimize deep coalescence (MDC) criterion can be applied in both rooted and unrooted settings, where either rooted or unrooted gene trees can be used to infer a rooted species tree. Previous work has shown that MDC is statistically inconsistent in the rooted setting, meaning that under the multispecies coalescent, a probabilistic model of deep coalescence, there are species trees for which increasing the number of input gene trees does not make the method more likely to return the correct species tree. Here, we obtain analogous results in the unrooted setting, showing conditions under which the MDC criterion is inconsistent under the multispecies coalescent model with unrooted gene trees for four and five taxa.
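The deep-coalescence cost that MDC minimizes can be sketched for the rooted setting, assuming one sequence per species and rooted trees written as nested Python tuples of species names (an input format chosen for illustration). For each species-tree node, the sketch counts the maximal gene-tree clades whose leaf sets fit inside that node's clade; that count minus one is the number of extra lineages on the branch, and the MDC criterion prefers the species tree with the smallest total over all gene trees. This follows the cluster-counting formulation commonly used for rooted MDC; the unrooted setting analysed in the abstract is not covered.

```python
def clusters(tree):
    """Leaf sets of all nodes of a rooted tree written as nested tuples of names."""
    found = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found

def extra_lineages(gene_tree, species_tree):
    """Extra lineages of one rooted gene tree (one sequence per species) in a species tree."""
    gene_clusters = clusters(gene_tree)
    total = 0
    for clade in clusters(species_tree):
        inside = [g for g in gene_clusters if g <= clade]
        # lineages leaving this species branch = maximal gene clades nested inside it
        maximal = [g for g in inside if not any(g < h for h in inside)]
        total += len(maximal) - 1
    return total

def mdc_score(gene_trees, species_tree):
    """MDC objective: total extra lineages over all gene trees (lower is better)."""
    return sum(extra_lineages(g, species_tree) for g in gene_trees)

species = ((("A", "B"), "C"), "D")
print(extra_lineages(((("A", "C"), "B"), "D"), species))  # 1
print(extra_lineages((("A", "D"), ("B", "C")), species))  # 2
```

A gene tree identical to the species tree costs zero, so with enough concordant gene trees the true tree wins; the inconsistency results concern regions of parameter space where discordant gene trees are common enough that another species tree achieves a lower expected total.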
Affiliation(s)
- Ayed A. R. Alanzi
- Mathematics Department, College of Science and Human Studies of Hotat Sudair, Majmaah University, Majmaah, Saudi Arabia
- James H. Degnan
- Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM, United States of America
27
Farah IT, Islam MM, Zinat KT, Rahman AH, Bayzid MS. Species tree estimation from gene trees by minimizing deep coalescence and maximizing quartet consistency: a comparative study and the presence of pseudo species tree terraces. Syst Biol 2021; 70:1213-1231. [PMID: 33844023 DOI: 10.1093/sysbio/syab026] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/31/2020] [Revised: 03/25/2021] [Accepted: 03/29/2021] [Indexed: 11/14/2022] [Open access]
Abstract
Species tree estimation from multi-locus datasets is extremely challenging, especially in the presence of gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Summary methods have been developed that first estimate gene trees and then combine them into a species tree by optimizing various optimality criteria. In this study, we have extended and adapted the concept of phylogenetic terraces to species tree estimation by "summarizing" a set of gene trees, where multiple species trees with distinct topologies may have exactly the same optimality score (e.g., quartet score or extra lineage score). We particularly investigated the presence and impact of equally optimal trees in species tree estimation from multi-locus data using summary methods that take ILS into account. We analyzed two of the most popular ILS-aware optimization criteria: maximize quartet consistency (MQC) and minimize deep coalescence (MDC). Methods based on MQC are provably statistically consistent, whereas MDC is not a consistent criterion for species tree estimation. We present a comprehensive comparative study of these two optimality criteria. Our experiments, on a collection of datasets simulated under ILS, indicate that MDC may yield quartet consistency scores competitive with or identical to those of MQC, yet produce significantly less accurate trees than MQC, demonstrating the presence and impact of equally optimal species trees. This is the first known study that provides the conditions under which datasets have equally optimal trees in the context of phylogenomic inference using summary methods.
Affiliation(s)
- Ishrat Tanzila Farah
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Md Muktadirul Islam
- Applied Statistics and Data Science (ASDS), Department of Statistics, Jahangirnagar University, Dhaka-1342, Bangladesh
- Kazi Tasnim Zinat
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh; Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Atif Hasan Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
28
Kim A, Degnan JH. Heuristics for unrooted, unranked, and ranked anomaly zones under birth-death models. Mol Phylogenet Evol 2021; 161:107162. [PMID: 33831548 DOI: 10.1016/j.ympev.2021.107162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2019] [Revised: 10/21/2020] [Accepted: 03/23/2021] [Indexed: 10/21/2022]
Abstract
Species trees that can generate a nonmatching gene tree topology that is more probable than the topology matching the species tree are said to be in an anomaly zone. We introduce some heuristic approaches to infer whether species trees are in anomaly zones when it is difficult or impossible to compute the entire distribution of gene tree topologies. Here, probabilities of unrooted, unranked, and ranked gene tree topologies under the multispecies coalescent are used. A ranked tree can be viewed as an unranked tree with a temporal ordering of its internal nodes. Overall, considering probabilities of unrooted or unranked gene tree topologies within one nearest-neighbor interchange of the species tree topology is a reasonable heuristic for inferring the existence of anomalous unrooted or unranked gene trees, respectively. We investigated a test proposed by Linkem et al. (2016), which classifies a species tree as being in an unranked anomaly zone if there is a subset of four taxa in an unranked anomaly zone. We find this test to have high true positive rates, but it can also have high false positive rates. For ranked trees, because at least one of the most probable ranked gene tree topologies must have the same unranked topology as the species tree, we propose to use only those ranked gene trees whose topologies match the unranked species tree topology. We find that the probability that the species tree is in unrooted and unranked anomaly zones tends to increase with the speciation rate, and the probability of all three types of anomaly zones increases rapidly with the number of taxa. We find that probabilities that species trees are in an anomaly zone can be quite high for moderately high speciation rates.
Affiliation(s)
- Anastasiia Kim
- Department of Mathematics and Statistics, University of New Mexico, United States
- James H Degnan
- Department of Mathematics and Statistics, University of New Mexico, United States
29
Terraneo TI, Benzoni F, Arrigoni R, Baird AH, Mariappan KG, Forsman ZH, Wooster MK, Bouwmeester J, Marshell A, Berumen ML. Phylogenomics of Porites from the Arabian Peninsula. Mol Phylogenet Evol 2021; 161:107173. [PMID: 33813021 DOI: 10.1016/j.ympev.2021.107173] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/05/2020] [Revised: 03/25/2021] [Accepted: 03/29/2021] [Indexed: 11/16/2022]
Abstract
The advent of high-throughput sequencing technologies provides an opportunity to resolve phylogenetic relationships among closely related species. By incorporating hundreds to thousands of unlinked loci and single nucleotide polymorphisms (SNPs), phylogenomic analyses have a far greater potential to resolve species boundaries than approaches that rely on only a few markers. Scleractinian taxa have proved challenging to identify using traditional morphological approaches, and many groups lack an adequate set of molecular markers to investigate their phylogenies. Here, we examine the potential of Restriction-site Associated DNA sequencing (RADseq) to investigate phylogenetic relationships and species limits within the scleractinian coral genus Porites. A total of 126 colonies were collected from 16 localities in the seas surrounding the Arabian Peninsula and ascribed to 12 nominal and two unknown species based on their morphology. Reference mapping was used to retrieve and compare nearly complete mitochondrial genomes, ribosomal DNA, and histone loci. De novo assembly and reference mapping to the P. lobata coral transcriptome were compared and used to obtain thousands of genome-wide loci and SNPs. A suite of species discovery methods (phylogenetic, ordination, and clustering analyses) and species delimitation approaches (coalescent-based, species tree, and Bayes factor delimitation) suggested the presence of eight molecular lineages, one of which included six morphospecies. Our phylogenomic approach provided a fully supported phylogeny of Porites from the Arabian Peninsula, suggesting the power of RADseq data to solve the species delineation problem in this speciose coral genus.
Affiliation(s)
- Tullia I Terraneo
- Red Sea Research Centre, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville 4811, QLD, Australia.
- Francesca Benzoni
- Red Sea Research Centre, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Roberto Arrigoni
- Red Sea Research Centre, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; European Commission, Joint Research Centre (JRC), Ispra, Italy; Department of Biology and Evolution of Marine Organisms (BEOM), Stazione Zoologica Anton Dohrn Napoli, Villa Comunale, 80121 Napoli, Italy
- Andrew H Baird
- ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville 4811, QLD, Australia
- Kiruthiga G Mariappan
- Red Sea Research Centre, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Zac H Forsman
- Hawaii Institute of Marine Biology, Kaneohe 96744, HI, USA
- Michael K Wooster
- Red Sea Research Centre, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Alyssa Marshell
- Department of Marine Science and Fisheries, College of Agricultural and Marine Sciences, Sultan Qaboos University, Muscat, Oman
- Michael L Berumen
- Red Sea Research Centre, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
30
Ayoola AO, Zhang BL, Meisel RP, Nneji LM, Shao Y, Morenikeji OB, Adeola AC, Ng’ang’a SI, Ogunjemite BG, Okeyoyin AO, Roos C, Wu DD. Population Genomics Reveals Incipient Speciation, Introgression, and Adaptation in the African Mona Monkey (Cercopithecus mona). Mol Biol Evol 2021; 38:876-890. [PMID: 32986826 PMCID: PMC7947840 DOI: 10.1093/molbev/msaa248] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/24/2022] [Open access]
Abstract
Guenons (tribe Cercopithecini) are the most widely distributed nonhuman primates in the tropical forest belt of Africa and show considerable phenotypic, taxonomic, and ecological diversity. However, genomic information for most species within this group is still lacking. Here, we present a high-quality de novo genome (total 2.90 Gb, contig N50 equal to 22.7 Mb) of the mona monkey (Cercopithecus mona), together with genome resequencing data of 13 individuals sampled across Nigeria. Our results showed differentiation between populations east and west of the Niger River ∼84 ka and potential ancient introgression into the East population from other mona group species. The PTPRK, FRAS1, BNC2, and EDN3 genes related to pigmentation displayed signals of introgression in the East population. Genomic scans suggest that immunity genes such as AKT3 and IL13 (possibly involved in simian immunodeficiency virus defense), and G6PD, a gene involved in malaria resistance, are under positive natural selection. Our study gives insights into differentiation, natural selection, and introgression in guenons.
Affiliation(s)
- Adeola Oluwakemi Ayoola
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of the Chinese Academy of Sciences, Kunming, Yunnan, China
- Bao-Lin Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Richard P Meisel
- Department of Biology and Biochemistry, University of Houston, Houston, TX
- Lotanna M Nneji
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Sino-Africa Joint Research Center, Chinese Academy of Sciences, Kunming, Yunnan, China
- Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Olanrewaju B Morenikeji
- Department of Biomedical Sciences, Rochester Institute of Technology, Rochester, NY
- Department of Biology, Hamilton College, Clinton, NY
- Adeniyi C Adeola
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Sino-Africa Joint Research Center, Chinese Academy of Sciences, Kunming, Yunnan, China
- Said I Ng’ang’a
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of the Chinese Academy of Sciences, Kunming, Yunnan, China
- Babafemi G Ogunjemite
- Department of Ecotourism and Wildlife Management, Federal University of Technology, Akure, Nigeria
- Agboola O Okeyoyin
- National Park Service Headquarters, Federal Capital Territory, Abuja, Nigeria
- Christian Roos
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Dong-Dong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
31
Kim A, Degnan JH. PRANC: ML species tree estimation from the ranked gene trees under coalescence. Bioinformatics 2021; 36:4819-4821. [PMID: 32609371 DOI: 10.1093/bioinformatics/btaa605] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/17/2020] [Revised: 06/16/2020] [Accepted: 06/23/2020] [Indexed: 11/12/2022] [Open access]
Abstract
SUMMARY PRANC computes the Probabilities of RANked gene tree topologies under the multispecies coalescent. A ranked gene tree is a gene tree accounting for the temporal ordering of internal nodes. PRANC can also estimate the maximum likelihood (ML) species tree from a sample of ranked or unranked gene tree topologies. It estimates the ML tree with estimated branch lengths in coalescent units. AVAILABILITY AND IMPLEMENTATION PRANC is written in C++ and freely available at github.com/anastasiiakim/PRANC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
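Because a ranked tree is an unranked rooted topology plus a temporal ordering of its internal nodes, each unranked topology corresponds to a fixed number of ranked topologies. The short Python sketch below counts them with the standard formula (n − 1)! / ∏ (n_v − 1), where the product runs over internal nodes v and n_v is the number of leaves below v. Trees are written as nested tuples, an input format assumed here for illustration; this is not code from PRANC, which is written in C++.

```python
from math import factorial

def count_rankings(tree):
    """Number of ranked topologies (orderings of internal nodes) of a rooted binary tree."""
    denominator = 1

    def n_leaves(node):
        nonlocal denominator
        if isinstance(node, str):
            return 1
        n = sum(n_leaves(child) for child in node)
        denominator *= n - 1  # internal nodes in the subtree rooted at this node
        return n

    n = n_leaves(tree)
    return factorial(n - 1) // denominator

print(count_rankings(((("A", "B"), "C"), "D")))  # caterpillar: 1
print(count_rankings((("A", "B"), ("C", "D"))))  # balanced: 2
```

The two calls show why ranked and unranked gene tree distributions differ: a caterpillar topology admits a single ranking, whereas a balanced four-taxon topology splits its probability across two ranked topologies.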
Affiliation(s)
- Anastasiia Kim
- Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87106, USA
- James H Degnan
- Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87106, USA
32
Allman ES, Mitchell JD, Rhodes JA. Gene tree discord, simplex plots, and statistical tests under the coalescent. Syst Biol 2021; 71:929-942. [PMID: 33560348 DOI: 10.1093/sysbio/syab008] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/12/2020] [Revised: 01/31/2021] [Accepted: 02/03/2021] [Indexed: 02/06/2023] [Open access]
Abstract
A simple graphical device, the simplex plot of quartet concordance factors, is introduced to aid in the exploration of a collection of gene trees on a common set of taxa. A single plot summarizes all gene tree discord, and allows for visual comparison to the expected discord from the multispecies coalescent model (MSC) of incomplete lineage sorting on a species tree. A formal statistical procedure is described that can quantify the deviation from expectation for each subset of four taxa, suggesting when the data is not in accord with the MSC, and thus that either gene tree inference error is substantial or a more complex model such as that on a network may be required. If the collection of gene trees is in accord with the MSC, the plots reveal when substantial incomplete lineage sorting is present. Applications to both simulated and empirical multilocus data sets illustrate the insights provided.
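The quantities behind such a simplex plot can be sketched as follows. For a fixed set of four taxa, each gene tree resolves them as ab|cd, ac|bd, or ad|bc; the quartet concordance factors (CFs) are the proportions of gene trees showing each resolution, and because they sum to one they map to a point in a triangle. Under the multispecies coalescent, a species quartet ab|cd whose internal branch is t coalescent units long is expected to give a CF of 1 − (2/3)e^(−t) for the matching resolution and (1/3)e^(−t) for each of the other two. In this Python sketch the resolution labels and the triangle's vertex placement are arbitrary choices, and the formal test described in the abstract is not implemented.

```python
import math
from collections import Counter

RESOLUTIONS = ("ab|cd", "ac|bd", "ad|bc")  # the three unrooted topologies of taxa a, b, c, d

def concordance_factors(gene_resolutions):
    """Proportion of gene trees showing each of the three quartet resolutions."""
    counts = Counter(gene_resolutions)
    total = sum(counts[r] for r in RESOLUTIONS)
    return tuple(counts[r] / total for r in RESOLUTIONS)

def msc_expected_cfs(t):
    """Expected CFs under the MSC for species quartet ab|cd with internal branch t (coalescent units)."""
    minor = math.exp(-t) / 3
    return (1 - 2 * minor, minor, minor)

def simplex_point(cfs):
    """Map a CF triple (summing to 1) to 2D coordinates inside an equilateral triangle."""
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]  # arbitrary vertex placement
    x = sum(w * cx for w, (cx, _) in zip(cfs, corners))
    y = sum(w * cy for w, (_, cy) in zip(cfs, corners))
    return x, y

observed = concordance_factors(["ab|cd"] * 6 + ["ac|bd"] * 2 + ["ad|bc"] * 2)
print(observed)  # (0.6, 0.2, 0.2)
print(msc_expected_cfs(1.0), simplex_point(observed))
```

Under the MSC on a species tree, expected points lie on the line where the two minor CFs are equal, running from the triangle's centre (t = 0, complete lineage sorting failure) toward the vertex of the matching resolution as t grows; points far off that line are what suggest gene tree error or a network model.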
Affiliation(s)
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA
- Jonathan D Mitchell
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA; Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA
33
Collapsing dubiously resolved gene-tree branches in phylogenomic coalescent analyses. Mol Phylogenet Evol 2021; 158:107092. [PMID: 33545272 DOI: 10.1016/j.ympev.2021.107092] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/10/2020] [Revised: 12/30/2020] [Accepted: 01/28/2021] [Indexed: 01/15/2023]
Abstract
In two-step coalescent analyses of phylogenomic data, gene-tree topologies are treated as fixed prior to species-tree inference. Although all gene-tree conflict is assumed to be caused by lineage sorting when applying these methods, in empirical datasets much of the conflict can be caused by estimation error. Weakly supported and even arbitrarily resolved clades are important sources of this estimation error for gene trees inferred from few informative characters relative to the number of sampled terminals, and the resulting extraneous conflict among gene trees can negatively impact species-tree inference. In this study, we quantified the relative severity of alternative methods for collapsing gene-tree branches for seven empirical datasets and quantified their effects on species-tree inference. The branch-collapsing methods that we employed were based on the strict consensus of optimal topologies, various bootstrap thresholds, and 0% SH-like approximate likelihood ratio test (SH-like aLRT) support. Up to 86% of internal gene-tree branches were dubiously or arbitrarily resolved in reanalyses of these published phylogenomic datasets, and collapsing these branches increased inferred species-tree coalescent branch lengths by up to 455%. For two datasets, the longer inferred branch lengths sometimes affected inference of anomaly-zone conditions. Although branch-collapsing methods did not consistently affect the species-tree topology, they often increased branch support. The more severe and clearly justified gene-tree branch-collapsing methods, which we recommend be broadly applied in two-step coalescent analyses, are use of the strict consensus in parsimony analyses and collapsing clades with 0% SH-like aLRT support in likelihood analyses. Collapsing dubiously or arbitrarily resolved branches in gene trees sometimes improved congruence between coalescent-based results and concatenation trees. In such cases, we contend that the resolution provided by concatenation should be preferred and that incomplete lineage sorting is a poor explanation for the initial conflict between phylogenetic approaches.
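The threshold-based collapsing rules described above can be sketched in a few lines of Python: every internal branch whose support is at or below a threshold is removed, and its child lineages are attached to the parent node as a polytomy. The minimal `Node` class and the convention that `support` is stored on the node below each branch are assumptions for illustration; in practice trees would be read from Newick files with a phylogenetics library. A threshold of 0 collapses only branches with 0% support (the SH-like aLRT rule), while larger values mimic bootstrap cutoffs, whose exact inequality convention varies between studies; the parsimony strict-consensus rule is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: Optional[str] = None       # leaf label; None for internal nodes
    support: Optional[float] = None  # support of the branch leading to this node
    children: List["Node"] = field(default_factory=list)

def collapse_weak(node: Node, threshold: float) -> Node:
    """Return a copy of the tree with internal branches of support <= threshold collapsed."""
    kept = []
    for child in node.children:
        child = collapse_weak(child, threshold)  # clean the subtree first
        weak = bool(child.children) and child.support is not None and child.support <= threshold
        if weak:
            kept.extend(child.children)  # splice grandchildren into this node (polytomy)
        else:
            kept.append(child)
    return Node(node.name, node.support, kept)

def newick(node: Node) -> str:
    """Topology-only Newick string (without the trailing semicolon), for inspection."""
    if not node.children:
        return node.name or ""
    return "(" + ",".join(newick(c) for c in node.children) + ")"

tree = Node(children=[
    Node(support=0.0, children=[Node("A"), Node("B")]),
    Node(support=95.0, children=[Node("C"), Node("D")]),
    Node("E"),
])
print(newick(collapse_weak(tree, 0.0)) + ";")  # (A,B,(C,D),E);
```

Collapsing turns an arbitrary resolution into an honest polytomy, so a downstream summary method no longer counts that clade as evidence of conflict; the function returns a new tree and leaves the input unchanged.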
34
New Approaches for Inferring Phylogenies in the Presence of Paralogs. Trends Genet 2021; 37:174-187. [DOI: 10.1016/j.tig.2020.08.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/08/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]
35
36
Zhu T, Yang Z. Complexity of the simplest species tree problem. Mol Biol Evol 2021; 38:3993-4009. [PMID: 33492385 PMCID: PMC8382899 DOI: 10.1093/molbev/msab009] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/18/2020] [Revised: 01/04/2021] [Accepted: 01/13/2021] [Indexed: 02/06/2023] [Open access]
Abstract
The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods, namely concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.
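Written out, the approximation described above says that the probit of the error probability falls linearly in √L, where L is the number of loci. In the display below, α and β (with β > 0) are placeholder symbols introduced here for the method- and parameter-dependent constants, and Φ is the standard normal cumulative distribution function:

$$
\Phi^{-1}\!\left(P_{\mathrm{err}}(L)\right) \approx \alpha - \beta\sqrt{L}
\quad\Longleftrightarrow\quad
P_{\mathrm{err}}(L) \approx \Phi\!\left(\alpha - \beta\sqrt{L}\right),
\qquad
L_{\varepsilon} \approx \left(\frac{\alpha - \Phi^{-1}(\varepsilon)}{\beta}\right)^{2}.
$$

The last expression solves for the number of loci needed to reach a target error ε (valid when α > Φ^{-1}(ε)), so a method with a larger β, that is, a more efficient one, needs fewer loci for the same accuracy.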
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Ziheng Yang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
37
Gwee CY, Garg KM, Chattopadhyay B, Sadanandan KR, Prawiradilaga DM, Irestedt M, Lei F, Bloch LM, Lee JGH, Irham M, Haryoko T, Soh MCK, Peh KSH, Rowe KMC, Ferasyi TR, Wu S, Wogan GOU, Bowie RCK, Rheindt FE. Phylogenomics of white-eyes, a 'great speciator', reveals Indonesian archipelago as the center of lineage diversity. eLife 2020; 9:e62765. [PMID: 33350381 PMCID: PMC7775107 DOI: 10.7554/elife.62765] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/03/2020] [Accepted: 12/21/2020] [Indexed: 01/09/2023] [Open access]
Abstract
Archipelagoes serve as important 'natural laboratories' which facilitate the study of island radiations and contribute to the understanding of evolutionary processes. The white-eye genus Zosterops is a classical example of a 'great speciator', comprising c. 100 species from across the Old World, most of them insular. We achieved an extensive geographic DNA sampling of Zosterops by using historical specimens and recently collected samples. Using over 700 genome-wide loci in conjunction with coalescent species tree methods and gene flow detection approaches, we untangled the reticulated evolutionary history of Zosterops, which comprises three main clades centered in Indo-Africa, Asia, and Australasia, respectively. Genetic introgression between species permeates the Zosterops phylogeny, regardless of how distantly related species are. Crucially, we identified the Indonesian archipelago, and specifically Borneo, as the major center of diversity and the only area where all three main clades overlap, attesting to the evolutionary importance of this region.
Affiliation(s)
- Chyi Yin Gwee
- National University of Singapore, Department of Biological Sciences, Singapore, Singapore
- Kritika M Garg
- National University of Singapore, Department of Biological Sciences, Singapore, Singapore
- Balaji Chattopadhyay
- National University of Singapore, Department of Biological Sciences, Singapore, Singapore
- Keren R Sadanandan
- National University of Singapore, Department of Biological Sciences, Singapore, Singapore
- Max Planck Institute for Ornithology, Seewiesen, Germany
- Dewi M Prawiradilaga
- Division of Zoology, Research Center for Biology, Indonesian Institute of Sciences (LIPI), Cibinong Science Center, Cibinong, Indonesia
- Martin Irestedt
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Fumin Lei
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Luke M Bloch
- Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
- Mohammad Irham
- Division of Zoology, Research Center for Biology, Indonesian Institute of Sciences (LIPI), Cibinong Science Center, Cibinong, Indonesia
- Tri Haryoko
- Division of Zoology, Research Center for Biology, Indonesian Institute of Sciences (LIPI), Cibinong Science Center, Cibinong, Indonesia
- Malcolm CK Soh
- University of Western Australia, School of Biological Sciences, Perth, Australia
- Kelvin S-H Peh
- University of Southampton, School of Biological Sciences, Southampton, United Kingdom
- Karen MC Rowe
- Sciences Department, Museums Victoria, Melbourne, Australia
- Teuku Reza Ferasyi
- Faculty of Veterinary Medicine, Universitas Syiah Kuala, Darussalam, Indonesia
- Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, China
- Shaoyuan Wu
- Department of Biochemistry and Molecular Biology, 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Center for Tropical Veterinary Studies – One Health Collaboration Center, Universitas Syiah Kuala, Darussalam, Indonesia
- Guinevere OU Wogan
- Museum of Vertebrate Zoology and Department of Environmental Science, Policy, and Management, University of California, Berkeley, Berkeley, United States
- Rauri CK Bowie
- Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
- Frank E Rheindt
- National University of Singapore, Department of Biological Sciences, Singapore, Singapore
38
Murphy WJ, Foley NM, Bredemeyer KR, Gatesy J, Springer MS. Phylogenomics and the Genetic Architecture of the Placental Mammal Radiation. Annu Rev Anim Biosci 2020; 9:29-53. [PMID: 33228377 DOI: 10.1146/annurev-animal-061220-023149] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 11/09/2022]
Abstract
The genomes of placental mammals are being sequenced at an unprecedented rate. Alignments of hundreds, and one day thousands, of genomes spanning the rich living and extinct diversity of species offer unparalleled power to resolve phylogenetic controversies, identify genomic innovations of adaptation, and dissect the genetic architecture of reproductive isolation. We highlight outstanding questions about the earliest phases of placental mammal diversification and the promise of newer methods, as well as remaining challenges, toward using whole genome data to resolve placental mammal phylogeny. The next phase of mammalian comparative genomics will see the completion and application of finished-quality, gapless genome assemblies from many ordinal lineages and closely related species. Interspecific comparisons between the most hypervariable genomic loci will likely reveal large, but heretofore mostly underappreciated, effects on population divergence, morphological innovation, and the origin of new species.
Affiliation(s)
- William J Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Nicole M Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Kevin R Bredemeyer
- Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S Springer
- Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, California 92521, USA
39
Rhodes JA. Topological Metrizations of Trees, and New Quartet Methods of Tree Inference. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:2107-2118. [PMID: 31095496 PMCID: PMC7650847 DOI: 10.1109/tcbb.2019.2917204] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 05/28/2023]
Abstract
Topological phylogenetic trees can be assigned edge weights in several natural ways, highlighting different aspects of the tree. Here, the rooted triple and quartet metrizations are introduced, and applied to formulate novel methods of inferring large trees from rooted triple and quartet data. These methods lead to new statistically consistent procedures for inference of a species tree from gene trees under the multispecies coalescent model.
40
Zhang C, Scornavacca C, Molloy EK, Mirarab S. ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy. Mol Biol Evol 2020; 37:3292-3307. [PMID: 32886770 PMCID: PMC7751180 DOI: 10.1093/molbev/msaa139] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Indexed: 12/18/2022]
Abstract
Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programming. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods.
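ASTRAL-style summary methods look for the species tree that shares the most quartet topologies with the input gene trees. The sketch below illustrates only that objective, for the single-copy case that ASTRAL handles, not the multicopy similarity measure that ASTRAL-Pro introduces. It scores one candidate tree by brute force and does not reproduce the dynamic-programming search. The nested-tuple tree encoding and the function names are hypothetical, and every gene tree is assumed to contain exactly the taxa of the species tree.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Union

# Hypothetical encoding: a rooted tree written as nested tuples of leaf names.
Tree = Union[str, tuple]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every node in post-order, so the full taxon set comes last."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def quartet_topology(tree_clades: List[FrozenSet[str]], quartet) -> Optional[FrozenSet[FrozenSet[str]]]:
    """Unrooted split ab|cd that the tree induces on four taxa, or None if unresolved."""
    q = frozenset(quartet)
    for clade in tree_clades:
        pair = clade & q
        if len(pair) == 2:  # this clade separates two of the four taxa from the other two
            return frozenset([pair, q - pair])
    return None


def quartet_score(species_tree: Tree, gene_trees: Sequence[Tree]) -> int:
    """Number of (quartet, gene tree) pairs whose topology matches the species tree."""
    sp = clades(species_tree)
    genes = [clades(g) for g in gene_trees]
    score = 0
    for quartet in combinations(sorted(sp[-1]), 4):
        target = quartet_topology(sp, quartet)
        if target is not None:
            score += sum(quartet_topology(g, quartet) == target for g in genes)
    return score


species = (("A", "B"), ("C", ("D", "E")))
gene = (("A", "C"), ("B", ("D", "E")))
print(quartet_score(species, [species, gene]))  # 8: all 5 quartets match the first tree, 3 match the second
```

Enumerating every quartet costs O(n^4) per gene tree, which is why practical tools avoid listing quartets explicitly.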
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, University of California San Diego, San Diego, CA
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA
41
Costa TAS, Sales JBL, Markaida U, Granados-Amores J, Gales SM, Sampaio I, Vallinoto M, Rodrigues-Filho LFS, Ready JS. Revisiting the phylogeny of the genus Lolliguncula Steenstrup 1881 improves understanding of their biogeography and proves the validity of Lolliguncula argus Brakoniecki & Roper, 1985. Mol Phylogenet Evol 2020; 154:106968. [PMID: 33031931 DOI: 10.1016/j.ympev.2020.106968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2020] [Revised: 09/14/2020] [Accepted: 09/28/2020] [Indexed: 12/01/2022]
Abstract
The biogeography of American loliginid squids has been improved in recent years, but certain key taxa have been missing. Given that the most accurate phylogenies and estimates of divergence times of common ancestors depend heavily on good taxonomic coverage, we have reanalyzed the genus Lolliguncula in light of new samples that increase the geographic and taxonomic coverage. New sequences were produced using standard methods to update an existing dataset for COI, 16S and Rhodopsin markers. Data were analyzed using various species delimitation methods, rigorous phylogenetic analyses and estimates of divergence times between clades. Within Lolliguncula we recover five monophyletic lineages that relate to the known species L. argus, L. diomedeae, L. panamensis, L. brevis North Atlantic and L. brevis South Atlantic. Except when using low divergence thresholds in ABGD, species delimitation methods only identify four of these lineages as distinct species, grouping L. argus and L. diomedeae as a single species. However, considering the reciprocal monophyly, recent divergence time estimate and morphological diagnoses, we refrain from synonymizing L. argus within L. diomedeae, considering them very recently diverged species. The biogeography of the American loliginids is discussed, wherein basal cladogenesis in both Lolliguncula and Doryteuthis occurs between the Atlantic and Pacific about 45 mya, with subsequent speciation around 20 mya associated with seafloor changes during the formation of the Caribbean. The recent speciation between L. argus and L. diomedeae is associated with oceanic environmental changes linked to glaciation, deep sea cooling and tropical upwelling.
Affiliation(s)
- Tarcisio A S Costa
- Federal University of Pará, Faculty for Biological Sciences, Alameda Leandro Ribeiro, 68600-000 Bragança, PA, Brazil
- Federal University of Pará, Aquatic Molecular Biology Laboratory, Center for Advanced Biodiversity Studies (CEABIO), Av. Perimetral da Ciência, km 01, PCT-Guamá, Lot 11, 66075-750 Belém, PA, Brazil
- João B L Sales
- Federal University of Pará, Aquatic Molecular Biology Laboratory, Center for Advanced Biodiversity Studies (CEABIO), Av. Perimetral da Ciência, km 01, PCT-Guamá, Lot 11, 66075-750 Belém, PA, Brazil
- Unai Markaida
- Línea de Pesquerías Artesanales, El Colegio de la Frontera Sur, Lerma, Campeche, Mexico
- Jasmin Granados-Amores
- Universidad Autónoma de Nayarit-Escuela Nacional de Ingeniería Pesquera, San Blas, Nayarit, Mexico
- Suellen M Gales
- Federal University of Pará, Aquatic Molecular Biology Laboratory, Center for Advanced Biodiversity Studies (CEABIO), Av. Perimetral da Ciência, km 01, PCT-Guamá, Lot 11, 66075-750 Belém, PA, Brazil
- Iracilda Sampaio
- Federal University of Pará, Faculty for Biological Sciences, Alameda Leandro Ribeiro, 68600-000 Bragança, PA, Brazil
- Marcelo Vallinoto
- Federal University of Pará, Faculty for Biological Sciences, Alameda Leandro Ribeiro, 68600-000 Bragança, PA, Brazil
- Jonathan S Ready
- Federal University of Pará, Aquatic Molecular Biology Laboratory, Center for Advanced Biodiversity Studies (CEABIO), Av. Perimetral da Ciência, km 01, PCT-Guamá, Lot 11, 66075-750 Belém, PA, Brazil
42
Patterns of genetic partitioning and gene flow in the endangered San Bernardino kangaroo rat (Dipodomys merriami parvus) and implications for conservation management. Conserv Genet 2020. [DOI: 10.1007/s10592-020-01289-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/26/2022]
43
Bhattacharjee A, Bayzid MS. Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices. BMC Genomics 2020; 21:497. [PMID: 32689946 PMCID: PMC7370488 DOI: 10.1186/s12864-020-06892-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/27/2020] [Accepted: 07/07/2020] [Indexed: 02/08/2023]
Abstract
BACKGROUND With the rapid growth rate of newly sequenced genomes, species tree inference from genes sampled throughout the whole genome has become a basic task in comparative and evolutionary biology. However, substantial challenges remain in leveraging these large scale molecular data. One of the foremost challenges is to develop efficient methods that can handle missing data. Popular distance-based methods, such as NJ (neighbor joining) and UPGMA (unweighted pair group method with arithmetic mean) require complete distance matrices without any missing data. RESULTS We introduce two highly accurate machine learning based distance imputation techniques. These methods are based on matrix factorization and autoencoder based deep learning architectures. We evaluated these two methods on a collection of simulated and biological datasets. Experimental results suggest that our proposed methods match or improve upon the best alternate distance imputation techniques. Moreover, these methods are scalable to large datasets with hundreds of taxa, and can handle a substantial amount of missing data. CONCLUSIONS This study shows, for the first time, the power and feasibility of applying deep learning techniques for imputing distance matrices. Thus, this study advances the state-of-the-art in phylogenetic tree construction in the presence of missing data. The proposed methods are available in open source form at https://github.com/Ananya-Bhattacharjee/ImputeDistances .
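Distance-based methods such as NJ and UPGMA need every pairwise entry. To illustrate the general matrix-factorization idea, the sketch below fits a low-rank product D ≈ U Vᵀ to the observed entries by stochastic gradient descent and reads predictions off for the missing ones. It is a generic sketch, not the authors' implementation (which is in the repository cited above), and it does not cover their autoencoder approach. The function name, rank, learning rate, regularization and epoch count are arbitrary assumptions, and missing entries are assumed to be marked with None symmetrically, so d[i][j] and d[j][i] are either both present or both missing.

```python
import random
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def impute_distances(d: Matrix, rank: int = 2, epochs: int = 3000, lr: float = 0.02,
                     reg: float = 1e-3, seed: int = 0) -> List[List[float]]:
    """Fill None entries of a square distance matrix using a low-rank fit D ~ U V^T.

    Observed entries are copied unchanged; only the missing ones are predicted.
    """
    n = len(d)
    rng = random.Random(seed)
    u = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    v = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    observed = [(i, j, d[i][j]) for i in range(n) for j in range(n) if d[i][j] is not None]

    def predict(i: int, j: int) -> float:
        return sum(u[i][k] * v[j][k] for k in range(rank))

    # Stochastic gradient descent on squared error over observed cells, with an L2 penalty.
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, x in observed:
            err = x - predict(i, j)
            for k in range(rank):
                ui, vj = u[i][k], v[j][k]
                u[i][k] += lr * (err * vj - reg * ui)
                v[j][k] += lr * (err * ui - reg * vj)

    filled = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if d[i][j] is not None:
                filled[i][j] = d[i][j]
            elif i != j:
                # Average both orientations so the completed matrix stays symmetric.
                filled[i][j] = 0.5 * (predict(i, j) + predict(j, i))
    return filled


# Toy matrix for four taxa with the distance between taxa 0 and 3 missing.
D = [[0.0, 0.2, 0.6, None],
     [0.2, 0.0, 0.6, 0.6],
     [0.6, 0.6, 0.0, 0.3],
     [None, 0.6, 0.3, 0.0]]
completed = impute_distances(D)
print(round(completed[0][3], 3))
```

The completed matrix could then be passed to NJ or UPGMA. Nothing in this toy version forces imputed values to be non-negative or to satisfy the triangle inequality.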
Affiliation(s)
- Ananya Bhattacharjee
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205 Bangladesh
- Department of Computer Science and Engineering, Eastern University, Dhaka, Bangladesh
- Md. Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205 Bangladesh
44
Yourdkhani S, Rhodes JA. Inferring Metric Trees from Weighted Quartets via an Intertaxon Distance. Bull Math Biol 2020; 82:97. [PMID: 32676801 DOI: 10.1007/s11538-020-00773-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/11/2020] [Accepted: 07/02/2020] [Indexed: 11/24/2022]
Abstract
A metric phylogenetic tree relating a collection of taxa induces weighted rooted triples and weighted quartets for all subsets of three and four taxa, respectively. New intertaxon distances are defined that can be calculated from these weights, and shown to exactly fit the same tree topology, but with edge weights rescaled by certain factors dependent on the associated split size. These distances are analogs for metric trees of similar ones recently introduced for topological trees that are based on induced unweighted rooted triples and quartets. The distances introduced here lead to new statistically consistent methods of inferring a metric species tree from a collection of topological gene trees generated under the multispecies coalescent model of incomplete lineage sorting. Simulations provide insight into their potential.
Affiliation(s)
- Samaneh Yourdkhani
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, 99775, USA
- John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, 99775, USA
45
Abstract
In this review, we discuss the current status and future challenges for fully elucidating the fungal tree of life. In the last 15 years, advances in genomic technologies have revolutionized fungal systematics, ushering the field into the phylogenomic era. This has made the unthinkable possible, namely access to the entire genetic record of all known extant taxa. We first review the current status of the fungal tree and highlight areas where additional effort will be required. We then review the analytical challenges imposed by the volume of data and discuss methods to recover the most accurate species tree given the sea of gene trees. Highly resolved and deeply sampled trees are being leveraged in novel ways to study fungal radiations, species delimitation, and metabolic evolution. Finally, we discuss the critical issue of incorporating the unnamed and uncultured dark matter taxa that represent the vast majority of fungal diversity.
Affiliation(s)
- Timothy Y James
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
- Jason E Stajich
- Department of Microbiology and Plant Pathology, Institute for Integrative Genome Biology, University of California, Riverside, California 92521, USA
- Chris Todd Hittinger
- Laboratory of Genetics, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, Center for Genomic Science and Innovation, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, Wisconsin 53726, USA
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235, USA
46
Yin J, Zhang C, Mirarab S. ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization. Bioinformatics 2020; 35:3961-3969. [PMID: 30903685 DOI: 10.1093/bioinformatics/btz211] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 12/01/2018] [Revised: 03/12/2019] [Accepted: 03/21/2019] [Indexed: 01/11/2023]
Abstract
MOTIVATION Evolutionary histories can change from one part of the genome to another. The potential for discordance between the gene trees has motivated the development of summary methods that reconstruct a species tree from an input collection of gene trees. ASTRAL is a widely used summary method and has been able to scale to relatively large datasets. However, the size of genomic datasets is quickly growing. Despite its relative efficiency, the current single-threaded implementation of ASTRAL is falling behind the data growth trends and is not able to analyze the largest available datasets in a reasonable time. RESULTS ASTRAL uses dynamic programming and is not trivially parallel. In this paper, we introduce ASTRAL-MP, the first version of ASTRAL that can exploit parallelism and also uses randomization techniques to speed up some of its steps. Importantly, ASTRAL-MP can take advantage of not just multiple CPU cores but also one or several graphics processing units (GPUs). The ASTRAL-MP code scales very well with increasing CPU cores, and its GPU version, implemented in OpenCL, can achieve speedups of up to 158× compared to ASTRAL-III. Using GPUs and multiple cores, ASTRAL-MP is able to analyze datasets with 10 000 species or datasets with more than 100 000 genes in <2 days. AVAILABILITY AND IMPLEMENTATION ASTRAL-MP is available at https://github.com/smirarab/ASTRAL/tree/MP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- John Yin
- Department of Mathematics, University of California at San Diego, La Jolla, CA, USA
- Chao Zhang
- Bioinformatics and Systems Biology, University of California at San Diego, La Jolla, CA, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA, USA
47
Wen J, Yu Y, Xie DF, Peng C, Liu Q, Zhou SD, He XJ. A transcriptome-based study on the phylogeny and evolution of the taxonomically controversial subfamily Apioideae (Apiaceae). Ann Bot 2020; 125:937-953. [PMID: 32016402 PMCID: PMC7218814 DOI: 10.1093/aob/mcaa011] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 06/06/2019] [Accepted: 01/28/2020] [Indexed: 05/26/2023]
Abstract
BACKGROUND AND AIMS A long-standing controversy in the subfamily Apioideae concerns relationships among the major lineages, which has prevented a comprehensive study of their fruits and evolutionary history. Here we use single-copy genes (SCGs) obtained from transcriptome datasets to generate a reliable species tree and explore the evolutionary history of Apioideae. METHODS In total, 3351 SCGs were generated from 27 transcriptome datasets and one genome, and further used for phylogenetic analysis using coalescent-based methods. Fruit morphology and anatomy were studied in combination with the species tree. Eleven SCGs were selected for dating analysis, with two fossils used for calibration. KEY RESULTS A well-supported species tree was generated with a topology [Chamaesieae, (Bupleureae, (Pleurospermeae, (Physospermopsis Clade, (Group C, (Group A, Group B)))))] that differed from previous trees. Daucinae and Torilidinae were not in the tribe Scandiceae and existed as sister groups to the Acronema Clade. Five branches (I-V) of the species tree showed low quartet support but strong local posterior probabilities. Dating analysis suggested that Apioideae originated around 56.64 Mya (95% highest posterior density interval, 45.18-73.53 Mya). CONCLUSIONS This study resolves a controversial phylogenetic relationship in Apioideae based on 3351 SCGs and coalescent-based species tree estimation methods. Gene trees that contributed to the species tree may have undergone rapid evolutionary divergence and incomplete lineage sorting. Fruits of Apioideae might have evolved in two directions, anemochorous and hydrochorous, with epizoochorous as a derived mode. Molecular and morphological evidence suggests that Daucinae and Torilidinae should be restored to the tribe level. Our results provide new insights into the morphological evolution of this subfamily, which may contribute to a better understanding of species diversification in Apioideae.
Molecular dating analysis suggests that uplift of the Qinghai-Tibetan Plateau (QTP) and climate changes probably drove rapid speciation and diversification of Apioideae in the QTP region.
Affiliation(s)
- Jun Wen
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, P.R. China
- Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, Sichuan, P.R. China
- Yan Yu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, P.R. China
- Deng-Feng Xie
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, P.R. China
- Chang Peng
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, P.R. China
- Qing Liu
- Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, Sichuan, P.R. China
- Song-Dong Zhou
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, P.R. China
- Xing-Jin He
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, P.R. China
48
Rabiee M, Mirarab S. INSTRAL: Discordance-Aware Phylogenetic Placement Using Quartet Scores. Syst Biol 2020; 69:384-391. [PMID: 31290974 DOI: 10.1093/sysbio/syz045] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/04/2018] [Accepted: 07/02/2019] [Indexed: 11/13/2022]
Abstract
Phylogenomic analyses have increasingly adopted species tree reconstruction using methods that account for gene tree discordance using pipelines that require both human effort and computational resources. As the number of available genomes continues to increase, a new problem is facing researchers. Once more species become available, they have to repeat the whole process from the beginning because updating species trees is currently not possible. However, the de novo inference can be prohibitively costly in human effort or machine time. In this article, we introduce INSTRAL, a method that extends ASTRAL to enable phylogenetic placement. INSTRAL is designed to place a new species on an existing species tree after sequences from the new species have already been added to gene trees; thus, INSTRAL is complementary to existing placement methods that update gene trees. [ASTRAL; ILS; phylogenetic placement; species tree reconstruction.].
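INSTRAL places a new species on a fixed species tree using ASTRAL's quartet score. The brute-force sketch below only conveys that idea. It grafts the new taxon onto each edge in turn and keeps the grafted tree whose quartet score against the gene trees is highest. It is not INSTRAL's algorithm, which builds on ASTRAL's dynamic programming. The nested-tuple encoding and all function names are hypothetical. The gene trees are assumed to already contain the new taxon and every existing taxon, which matches the setting described in the abstract.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, List, Optional, Sequence, Union

# Hypothetical encoding: a rooted tree written as nested tuples of leaf names.
Tree = Union[str, tuple]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every node in post-order, so the full taxon set comes last."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        leaves = frozenset([node]) if isinstance(node, str) else frozenset().union(*map(walk, node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def quartet_topology(tree_clades: List[FrozenSet[str]], quartet) -> Optional[FrozenSet[FrozenSet[str]]]:
    """Unrooted split ab|cd induced on four taxa, or None if the tree leaves it unresolved."""
    q = frozenset(quartet)
    for clade in tree_clades:
        pair = clade & q
        if len(pair) == 2:
            return frozenset([pair, q - pair])
    return None


def quartet_score(tree: Tree, gene_trees: Sequence[Tree]) -> int:
    """Number of (quartet, gene tree) pairs that agree with `tree`."""
    sp = clades(tree)
    genes = [clades(g) for g in gene_trees]
    score = 0
    for quartet in combinations(sorted(sp[-1]), 4):
        target = quartet_topology(sp, quartet)
        if target is not None:
            score += sum(quartet_topology(g, quartet) == target for g in genes)
    return score


def graftings(tree: Tree, taxon: str) -> Iterator[Tree]:
    """Every tree obtained by attaching `taxon` to one edge of `tree` (or above its root)."""
    yield (tree, taxon)
    if not isinstance(tree, str):
        for i, child in enumerate(tree):
            for grafted in graftings(child, taxon):
                yield tree[:i] + (grafted,) + tree[i + 1:]


def place(tree: Tree, taxon: str, gene_trees: Sequence[Tree]) -> Tree:
    """Keep the grafting with the highest quartet score (the first one wins ties)."""
    return max(graftings(tree, taxon), key=lambda t: quartet_score(t, gene_trees))


backbone = (("A", "B"), ("C", "D"))
genes = [(("A", "B"), ("C", ("D", "E"))), ((("A", "B"), "C"), ("D", "E"))]
print(place(backbone, "E", genes))  # (('A', 'B'), ('C', ('D', 'E')))
```

Each candidate placement is rescored from scratch here. Avoiding that repeated work on large trees is the point of a dedicated placement method.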
Affiliation(s)
- Maryam Rabiee
- Department of Computer Science and Engineering, UC San Diego, La Jolla, CA 92093, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
49
Perea S, Sousa‐Santos C, Robalo J, Doadrio I. Multilocus phylogeny and systematics of Iberian endemic Squalius (Actinopterygii, Leuciscidae). Zool Scr 2020. [DOI: 10.1111/zsc.12420] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Affiliation(s)
- Silvia Perea
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales - CSIC, Madrid, Spain
- Carla Sousa‐Santos
- MARE – Marine and Environmental Sciences Centre, ISPA‐Instituto Universitário, Lisbon, Portugal
- Joana Robalo
- MARE – Marine and Environmental Sciences Centre, ISPA‐Instituto Universitário, Lisbon, Portugal
- Ignacio Doadrio
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales - CSIC, Madrid, Spain
50
Looney BP, Adamčík S, Matheny PB. Coalescent-based delimitation and species-tree estimations reveal Appalachian origin and Neogene diversification in Russula subsection Roseinae. Mol Phylogenet Evol 2020; 147:106787. [PMID: 32165159 DOI: 10.1016/j.ympev.2020.106787] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/10/2019] [Revised: 03/04/2020] [Accepted: 03/06/2020] [Indexed: 11/19/2022]
Abstract
Numerous lineages of mushroom-forming fungi have been subject to bursts of diversification throughout their evolutionary history, events that can impact our ability to infer well-resolved phylogenies. However, groups that have undergone quick genetic change may have the highest adaptive potential. As the second largest genus of mushroom-forming fungi, Russula provides an excellent model for studying hyper-diversification and the evolutionary processes that drive it. This study focuses on the morphologically defined group Russula subsection Roseinae. Species hypotheses based on morphological differentiation and multi-locus phylogenetic analyses are tested in the Roseinae using different applications of the multi-species coalescent model. Based on this combined approach, we recognize fourteen species in Roseinae, including the Albida and wholly novel Magnarosea clades. Reconstruction of biogeographic and host association history suggests that parapatric speciation in refugia during glacial cycles of the Pleistocene drove diversification within the Roseinae, which is found to have a Laurasian distribution with an evolutionary origin in the Appalachian Mountains of eastern North America. Finally, we detect jump dispersal at a continental scale that has driven diversification since the most recent glacial cycles.
Affiliation(s)
- Brian P Looney
- University of Tennessee, Department of Ecology and Evolutionary Biology, Knoxville, TN 37996, USA
- Slavomír Adamčík
- Plant Science and Biodiversity Centre, Slovak Academy of Sciences, 84523 Bratislava, Slovakia
- P Brandon Matheny
- University of Tennessee, Department of Ecology and Evolutionary Biology, Knoxville, TN 37996, USA