1
Hurtado-Gómez JP, Vargas-Ramírez M, Iverson JB, Joyce WG, McCranie JR, Paetzold C, Fritz U. Diversity and biogeography of South American mud turtles elucidated by multilocus DNA sequencing (Testudines: Kinosternidae). Mol Phylogenet Evol 2024; 197:108083. [PMID: 38679303 DOI: 10.1016/j.ympev.2024.108083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 04/02/2024] [Accepted: 04/24/2024] [Indexed: 05/01/2024]
Abstract
Kinosternon is the most speciose genus of extant turtles, with 22 currently recognized species, distributed across large parts of the Americas. Most species have small distributions, but K. leucostomum and K. scorpioides range from Mexico to South America. Previous studies have found discordance between mitochondrial and nuclear phylogenies in some kinosternid groups, with the current taxonomy following the nuclear-based results. Herein, based on extended molecular, geographic, and taxonomic sampling, we explore the phylogeographic structure and taxonomic limits for K. leucostomum and the K. scorpioides group and present a fossil-calibrated nuclear time tree for Kinosternon. Our results reveal contrasting differentiation patterns for the K. scorpioides group and K. leucostomum, despite overlapping distributions. Kinosternon leucostomum shows only shallow geographic divergence, whereas the K. scorpioides group is polyphyletic with up to 10 distinct taxa, some of them undescribed. We support the elevation of K. s. albogulare and K. s. cruentatum to species level. Given the deep divergence within the genus Kinosternon, we propose the recognition of three subgenera, Kinosternon, Cryptochelys and Thyrosternum, and the abandonment of the group-based classification, at least for the K. leucostomum and K. scorpioides groups. Our results show an initial split in Kinosternon that gave rise to two main radiations, one Nearctic and one mainly Neotropical. Most speciation events in Kinosternon occurred during the Quaternary and we hypothesize that they were mediated by both climatic and geological events. Additionally, our data imply that at least three South American colonizations occurred, two in the K. leucostomum group, and one in the K. scorpioides group. Finally, we hypothesize that discordance between mitochondrial and nuclear phylogenetic signal is due to mitochondrial capture from an extinct kinosternine lineage.
Affiliation(s)
- Mario Vargas-Ramírez
  - Grupo Biodiversidad y Conservación Genética, Instituto de Genética, Universidad Nacional de Colombia, Bogotá, Colombia; Estación de Biología Tropical Roberto Franco (EBTRF), Universidad Nacional de Colombia, Villavicencio, Colombia
- John B Iverson
  - Department of Biology, Earlham College, Richmond, IN 47374, USA
- Walter G Joyce
  - Department of Geosciences, University of Fribourg, 1700 Fribourg, Switzerland
- James R McCranie
  - Smithsonian Research Associate, 10770 SW 164th Street, Miami, FL 33157, USA
- Claudia Paetzold
  - Museum of Zoology, Senckenberg Natural History Collections Dresden, 01109 Dresden, Germany
- Uwe Fritz
  - Museum of Zoology, Senckenberg Natural History Collections Dresden, 01109 Dresden, Germany
2
Karbstein K, Kösters L, Hodač L, Hofmann M, Hörandl E, Tomasello S, Wagner ND, Emerson BC, Albach DC, Scheu S, Bradler S, de Vries J, Irisarri I, Li H, Soltis P, Mäder P, Wäldchen J. Species delimitation 4.0: integrative taxonomy meets artificial intelligence. Trends Ecol Evol 2024:S0169-5347(23)00296-3. [PMID: 38849221 DOI: 10.1016/j.tree.2023.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 10/20/2023] [Accepted: 11/08/2023] [Indexed: 06/09/2024]
Abstract
Although species are central units for biological research, recent findings in genomics are raising awareness that what we call species can be ill-founded entities due to solely morphology-based, regional species descriptions. This particularly applies to groups characterized by intricate evolutionary processes such as hybridization, polyploidy, or asexuality. Here, challenges of current integrative taxonomy (genetics/genomics + morphology + ecology, etc.) become apparent: different favored species concepts, lack of universal characters/markers, missing appropriate analytical tools for intricate evolutionary processes, and highly subjective ranking and fusion of datasets. Now, integrative taxonomy combined with artificial intelligence under a unified species concept can enable automated feature learning and data integration, and thus reduce subjectivity in species delimitation. This approach will likely accelerate revising and unraveling eukaryotic biodiversity.
Affiliation(s)
- Kevin Karbstein
  - Max Planck Institute for Biogeochemistry, Department of Biogeochemical Integration, 07745 Jena, Germany
- Lara Kösters
  - Max Planck Institute for Biogeochemistry, Department of Biogeochemical Integration, 07745 Jena, Germany
- Ladislav Hodač
  - Max Planck Institute for Biogeochemistry, Department of Biogeochemical Integration, 07745 Jena, Germany
- Martin Hofmann
  - Technical University of Ilmenau, Institute for Computer and Systems Engineering, 98693 Ilmenau, Germany
- Elvira Hörandl
  - University of Göttingen, Albrecht-von-Haller Institute for Plant Sciences, Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), 37073 Göttingen, Germany
- Salvatore Tomasello
  - University of Göttingen, Albrecht-von-Haller Institute for Plant Sciences, Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), 37073 Göttingen, Germany
- Natascha D Wagner
  - University of Göttingen, Albrecht-von-Haller Institute for Plant Sciences, Department of Systematics, Biodiversity and Evolution of Plants (with Herbarium), 37073 Göttingen, Germany
- Brent C Emerson
  - Institute of Natural Products and Agrobiology (IPNA-CSIC), Island Ecology and Evolution Research Group, 38206 La Laguna, Tenerife, Canary Islands, Spain
- Dirk C Albach
  - Carl von Ossietzky-Universität Oldenburg, Institute of Biology and Environmental Science, 26129 Oldenburg, Germany
- Stefan Scheu
  - University of Göttingen, Johann-Friedrich-Blumenbach Institute of Zoology and Anthropology, 37073 Göttingen, Germany; University of Göttingen, Centre of Biodiversity and Sustainable Land Use (CBL), 37073 Göttingen, Germany
- Sven Bradler
  - University of Göttingen, Johann-Friedrich-Blumenbach Institute of Zoology and Anthropology, 37073 Göttingen, Germany
- Jan de Vries
  - University of Göttingen, Institute for Microbiology and Genetics, Department of Applied Bioinformatics, 37077 Göttingen, Germany; University of Göttingen, Campus Institute Data Science (CIDAS), 37077 Göttingen, Germany; University of Göttingen, Göttingen Center for Molecular Biosciences (GZMB), Department of Applied Bioinformatics, 37077 Göttingen, Germany
- Iker Irisarri
  - Leibniz Institute for the Analysis of Biodiversity Change (LIB), Centre for Molecular Biodiversity Research, Phylogenomics Section, Museum of Nature, 20146 Hamburg, Germany
- He Li
  - Eastern China Conservation Centre for Wild Endangered Plant Resources, Chenshan Botanical Garden, 201602 Shanghai, China
- Pamela Soltis
  - University of Florida, Florida Museum of Natural History, 32611 Gainesville, USA
- Patrick Mäder
  - Technical University of Ilmenau, Institute for Computer and Systems Engineering, 98693 Ilmenau, Germany; German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Puschstrasse 4, 04103 Leipzig, Germany; Friedrich Schiller University Jena, Faculty of Biological Sciences, Institute of Ecology and Evolution, Philosophenweg 16, 07743 Jena, Germany
- Jana Wäldchen
  - Max Planck Institute for Biogeochemistry, Department of Biogeochemical Integration, 07745 Jena, Germany; German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Puschstrasse 4, 04103 Leipzig, Germany
3
Gupta A, Mirarab S, Turakhia Y. Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. bioRxiv 2024:2024.05.27.596098. [PMID: 38854139 PMCID: PMC11160643 DOI: 10.1101/2024.05.27.596098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Inference of species trees plays a crucial role in advancing our understanding of evolutionary relationships and has immense significance for diverse biological and medical applications. Extensive genome sequencing efforts are currently in progress across a broad spectrum of life forms, holding the potential to unravel the intricate branching patterns within the tree of life. However, estimating species trees starting from raw genome sequences is quite challenging, and the current cutting-edge methodologies require a series of error-prone steps that are neither entirely automated nor standardized. In this paper, we present ROADIES, a novel pipeline for species tree inference from raw genome assemblies that is fully automated, easy to use, scalable, free from reference bias, and provides flexibility to adjust the tradeoff between accuracy and runtime. The ROADIES pipeline eliminates the need to align whole genomes, choose a single reference species, or pre-select loci such as functional genes found using cumbersome annotation steps. Moreover, it leverages recent advances in phylogenetic inference to allow multi-copy genes, eliminating the need to detect orthology. Using the genomic datasets released from large-scale sequencing consortia across three diverse life forms (placental mammals, pomace flies, and birds), we show that ROADIES infers species trees that are comparable in quality with the state-of-the-art approaches but in a fraction of the time. By incorporating optimal approaches and automating all steps from assembled genomes to species and gene trees, ROADIES is poised to improve the accuracy, scalability, and reproducibility of phylogenomic analyses.
Affiliation(s)
- Anshu Gupta
  - Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
4
Arasti S, Tabaghi P, Tabatabaee Y, Mirarab S. Branch Length Transforms using Optimal Tree Metric Matching. bioRxiv 2024:2023.11.13.566962. [PMID: 38746464 PMCID: PMC11092445 DOI: 10.1101/2023.11.13.566962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
The abundant discordance between evolutionary relationships across the genome has rekindled interest in ways of comparing and averaging trees on a shared leaf set. However, most attempts at reconciling trees have focused on tree topology, producing metrics for comparing topologies and methods for computing median tree topologies. Using branch lengths, however, has been more elusive, due to several challenges. Species tree branch lengths can be measured in many units, often different from gene trees. Moreover, rates of evolution change across the genome, the species tree, and specific branches of gene trees. These factors compound the stochasticity of coalescence times. Thus, branch lengths are highly heterogeneous across both the genome and the tree. For many downstream applications in phylogenomic analyses, branch lengths are as important as the topology, and yet, existing tools to compare and combine weighted trees are limited. In this paper, we make progress on the question of mapping one tree to another, incorporating both topology and branch length. We define a series of computational problems to formalize finding the best transformation of one tree to another while maintaining its topology and other constraints. We show that all these problems can be solved in quadratic time and memory using a linear algebraic formulation coupled with dynamic programming preprocessing. Our formulations lead to convex optimization problems, with efficient and theoretically optimal solutions. While many applications can be imagined for this framework, we apply it to measure species tree branch lengths in the unit of the expected number of substitutions per site while allowing divergence from ultrametricity across the tree. In these applications, our method matches or surpasses other methods designed directly for solving those problems. Thus, our approach provides a versatile toolkit that finds applications in similar evolutionary questions. 
Code availability: The software is available at https://github.com/shayesteh99/TCMM.git. Data availability: Data are available on GitHub at https://github.com/shayesteh99/TCMM-Data.git.
5
Balaban M, Jiang Y, Zhu Q, McDonald D, Knight R, Mirarab S. Generation of accurate, expandable phylogenomic trees with uDance. Nat Biotechnol 2024; 42:768-777. [PMID: 37500914 PMCID: PMC10818028 DOI: 10.1038/s41587-023-01868-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/19/2022] [Accepted: 06/20/2023] [Indexed: 07/29/2023]
Abstract
Phylogenetic trees provide a framework for organizing evolutionary histories across the tree of life and aid downstream comparative analyses such as metagenomic identification. Methods that rely on single-marker genes such as 16S rRNA have produced trees of limited accuracy with hundreds of thousands of organisms, whereas methods that use genome-wide data are not scalable to large numbers of genomes. We introduce updating trees using divide-and-conquer (uDance), a method that enables updatable genome-wide inference using a divide-and-conquer strategy that refines different parts of the tree independently and can build off of existing trees, with high accuracy and scalability. With uDance, we infer a species tree of roughly 200,000 genomes using 387 marker genes, totaling 42.5 billion amino acid residues.
Affiliation(s)
- Metin Balaban
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Yueyu Jiang
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Qiyun Zhu
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Daniel McDonald
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
6
Stiller J, Feng S, Chowdhury AA, Rivas-González I, Duchêne DA, Fang Q, Deng Y, Kozlov A, Stamatakis A, Claramunt S, Nguyen JMT, Ho SYW, Faircloth BC, Haag J, Houde P, Cracraft J, Balaban M, Mai U, Chen G, Gao R, Zhou C, Xie Y, Huang Z, Cao Z, Yan Z, Ogilvie HA, Nakhleh L, Lindow B, Morel B, Fjeldså J, Hosner PA, da Fonseca RR, Petersen B, Tobias JA, Székely T, Kennedy JD, Reeve AH, Liker A, Stervander M, Antunes A, Tietze DT, Bertelsen MF, Lei F, Rahbek C, Graves GR, Schierup MH, Warnow T, Braun EL, Gilbert MTP, Jarvis ED, Mirarab S, Zhang G. Complexity of avian evolution revealed by family-level genomes. Nature 2024; 629:851-860. [PMID: 38560995 PMCID: PMC11111414 DOI: 10.1038/s41586-024-07323-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 03/15/2024] [Indexed: 04/04/2024]
Abstract
Despite tremendous efforts in the past decades, relationships among main avian lineages remain heavily debated without a clear resolution. Discrepancies have been attributed to diversity of species sampled, phylogenetic method and the choice of genomic regions [1-3]. Here we address these issues by analysing the genomes of 363 bird species [4] (218 taxonomic families, 92% of total). Using intergenic regions and coalescent methods, we present a well-supported tree but also a marked degree of discordance. The tree confirms that Neoaves experienced rapid radiation at or near the Cretaceous-Palaeogene boundary. Sufficient loci rather than extensive taxon sampling were more effective in resolving difficult nodes. Remaining recalcitrant nodes involve species that are a challenge to model due to either extreme DNA composition, variable substitution rates, incomplete lineage sorting or complex evolutionary events such as ancient hybridization. Assessment of the effects of different genomic partitions showed high heterogeneity across the genome. We discovered sharp increases in effective population size, substitution rates and relative brain size following the Cretaceous-Palaeogene extinction event, supporting the hypothesis that emerging ecological opportunities catalysed the diversification of modern birds. The resulting phylogenetic estimate offers fresh insights into the rapid radiation of modern birds and provides a taxon-rich backbone tree for future comparative studies.
Affiliation(s)
- Josefin Stiller
  - Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Shaohong Feng
  - Center for Evolutionary & Organismal Biology, Liangzhu Laboratory & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, China
- Al-Aabid Chowdhury
  - School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
- David A Duchêne
  - Center for Evolutionary Hologenomics, The Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Qi Fang
  - BGI Research, Shenzhen, China
- Yuan Deng
  - BGI Research, Shenzhen, China
  - BGI Research, Wuhan, China
- Alexey Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Santiago Claramunt
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
  - Department of Natural History, Royal Ontario Museum, Toronto, Ontario, Canada
- Jacqueline M T Nguyen
  - College of Science and Engineering, Flinders University, Adelaide, South Australia, Australia
  - Australian Museum Research Institute, Sydney, New South Wales, Australia
- Simon Y W Ho
  - School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
- Brant C Faircloth
  - Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
- Julia Haag
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Peter Houde
  - Department of Biology, New Mexico State University, Las Cruces, NM, USA
- Joel Cracraft
  - Department of Ornithology, American Museum of Natural History, New York, NY, USA
- Metin Balaban
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Uyen Mai
  - Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Guangji Chen
  - BGI Research, Wuhan, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Rongsheng Gao
  - BGI Research, Wuhan, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yulong Xie
  - Center for Evolutionary & Organismal Biology, Liangzhu Laboratory & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zijian Huang
  - Center for Evolutionary & Organismal Biology, Liangzhu Laboratory & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhen Cao
  - Department of Computer Science, Rice University, Houston, TX, USA
- Zhi Yan
  - Department of Computer Science, Rice University, Houston, TX, USA
- Huw A Ogilvie
  - Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, TX, USA
- Bent Lindow
  - Natural History Museum Denmark, University of Copenhagen, Copenhagen, Denmark
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
- Jon Fjeldså
  - Natural History Museum Denmark, University of Copenhagen, Copenhagen, Denmark
- Peter A Hosner
  - Natural History Museum Denmark, University of Copenhagen, Copenhagen, Denmark
  - Center for Global Mountain Biodiversity, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Rute R da Fonseca
  - Center for Global Mountain Biodiversity, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Bent Petersen
  - Center for Evolutionary Hologenomics, The Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Centre of Excellence for Omics-Driven Computational Biodiscovery, Faculty of Applied Sciences, AIMST University, Bedong, Malaysia
- Joseph A Tobias
  - Department of Life Sciences, Imperial College London, Silwood Park, Ascot, UK
- Tamás Székely
  - Milner Centre for Evolution, University of Bath, Bath, UK
  - ELKH-DE Reproductive Strategies Research Group, University of Debrecen, Debrecen, Hungary
- Jonathan David Kennedy
  - Center for Macroecology, Evolution, and Climate, The Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Andrew Hart Reeve
  - Natural History Museum Denmark, University of Copenhagen, Copenhagen, Denmark
- Andras Liker
  - HUN-REN-PE Evolutionary Ecology Research Group, University of Pannonia, Veszprém, Hungary
  - Behavioural Ecology Research Group, Center for Natural Sciences, University of Pannonia, Veszprém, Hungary
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
- Mads F Bertelsen
  - Centre for Zoo and Wild Animal Health, Copenhagen Zoo, Frederiksberg, Denmark
- Fumin Lei
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - College of Life Science, University of Chinese Academy of Sciences, Beijing, China
- Carsten Rahbek
  - Center for Global Mountain Biodiversity, Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Center for Macroecology, Evolution, and Climate, The Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Institute of Ecology, Peking University, Beijing, China
  - Danish Institute for Advanced Study, University of Southern Denmark, Odense, Denmark
- Gary R Graves
  - Center for Macroecology, Evolution, and Climate, The Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Tandy Warnow
  - University of Illinois Urbana-Champaign, Champaign, IL, USA
- Edward L Braun
  - Department of Biology, University of Florida, Gainesville, FL, USA
- M Thomas P Gilbert
  - Center for Evolutionary Hologenomics, The Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - University Museum, NTNU, Trondheim, Norway
- Erich D Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Durham, NC, USA
- Guojie Zhang
  - Center for Evolutionary & Organismal Biology, Liangzhu Laboratory & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, China
  - BGI Research, Wuhan, China
  - Villum Center for Biodiversity Genomics, Department of Biology, University of Copenhagen, Copenhagen, Denmark
7
Mirarab S, Rivas-González I, Feng S, Stiller J, Fang Q, Mai U, Hickey G, Chen G, Brajuka N, Fedrigo O, Formenti G, Wolf JBW, Howe K, Antunes A, Schierup MH, Paten B, Jarvis ED, Zhang G, Braun EL. A region of suppressed recombination misleads neoavian phylogenomics. Proc Natl Acad Sci U S A 2024; 121:e2319506121. [PMID: 38557186 PMCID: PMC11009670 DOI: 10.1073/pnas.2319506121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 02/07/2024] [Indexed: 04/04/2024] Open
Abstract
Genomes are typically mosaics of regions with different evolutionary histories. When speciation events are closely spaced in time, recombination makes the regions sharing the same history small, and the evolutionary history changes rapidly as we move along the genome. When examining rapid radiations such as the early diversification of Neoaves 66 Mya, typically no consistent history is observed across segments exceeding kilobases of the genome. Here, we report an exception. We found that a 21-Mb region in avian genomes, mapped to chicken chromosome 4, shows an extremely strong and discordance-free signal for a history different from that of the inferred species tree. Such a strong discordance-free signal, indicative of suppressed recombination across many millions of base pairs, is not observed elsewhere in the genome for any deep avian relationships. Although long regions with suppressed recombination have been documented in recently diverged species, our results pertain to relationships dating circa 65 Mya. We provide evidence that this strong signal may be due to an ancient rearrangement that blocked recombination and remained polymorphic for several million years prior to fixation. We show that the presence of this region has misled previous phylogenomic efforts with lower taxon sampling, showing the interplay between taxon and locus sampling. We predict that similar ancient rearrangements may confound phylogenetic analyses in other clades, pointing to a need for new analytical models that incorporate the possibility of such events.
Affiliation(s)
- Siavash Mirarab
  - Electrical and Computer Engineering Department, University of California, San Diego, CA 95032
- Shaohong Feng
  - Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China
- Josefin Stiller
  - Section for Ecology & Evolution, Department of Biology, University of Copenhagen, København 2100, Denmark
- Qi Fang
  - BGI-Research, Shenzhen 518083, China
- Uyen Mai
  - Electrical and Computer Engineering Department, University of California, San Diego, CA 95032
- Glenn Hickey
  - Genomics Institute, University of California, Santa Cruz, CA 96064
- Guangji Chen
  - Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China
- Nadolina Brajuka
  - Vertebrate Genome Lab, Rockefeller University, New York, NY 10065
- Olivier Fedrigo
  - Vertebrate Genome Lab, Rockefeller University, New York, NY 10065
- Giulio Formenti
  - Vertebrate Genome Lab, Rockefeller University, New York, NY 10065
- Jochen B. W. Wolf
  - Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität, Munich 82152, Germany
- Kerstin Howe
  - Tree of Life Division, Wellcome Sanger Institute, Cambridge CB10 1RQ, United Kingdom
- Agostinho Antunes
  - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto 4099-002, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Porto 4099-002, Portugal
- Benedict Paten
  - Genomics Institute, University of California, Santa Cruz, CA 96064
- Erich D. Jarvis
  - Vertebrate Genome Lab, Rockefeller University, New York, NY 10065
- Guojie Zhang
  - Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Edward L. Braun
  - Department of Biology, University of Florida, Gainesville, FL 32611
8
Rivas-González I, Schierup MH, Wakeley J, Hobolth A. TRAILS: Tree reconstruction of ancestry using incomplete lineage sorting. PLoS Genet 2024; 20:e1010836. [PMID: 38330138 PMCID: PMC10880969 DOI: 10.1371/journal.pgen.1010836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 02/21/2024] [Accepted: 01/22/2024] [Indexed: 02/10/2024] Open
Abstract
Genome-wide genealogies of multiple species carry detailed information about demographic and selection processes on individual branches of the phylogeny. Here, we introduce TRAILS, a hidden Markov model that accurately infers time-resolved population genetics parameters, such as ancestral effective population sizes and speciation times, for ancestral branches using a multi-species alignment of three species and an outgroup. TRAILS leverages the information contained in incomplete lineage sorting fragments by modelling genealogies along the genome as rooted three-leaved trees, each with a topology and two coalescent events happening in discretized time intervals within the phylogeny. Posterior decoding of the hidden Markov model can be used to infer the ancestral recombination graph for the alignment and details on demographic changes within a branch. Since TRAILS performs posterior decoding at the base-pair level, genome-wide scans based on the posterior probabilities can be devised to detect deviations from neutrality. Using TRAILS on a human-chimp-gorilla-orangutan alignment, we recover speciation parameters and extract information about the topology and coalescent times at high resolution.
Affiliation(s)
- Mikkel H. Schierup
- Bioinformatics Research Center (BiRC), Aarhus University, Aarhus, Denmark
- John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, Massachusetts, United States of America
- Asger Hobolth
- Department of Mathematics, Aarhus University, Aarhus, Denmark
|
9
|
Dalapicolla J, Rodrigues do Prado J, Lacey Knowles L, Reis Percequillo A. Phylogenomics and species delimitation of an abundant and little-studied Amazonian forest spiny rat. Mol Phylogenet Evol 2024; 191:107992. [PMID: 38092321 DOI: 10.1016/j.ympev.2023.107992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 12/01/2023] [Accepted: 12/09/2023] [Indexed: 12/21/2023]
Abstract
Species delimitation studies based on integrating different datasets such as genomic, morphometric, and cytogenetics data are rare in studies focused on Neotropical rodents. As a consequence, the evolutionary history of most of these genera remains poorly understood. Proechimys is a highly diverse and widely distributed genus of Neotropical spiny rats with unique traits like multiple sympatry, micro-habitat segregation, and fuzzy species limits. Here, we applied RAD-Seq to infer the phylogenetic relationships, estimate the species boundaries, and estimate the divergence times for Proechimys, one of the most common and least studied small mammals in the Amazon. We tested whether inferred lineages in the phylogenetic trees could be considered distinct species based on the genomic dataset and morphometric data. Analyses revealed the genus is not monophyletic, with Proechimys hoplomyoides sister to a group of Hoplomys gymnurus + all other Proechimys species, contesting the generic status of Hoplomys. There are five main clades in Proechimys stricto sensu (excluding H. gymnurus and P. hoplomyoides). Species delimitation analyses supported 25 species within the genus Proechimys. The five main clades in Proechimys stricto sensu also showed similar ages for their origins, and two rapid diversification events were identified in the Early Pliocene and in the Early Pleistocene. Most cases of sympatry in Proechimys occur among species from the different main clades, and although Proechimys is an inhabitant of the Amazon, three species occupied the Cerrado biome during the Pleistocene. We could associate available nominal taxon, cytogenetics information, and DNA sequences in Genbank to most of the 25 species we hypothesized from our delimitation analyses. Based on our analyses, we estimate that eight forms represent putative new species that need a taxonomic revision.
Affiliation(s)
- Jeronymo Dalapicolla
- Departamento de Sistemática e Ecologia, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil; Departamento de Ciências Biológicas, Escola Superior de Agricultura "Luiz de Queiroz", Universidade de São Paulo, São Paulo, Brazil; Instituto Tecnológico Vale, Belém, Pará, Brazil
- L Lacey Knowles
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI, USA
- Alexandre Reis Percequillo
- Departamento de Ciências Biológicas, Escola Superior de Agricultura "Luiz de Queiroz", Universidade de São Paulo, São Paulo, Brazil
|
10
|
Souza LHB, Pierson TW, Tenório RO, Ferro JM, Gatto KP, Silva BC, de Andrade GV, Suárez P, Haddad CFB, Lourenço LB. Multiple contact zones and karyotypic evolution in a neotropical frog species complex. Sci Rep 2024; 14:1119. [PMID: 38212602 PMCID: PMC10784582 DOI: 10.1038/s41598-024-51421-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 01/04/2024] [Indexed: 01/13/2024]
Abstract
Previous studies of DNA sequence and karyotypic data have revealed high genetic diversity in the Physalaemus cuvieri - Physalaemus ephippifer species complex, a group of small leptodactylid frogs in South America. To date, seven major genetic lineages have been recognized in this group, with species delimitation tests supporting four to seven of them as valid species. Among these, only P. ephippifer shows heteromorphic sex chromosomes, but the implications of cytogenetic divergence for the evolution of this group are unknown. We analyzed karyotypic, mitochondrial DNA, and 3RAD genomic data to characterize a putative contact zone between P. ephippifer and P. cuvieri Lineage 1, finding evidence for admixture and karyotypic evolution. We also describe preliminary evidence for admixture between two other members of this species complex: Lineage 1 and Lineage 3 of P. cuvieri. Our study sheds new light on evolutionary relationships in the P. cuvieri - P. ephippifer species complex, suggesting an important role of karyotypic divergence in its evolutionary history and underscoring the importance of hybridization as a mechanism of sex chromosome evolution in amphibians.
Affiliation(s)
- Lucas H B Souza
- Laboratório de Estudos Cromossômicos (LabEsC), Departamento de Biologia Estrutural e Funcional, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), Campinas, SP, 13083-863, Brazil
- Todd W Pierson
- Department of Ecology, Evolution, and Organismal Biology, Kennesaw State University, Kennesaw, GA, USA
- Renata O Tenório
- Laboratório de Estudos Cromossômicos (LabEsC), Departamento de Biologia Estrutural e Funcional, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), Campinas, SP, 13083-863, Brazil
- Juan M Ferro
- Laboratorio de Genética Evolutiva "Dr. Claudio J. Bidau", Instituto de Biología Subtropical (CONICET-UNaM), Facultad de Ciencias Exactas, Químicas y Naturales, Universidad Nacional de Misiones, Posadas, Misiones, Argentina
- Kaleb P Gatto
- Laboratório de Estudos Cromossômicos (LabEsC), Departamento de Biologia Estrutural e Funcional, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), Campinas, SP, 13083-863, Brazil
- Bruno C Silva
- Laboratório de Estudos Cromossômicos (LabEsC), Departamento de Biologia Estrutural e Funcional, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), Campinas, SP, 13083-863, Brazil
- Gilda V de Andrade
- Departamento de Biologia, Centro de Ciências Biológicas e da Saúde, Universidade Federal do Maranhão (UFMA), Campus do Bacanga, São Luís, MA, 65080-040, Brazil
- Pablo Suárez
- Instituto de Biología Subtropical (CONICET-UNaM), Puerto Iguazú, Argentina
- Célio F B Haddad
- Departamento de Biodiversidade and Centro de Aquicultura (CAUNESP), Instituto de Biociências, Universidade Estadual Paulista, Rio Claro, SP, Brazil
- Luciana B Lourenço
- Laboratório de Estudos Cromossômicos (LabEsC), Departamento de Biologia Estrutural e Funcional, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), Campinas, SP, 13083-863, Brazil
|
11
|
Thawornwattana Y, Seixas F, Yang Z, Mallet J. Major patterns in the introgression history of Heliconius butterflies. eLife 2023; 12:RP90656. [PMID: 38108819 PMCID: PMC10727504 DOI: 10.7554/elife.90656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Gene flow between species, although usually deleterious, is an important evolutionary process that can facilitate adaptation and lead to species diversification. It also makes estimation of species relationships difficult. Here, we use the full-likelihood multispecies coalescent (MSC) approach to estimate species phylogeny and major introgression events in Heliconius butterflies from whole-genome sequence data. We obtain a robust estimate of species branching order among major clades in the genus, including the 'melpomene-silvaniform' group, which shows extensive historical and ongoing gene flow. We obtain chromosome-level estimates of key parameters in the species phylogeny, including species divergence times, present-day and ancestral population sizes, as well as the direction, timing, and intensity of gene flow. Our analysis leads to a phylogeny with introgression events that differ from those obtained in previous studies. We find that Heliconius aoede most likely represents the earliest-branching lineage of the genus and that 'silvaniform' species are paraphyletic within the melpomene-silvaniform group. Our phylogeny provides new, parsimonious histories for the origins of key traits in Heliconius, including pollen feeding and an inversion involved in wing pattern mimicry. Our results demonstrate the power and feasibility of the full-likelihood MSC approach for estimating species phylogeny and key population parameters despite extensive gene flow. The methods used here should be useful for analysis of other difficult species groups with high rates of introgression.
Affiliation(s)
- Fernando Seixas
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- James Mallet
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
|
12
|
Pereira DS, Hilário S, Gonçalves MFM, Phillips AJL. Diaporthe Species on Palms: Molecular Re-Assessment and Species Boundaries Delimitation in the D. arecae Species Complex. Microorganisms 2023; 11:2717. [PMID: 38004729 PMCID: PMC10673533 DOI: 10.3390/microorganisms11112717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 10/25/2023] [Accepted: 11/03/2023] [Indexed: 11/26/2023]
Abstract
Due to cryptic diversification, phenotypic plasticity and host associations, multilocus phylogenetic analyses have become the most important tool in accurately identifying and circumscribing species in the Diaporthe genus. However, the application of the genealogical concordance criterion has often been overlooked, ultimately leading to an exponential increase in novel Diaporthe spp. Due to the large number of species, many lineages remain poorly understood under the so-called species complexes. For this reason, a robust delimitation of the species boundaries in Diaporthe is still an ongoing challenge. Therefore, the present study aimed to resolve the species boundaries of the Diaporthe arecae species complex (DASC) by implementing an integrative taxonomic approach. The Genealogical Concordance Phylogenetic Species Recognition (GCPSR) principle revealed incongruences between the individual gene genealogies. Moreover, the Poisson Tree Processes' (PTPs) coalescent-based species delimitation models identified three well-delimited subclades represented by the species D. arecae, D. chiangmaiensis and D. smilacicola. These results evidence that all species previously described in the D. arecae subclade are conspecific, which is coherent with the morphological indistinctiveness observed and the absence of reproductive isolation and barriers to gene flow. Thus, 52 Diaporthe spp. are reduced to synonymy under D. arecae. Recent population expansion and the possibility of incomplete lineage sorting suggested that the D. arecae subclade may be considered as ongoing evolving lineages under active divergence and speciation. Hence, the genetic diversity and intraspecific variability of D. arecae in the context of current global climate change and the role of D. arecae as a pathogen on palm trees and other hosts are also discussed. This study illustrates that species in Diaporthe are highly overestimated, and highlights the relevance of applying an integrative taxonomic approach to accurately circumscribe the species boundaries in the genus Diaporthe.
Affiliation(s)
- Diana S. Pereira
- Faculdade de Ciências, Biosystems and Integrative Sciences Institute (BioISI), Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
- Sandra Hilário
- Interdisciplinary Centre of Marine and Environmental Research (CIIMAR), Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, 4450-208 Porto, Portugal
- Faculty of Sciences, Biology Department, University of Porto, Rua do Campo Alegre, Edifício FC4, 4169-007 Porto, Portugal
- Micael F. M. Gonçalves
- Faculty of Sciences, Biology Department, University of Porto, Rua do Campo Alegre, Edifício FC4, 4169-007 Porto, Portugal
- Centre for Environmental and Marine Studies, Department of Biology, Campus Universitário de Santiago, University of Aveiro, 3810-193 Aveiro, Portugal
- Alan J. L. Phillips
- Faculdade de Ciências, Biosystems and Integrative Sciences Institute (BioISI), Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
|
13
|
Cervantes CR, Montes JR, Rosas U, Arias S. Phylogenetic discordance and integrative species delimitation in the Mammillaria haageana species complex (Cactaceae). Mol Phylogenet Evol 2023; 187:107891. [PMID: 37517507 DOI: 10.1016/j.ympev.2023.107891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 06/15/2023] [Accepted: 07/26/2023] [Indexed: 08/01/2023]
Abstract
Species complexes consist of very close phylogenetic relatives, where morphological similarities make it difficult to distinguish between them using traditional taxonomic methods. Here, we focused on the long-standing challenge of species delimitation in the Mammillaria haageana complex, a group that presents great morphological diversity that makes its taxonomy a puzzle. Our work integrates genomic, morphological, and ecological data to establish the taxonomic limits in the M. haageana complex, and we also studied the evolutionary relationships with the remainder of the M. ser. Supertextae species. Our genetic analyses, as well as morphological and ecological evidence, led us to propose that the M. haageana complex is made up of six distinct entities (M. acultzingensis, M. conspicua, M. haageana, M. lanigera, M. meissneri, and M. san-angelensis), mainly as a result of ecological speciation. A recent taxonomic proposal considered these taxa as a single species; therefore, we propose their recognition at the species level. Our results also show a high level of incomplete lineage sorting rather than reticulation, which is especially likely in recently diverged species such as those comprising M. ser. Supertextae. The species hypotheses proposed here may be useful in future extinction risk assessments and conservation strategies.
Affiliation(s)
- Cristian R Cervantes
- Unidad de Síntesis en Sistemática y Evolución, Instituto de Biología, Circuito Exterior s.n., Ciudad Universitaria, Ciudad de México 04510, México; Posgrado en Ciencias Biológicas, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad Universitaria, Coyoacán, Ciudad de México 04510, México
- José-Rubén Montes
- Posgrado en Ciencias Biológicas, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad Universitaria, Coyoacán, Ciudad de México 04510, México
- Ulises Rosas
- Jardín Botánico, Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito Exterior, Ciudad Universitaria, Coyoacán, Ciudad de México 04510, México
- Salvador Arias
- Jardín Botánico, Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito Exterior, Ciudad Universitaria, Coyoacán, Ciudad de México 04510, México
|
14
|
Fleming J, Eriksen PM, Struck TH. Scoutknife: A naïve, whole genome informed phylogenetic robusticity metric. F1000Res 2023; 12:945. [PMID: 38799242 PMCID: PMC11128044 DOI: 10.12688/f1000research.139356.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/01/2023] [Indexed: 05/29/2024]
Abstract
Background: The phylogenetic bootstrap, first proposed by Felsenstein in 1985, is a critically important statistical method in assessing the robusticity of phylogenetic datasets. Core to its concept was the use of pseudo sampling - assessing the data by generating new replicates derived from the initial dataset that was used to generate the phylogeny. In this way, phylogenetic support metrics could overcome the lack of perfect, infinite data. With infinite data, however, it is possible to sample smaller replicates directly from the data to obtain both the phylogeny and its statistical robusticity in the same analysis. Due to the growth of whole genome sequencing, the depth and breadth of our datasets have greatly expanded and are set to only expand further. With genome-scale datasets comprising thousands of genes, we can now obtain a proxy for infinite data. Accordingly, we can potentially abandon the notion of pseudo sampling and instead randomly sample small subsets of genes from the thousands of genes in our analyses. Methods: We introduce Scoutknife, a jackknife-style subsampling implementation that generates 100 datasets by randomly sampling a small number of genes from an initial large-gene dataset to jointly establish both a phylogenetic hypothesis and assess its robusticity. We assess its effectiveness by using 18 previously published datasets and 100 simulation studies. Results: We show that Scoutknife is conservative and informative as to conflicts and incongruence across the whole genome, without the need for subsampling based on traditional model selection criteria. Conclusions: Scoutknife reliably achieves comparable results to selecting the best genes on both real and simulation datasets, while being resistant to the potential biases caused by selecting for model fit. As the amount of genome data grows, it becomes an even more exciting option to assess the robusticity of phylogenetic hypotheses.
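The subsampling idea in the Methods sentence of this abstract can be sketched as follows. This is an illustration under stated assumptions, not the Scoutknife code: the function names `subsample_gene_sets` and `split_support`, the gene labels, and the choice of 20 genes per replicate are all hypothetical, and tree inference on each replicate is left out entirely.

```python
import random


def subsample_gene_sets(gene_ids, genes_per_replicate, n_replicates=100, seed=0):
    """Draw replicate gene sets, each sampled without replacement.

    The default of 100 replicates follows the abstract; the number of genes
    per replicate is an assumption the caller must choose.
    """
    gene_ids = list(gene_ids)
    if genes_per_replicate > len(gene_ids):
        raise ValueError("cannot sample more genes than the dataset contains")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        sorted(rng.sample(gene_ids, genes_per_replicate))
        for _ in range(n_replicates)
    ]


def split_support(replicate_splits, split):
    """Fraction of replicate trees that contain a given split.

    Each replicate tree is represented here as a set of frozensets of taxon
    names, a simplification chosen only for this example.
    """
    hits = sum(1 for splits in replicate_splits if split in splits)
    return hits / len(replicate_splits)


# Hypothetical dataset of 1000 genes, 20 genes drawn per replicate.
replicates = subsample_gene_sets([f"gene{i}" for i in range(1000)], 20)
```

In a real analysis each replicate gene set would be passed to a tree-inference program, and `split_support` would then summarise how often each clade recurs across the resulting trees.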
Affiliation(s)
- James Fleming
- Natural History Museum, Universitetet i Oslo, Oslo, Oslo, 0562, Norway
|
15
|
Fleming JF, Valero‐Gracia A, Struck TH. Identifying and addressing methodological incongruence in phylogenomics: A review. Evol Appl 2023; 16:1087-1104. [PMID: 37360032 PMCID: PMC10286231 DOI: 10.1111/eva.13565] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/05/2022] [Revised: 04/07/2023] [Accepted: 05/17/2023] [Indexed: 06/28/2023]
Abstract
The availability of phylogenetic data has greatly expanded in recent years. As a result, a new era in phylogenetic analysis is dawning, one in which the methods we use to analyse and assess our data are the bottleneck to producing valuable phylogenetic hypotheses, rather than the need to acquire more data. This makes the ability to accurately appraise and evaluate new methods of phylogenetic analysis and phylogenetic artefact identification more important than ever. Incongruence in phylogenetic reconstructions based on different datasets may be due to two major sources: biological and methodological. Biological sources comprise processes like horizontal gene transfer, hybridization and incomplete lineage sorting, while methodological ones contain falsely assigned data or violations of the assumptions of the underlying model. While the former provides interesting insights into the evolutionary history of the investigated groups, the latter should be avoided or minimized as best as possible. However, errors introduced by methodology must first be excluded or minimized to be able to conclude that biological sources are the cause. Fortunately, a variety of useful tools exist to help detect such misassignments and model violations and to apply ameliorating measurements. Still, the number of methods and their theoretical underpinning can be overwhelming and opaque. Here, we present a practical and comprehensive review of recent developments in techniques to detect artefacts arising from model violations and poorly assigned data. The advantages and disadvantages of the different methods to detect such misleading signals in phylogenetic reconstructions are also discussed. As there is no one-size-fits-all solution, this review can serve as a guide in choosing the most appropriate detection methods depending on both the actual dataset and the computational power available to the researcher. Ultimately, this informed selection will have a positive impact on the broader field, allowing us to better understand the evolutionary history of the group of interest.
|
16
|
Raiyemo DA, Tranel PJ. Comparative analysis of dioecious Amaranthus plastomes and phylogenomic implications within Amaranthaceae s.s. BMC Ecol Evol 2023; 23:15. [PMID: 37149567 PMCID: PMC10164334 DOI: 10.1186/s12862-023-02121-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 04/28/2023] [Indexed: 05/08/2023]
Abstract
BACKGROUND The genus Amaranthus L. consists of 70-80 species distributed across temperate and tropical regions of the world. Nine species are dioecious and native to North America; two of which are agronomically important weeds of row crops. The genus has been described as taxonomically challenging and relationships among species including the dioecious ones are poorly understood. In this study, we investigated the phylogenetic relationships among the dioecious amaranths and sought to gain insights into plastid tree incongruence. A total of 19 Amaranthus species' complete plastomes were analyzed. Among these, seven dioecious Amaranthus plastomes were newly sequenced and assembled, an additional two were assembled from previously published short reads sequences and 10 other plastomes were obtained from a public repository (GenBank). RESULTS Comparative analysis of the dioecious Amaranthus species' plastomes revealed sizes ranged from 150,011 to 150,735 bp and consisted of 112 unique genes (78 protein-coding genes, 30 transfer RNAs and 4 ribosomal RNAs). Maximum likelihood trees, Bayesian inference trees and splits graphs support the monophyly of subgenera Acnida (7 dioecious species) and Amaranthus; however, the relationship of A. australis and A. cannabinus to the other dioecious species in Acnida could not be established, as it appears a chloroplast capture occurred from the lineage leading to the Acnida + Amaranthus clades. Our results also revealed intraplastome conflict at some tree branches that were in some cases alleviated with the use of whole chloroplast genome alignment, indicating non-coding regions contribute valuable phylogenetic signals toward shallow relationship resolution. Furthermore, we report a very low evolutionary distance between A. palmeri and A. watsonii, indicating that these two species are more genetically related than previously reported. 
CONCLUSIONS Our study provides valuable plastome resources as well as a framework for further evolutionary analyses of the entire Amaranthus genus as more species are sequenced.
Affiliation(s)
- Damilola A Raiyemo
- Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Patrick J Tranel
- Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
|
17
|
Phylotranscriptomics interrogation uncovers a complex evolutionary history for the planarian genus Dugesia (Platyhelminthes, Tricladida) in the Western Mediterranean. Mol Phylogenet Evol 2023; 178:107649. [PMID: 36280167 DOI: 10.1016/j.ympev.2022.107649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 10/13/2022] [Accepted: 10/18/2022] [Indexed: 11/17/2022]
Abstract
The Mediterranean is one of the most biodiverse areas of the Palearctic region. Here, based on large data sets of single copy orthologs obtained from transcriptomic data, we investigated the evolutionary history of the genus Dugesia in the Western Mediterranean area. The results corroborated that the complex paleogeological history of the region was an important driver of diversification for the genus, speciating as microplates and islands were forming. These processes led to the differentiation of three main biogeographic clades: Iberia-Apennines-Alps, Corsica-Sardinia, and Iberia-Africa. The internal relationships of these major clades were analysed with several representative samples per species. The use of large data sets regarding the number of loci and samples, as well as state-of-the-art phylogenomic inference methods allowed us to answer different unresolved questions about the evolution of particular groups, such as the diversification path of D. subtentaculata in the Iberian Peninsula and its colonization of Africa. Additionally, our results support the differentiation of D. benazzii in two lineages which could represent two species. Finally, we analysed here for the first time a comprehensive number of samples from several asexual Iberian populations whose assignment at the species level has been an enigma through the years. The phylogenies obtained with different inference methods showed a branching topology of asexual individuals at the base of sexual clades. We hypothesize that this unexpected topology is related to long-term asexuality. This work represents the first phylotranscriptomic analysis of Tricladida, laying the first stone of the genomic era in phylogenetic studies on this taxonomic group.
|
18
|
Zaharias P, Warnow T. Recent progress on methods for estimating and updating large phylogenies. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210244. [PMID: 35989607 PMCID: PMC9393559 DOI: 10.1098/rstb.2021.0244] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2021] [Accepted: 01/07/2022] [Indexed: 12/20/2022]
Abstract
With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Affiliation(s)
- Paul Zaharias
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
|
19
|
Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022; 39:6750035. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022] [Indexed: 01/07/2023]
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
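A toy example may help show why weighting quartets by support can change the outcome. The sketch below is a simplified illustration and not the weighting scheme or optimization algorithm of the paper: the function name `score_quartet_topologies`, the topology labels, and the support values are hypothetical, and branch lengths are ignored.

```python
from collections import defaultdict


def score_quartet_topologies(observations, weighted=True):
    """Sum evidence for each resolution of one four-taxon set.

    observations: iterable of (topology_label, support) pairs, one per gene
    tree, where support is a value in [0, 1] (assumed for this example).
    With weighted=False every gene tree counts as one vote.
    """
    scores = defaultdict(float)
    for topology, support in observations:
        scores[topology] += support if weighted else 1.0
    return dict(scores)


# Hypothetical gene trees: two well-supported ab|cd quartets and
# three poorly supported ac|bd quartets.
toy_observations = [
    ("ab|cd", 0.95),
    ("ab|cd", 0.90),
    ("ac|bd", 0.20),
    ("ac|bd", 0.30),
    ("ac|bd", 0.25),
]

unweighted = score_quartet_topologies(toy_observations, weighted=False)
weighted = score_quartet_topologies(toy_observations, weighted=True)
best_unweighted = max(unweighted, key=unweighted.get)
best_weighted = max(weighted, key=weighted.get)
```

In this toy case the simple vote favours `ac|bd` because it appears in more gene trees, while the support-weighted sum favours `ab|cd`, which mirrors the abstract's point about reducing the impact of low-support quartets.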
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA
|
20
|
Zhang C, Mirarab S. ASTRAL-Pro 2: ultrafast species tree reconstruction from multi-copy gene family trees. Bioinformatics 2022; 38:4949-4950. [PMID: 36094339 DOI: 10.1093/bioinformatics/btac620] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2022] [Revised: 09/03/2022] [Accepted: 09/09/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION Species tree inference from multi-copy gene trees has long been a challenge in phylogenomics. The recent method ASTRAL-Pro has made large strides by enabling multi-copy gene family trees as input and has been quickly adopted. Yet, its scalability, especially memory usage, needs to improve to accommodate the ever-growing dataset size. RESULTS We present ASTRAL-Pro 2, an ultrafast and memory efficient version of ASTRAL-Pro that adopts a placement-based optimization algorithm for significantly better scalability without sacrificing accuracy. AVAILABILITY The source code and binary files are publicly available at https://github.com/chaoszhang/ASTER; data are available at https://github.com/chaoszhang/A-Pro2_data. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, 92093, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, La Jolla, CA, 92093, USA
|
21
|
Barley AJ, Nieto-Montes de Oca A, Manríquez-Morán NL, Thomson RC. The evolutionary network of whiptail lizards reveals predictable outcomes of hybridization. Science 2022; 377:773-777. [PMID: 35951680 DOI: 10.1126/science.abn1593] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022]
Abstract
Hybridization between diverging lineages is associated with the generation and loss of species diversity, introgression, adaptation, and changes in reproductive mode, but it is unknown when and why it results in these divergent outcomes. We estimate a comprehensive evolutionary network for the largest group of unisexual vertebrates and use it to understand the evolutionary outcomes of hybridization. Our results show that rates of introgression between species decrease with time since divergence and suggest that species must attain a threshold of evolutionary divergence before hybridization results in transitions to unisexuality. Rates of hybridization also predict genome-wide patterns of genetic diversity in whiptail lizards. These results distinguish among models for hybridization that have not previously been tested and suggest that the evolutionary outcomes can be predictable.
Affiliation(s)
- Anthony J Barley
- Department of Evolution and Ecology, University of California, Davis, Davis, CA 95616, USA; School of Life Sciences, University of Hawai'i, Honolulu, HI 96822, USA
- Adrián Nieto-Montes de Oca
- Laboratorio de Herpetología and Museo de Zoología Alfonso L. Herrera, Departamento de Biología Evolutiva, Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad Universitaria, Alcaldía Coyoacán, Ciudad de México, México
- Norma L Manríquez-Morán
- Laboratorio de Sistemática Molecular, Centro de Investigaciones Biológicas, Universidad Autónoma del Estado de Hidalgo, Colonia Carboneras, Mineral de la Reforma, Hidalgo, México
- Robert C Thomson
- School of Life Sciences, University of Hawai'i, Honolulu, HI 96822, USA
|
22
|
Pang XX, Zhang DY. Impact of Ghost Introgression on Coalescent-based Species Tree Inference and Estimation of Divergence Time. Syst Biol 2022; 72:35-49. [PMID: 35799362 DOI: 10.1093/sysbio/syac047] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Revised: 06/25/2022] [Accepted: 07/05/2022] [Indexed: 11/15/2022] Open
Abstract
The species studied in any evolutionary investigation generally constitute a small proportion of all the species currently existing or that have gone extinct. It is therefore likely that introgression, which is widespread across the tree of life, involves "ghosts," i.e., unsampled, unknown, or extinct lineages. However, the impact of ghost introgression on estimations of species trees has rarely been studied and is poorly understood. Here, we use mathematical analysis and simulations to examine the robustness of species tree methods based on the multispecies coalescent model to introgression from a ghost or extant lineage. We found that many results originally obtained for introgression between extant species can easily be extended to ghost introgression, such as the strongly interactive effects of incomplete lineage sorting (ILS) and introgression on the occurrence of anomalous gene trees (AGTs). The relative performance of the summary species tree method (ASTRAL) and the full-likelihood method (*BEAST) varies under different introgression scenarios, with the former being more robust to gene flow between non-sister species whereas the latter performs better under certain conditions of ghost introgression. When an outgroup ghost (defined as a lineage that diverged before the most basal species under investigation) acts as the donor of the introgressed genes, the time of root divergence among the investigated species generally was overestimated, whereas ingroup introgression, as commonly perceived, can only lead to underestimation. In many cases of ingroup introgression that may or may not involve ghost lineages, the stronger the ILS, the higher the accuracy achieved in estimating the time of root divergence, although the topology of the species tree is more prone to be biased by the effect of introgression.
Affiliation(s)
- Xiao-Xu Pang
- State Key Laboratory of Earth Surface Processes and Resource Ecology and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Da-Yong Zhang
- State Key Laboratory of Earth Surface Processes and Resource Ecology and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
|
23
|
Abstract
Motivation: Phylogenomics faces a dilemma: on the one hand, most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to moderately large numbers of species. The summary-based methods, which first infer gene trees independently and then combine them, are much more scalable but are prone to gene tree estimation error, which is inevitable when inferring trees from limited-length data. Gene tree estimation error is not just random noise and can create biases such as long-branch attraction. Results: We introduce a scalable likelihood-based approach to co-estimation under the multi-species coalescent model. The method, called quartet co-estimation (QuCo), takes as input independently inferred distributions over gene trees and computes the most likely species tree topology and internal branch length for each quartet, marginalizing over gene tree topologies and ignoring branch lengths by making several simplifying assumptions. It then updates the gene tree posterior probabilities based on the species tree. The focus on gene tree topologies and the heuristic division to quartets enables fast likelihood calculations. We benchmark our method with extensive simulations for quartet trees in zones known to produce biased species trees and further with larger trees. We also run QuCo on a biological dataset of bees. Our results show better accuracy than the summary-based approach ASTRAL run on estimated gene trees. Availability and implementation: QuCo is available on https://github.com/maryamrabiee/quco. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maryam Rabiee
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
|
24
|
Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022; 2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]
Abstract
Phylogenomics is the inference of phylogenetic trees based on multiple marker genes sampled in the genomes of interest. An important challenge in phylogenomics is the potential incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms due to the prevalence of horizontal gene transfer. This protocol introduces the procedures for building a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. The protocol highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for the topological conflicts among individual gene trees. The pipeline described in this protocol is scalable to tens of thousands of genomes while retaining high accuracy. We discussed multiple software tools, libraries, and scripts to enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
Affiliation(s)
- Qiyun Zhu
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA.
- School of Life Sciences, Arizona State University, Tempe, AZ, USA.
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
|
25
|
Abstract
The reconstruction of evolutionary relationships among species is fundamental for our understanding of biodiversity. Today, evolutionary relationships are closely related with the depiction of the tree of life, and research on the topic is underpinned by methods in molecular phylogenetics that have grown in popularity since the 1960s. These methods depend on our understanding of how nucleotide or amino acid sequences evolve through time and in different lineages. Armed with this knowledge, researchers can make inferences about the relationships and amount of genomic divergence among species.
Affiliation(s)
- David A Duchêne
- Centre for Evolutionary Hologenomics, University of Copenhagen, Øster Farimagsgade 5A, 1352 Copenhagen, Denmark.
|