1. Zhao M, Oswald JA, Allen JM, Owens HL, Hosner PA, Guralnick RP, Braun EL, Kimball RT. A phylogenomic tree of wood-warblers (Aves: Parulidae): Dealing with good, bad, and ugly samples. Mol Phylogenet Evol 2025; 202:108235. [PMID: 39542406 DOI: 10.1016/j.ympev.2024.108235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 10/26/2024] [Accepted: 11/10/2024] [Indexed: 11/17/2024]
Abstract
The New World warblers (Parulidae) are a model group for ecological and evolutionary analyses. However, current phylogenetic relationships across this family are based upon few loci. Here we use ultraconserved elements (UCEs) to estimate a rigorous species-level phylogeny for the family. As is true for many groups, high-quality tissues were unavailable for some taxa. Thus, we explored methods for incorporating sequences derived from historical (toe pad) samples to expand the phylogenetic datasets. We recovered an average of 4,186 UCE loci and mitochondrial bycatch data (supplemented with published mitochondrial data) from 96% of all currently recognized species. We found that the UCE phylogeny built with alignments with less than 70% of gaps and ambiguities recovered the most robust phylogenetic relationships for this family, representing 101 species. Using this phylogeny as a topological backbone and adding ten fair quality "bad" samples effectively generated an overall well supported phylogeny, representing 108 species (∼90% of all species). Based on this tree, we then added in seven poor quality "ugly" samples and six of those were placed within their expected genera. We also explored the phylogenetic positions of the likely extinct Leucopeza semperi and the endangered Catharopeza bishopi where limited data was obtained. Overall, taxonomic placements in our UCE trees largely correspond to previously published studies with the recovery of all currently recognized genera as monophyletic except for Basileuterus which was rendered paraphyletic by B. lachrymosus. Our study provides insights in understanding the phylogenetic relationships of a model Passeriformes family and outlines effective practices for managing sparse genomic data sourced from historical museum specimens. Variable topological arrangements across datasets and analyses reflect the evolutionary complexity of this group and provide future topics for in-depth studies.
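The filtering step mentioned in the abstract (keeping only alignments with less than 70% gaps and ambiguities) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy data, and the set of characters counted as gaps or ambiguities are all assumptions made for illustration.

```python
# Characters counted as gaps or ambiguities. This set is an assumption for
# the sketch: gap symbols plus the IUPAC nucleotide ambiguity codes.
GAP_OR_AMBIGUOUS = set("-?.NRYSWKMBDHV")


def gap_fraction(alignment):
    """Fraction of alignment cells that are gaps or ambiguity codes.

    `alignment` maps a taxon name to its aligned sequence string.
    """
    total = sum(len(seq) for seq in alignment.values())
    if total == 0:
        return 1.0  # treat an empty alignment as entirely missing
    missing = sum(
        ch.upper() in GAP_OR_AMBIGUOUS
        for seq in alignment.values()
        for ch in seq
    )
    return missing / total


def keep_alignments(alignments, max_fraction=0.70):
    """Keep loci whose gap/ambiguity fraction is strictly below max_fraction."""
    return {
        name: aln
        for name, aln in alignments.items()
        if gap_fraction(aln) < max_fraction
    }


# Hypothetical toy loci, not real UCE data.
toy = {
    "locus_a": {"sp1": "ACGTACGTAC", "sp2": "ACGT------"},  # 6 of 20 cells missing
    "locus_b": {"sp1": "AC--------", "sp2": "NNNN------"},  # 18 of 20 cells missing
}
kept = keep_alignments(toy)
print(sorted(kept))
```

In this toy case only `locus_a` passes the threshold. Whether the published threshold was applied per alignment, per taxon, or per column is not stated in the abstract, so the per-alignment reading used here is also an assumption.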
Affiliation(s)
- Min Zhao
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA; Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Jessica A Oswald
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA; U.S. Fish and Wildlife Service, National Fish and Wildlife Forensic Laboratory, Ashland, OR 97520, USA
- Julie M Allen
  - Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24060, USA
- Hannah L Owens
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA; Center for Global Mountain Biodiversity, Section for Biodiversity, Globe Institute, University of Copenhagen, København Ø, Denmark
- Peter A Hosner
  - Center for Global Mountain Biodiversity, Section for Biodiversity, Globe Institute, University of Copenhagen, København Ø, Denmark; Natural History Museum Denmark, University of Copenhagen, København Ø, Denmark
- Robert P Guralnick
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Edward L Braun
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Rebecca T Kimball
  - Department of Biology, University of Florida, Gainesville, FL 32611, USA
2. Gupta A, Mirarab S, Turakhia Y. Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. bioRxiv 2024:2024.05.27.596098. [PMID: 38854139 PMCID: PMC11160643 DOI: 10.1101/2024.05.27.596098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Inference of species trees plays a crucial role in advancing our understanding of evolutionary relationships and has immense significance for diverse biological and medical applications. Extensive genome sequencing efforts are currently in progress across a broad spectrum of life forms, holding the potential to unravel the intricate branching patterns within the tree of life. However, estimating species trees starting from raw genome sequences is quite challenging, and the current cutting-edge methodologies require a series of error-prone steps that are neither entirely automated nor standardized. In this paper, we present ROADIES, a novel pipeline for species tree inference from raw genome assemblies that is fully automated, easy to use, scalable, free from reference bias, and provides flexibility to adjust the tradeoff between accuracy and runtime. The ROADIES pipeline eliminates the need to align whole genomes, choose a single reference species, or pre-select loci such as functional genes found using cumbersome annotation steps. Moreover, it leverages recent advances in phylogenetic inference to allow multi-copy genes, eliminating the need to detect orthology. Using the genomic datasets released from large-scale sequencing consortia across three diverse life forms (placental mammals, pomace flies, and birds), we show that ROADIES infers species trees that are comparable in quality with the state-of-the-art approaches but in a fraction of the time. By incorporating optimal approaches and automating all steps from assembled genomes to species and gene trees, ROADIES is poised to improve the accuracy, scalability, and reproducibility of phylogenomic analyses.
Affiliation(s)
- Anshu Gupta
  - Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
3. Nicol DA, Saldivia P, Summerfield TC, Heads M, Lord JM, Khaing EP, Larcombe MJ. Phylogenomics and morphology of Celmisiinae (Asteraceae: Astereae): Taxonomic and evolutionary implications. Mol Phylogenet Evol 2024; 195:108064. [PMID: 38508479 DOI: 10.1016/j.ympev.2024.108064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/12/2024] [Accepted: 03/17/2024] [Indexed: 03/22/2024]
Abstract
The tribe Astereae (Asteraceae) includes 36 subtribes and 252 genera, and is distributed worldwide in temperate and tropical regions. One of the subtribes, Celmisiinae Saldivia, has been recently circumscribed to include six genera and ca. 160 species, and is restricted to eastern Australia, New Zealand, and New Guinea. The species show an impressive range of growth habit, from small herbs and ericoid subshrubs to medium-sized trees. They live in a wide range of habitats and are often dominant in subalpine and alpine vegetation. Despite the well-supported circumscription of Celmisiinae, uncertainties have remained about their internal relationships and classification at genus and species levels. This study exploited recent advances in high-throughput sequencing to build a robust multi-gene phylogeny for the subtribe Celmisiinae. The target enrichment Angiosperms353 bait set and the hybpiper-nf and paragone-nf pipelines were used to retrieve, infer, and assemble orthologous loci from 75 taxa representing all the main putative clades within the subtribe. Because of the diploidised ploidy level in Celmisiinae, as well as missing data in the assemblies, uncertainty remains surrounding the inference of orthology detection. However, based on a variety of gene-family sets, coalescent and concatenation-based phylogenetic reconstructions recovered similar topologies. Paralogy and missing data in the gene-families caused some problems, but the estimated phylogenies were well-supported and well-resolved. The phylogenomic evidence supported Celmisiinae and three main clades: the Pleurophyllum clade (Pleurophyllum, Macrolearia and Damnamenia), mostly in the New Zealand Subantarctic Islands, Celmisia of mainland New Zealand and Australia, and Shawia (including 'Olearia pro parte' and Pachystegia) of New Zealand, Australia and New Guinea. 
The results presented here add to the accumulating support for the Angiosperms353 bait set as an efficient method for documenting plant diversity.
Affiliation(s)
- Duncan A Nicol
  - Department of Botany, University of Otago, PO Box 56, Dunedin, New Zealand
- Patricio Saldivia
  - Biota Ltda. Av. Miguel Claro 1224, Providencia, Santiago, Chile; Museo Regional de Aysén, Km 3 Camino a Coyhaique Alto, Coyhaique, Chile
- Tina C Summerfield
  - Department of Botany, University of Otago, PO Box 56, Dunedin, New Zealand
- Michael Heads
  - Buffalo Museum of Science, Buffalo, NY 14211-1293, USA
- Janice M Lord
  - Department of Botany, University of Otago, PO Box 56, Dunedin, New Zealand
- Ei P Khaing
  - Department of Biochemistry, University of Otago, PO Box 56, Dunedin, New Zealand
- Matthew J Larcombe
  - Department of Botany, University of Otago, PO Box 56, Dunedin, New Zealand
4. Rancilhac L, Enbody ED, Harris R, Saitoh T, Irestedt M, Liu Y, Lei F, Andersson L, Alström P. Introgression Underlies Phylogenetic Uncertainty But Not Parallel Plumage Evolution in a Recent Songbird Radiation. Syst Biol 2024; 73:12-25. [PMID: 37801684 PMCID: PMC11129591 DOI: 10.1093/sysbio/syad062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/11/2023] [Accepted: 10/05/2023] [Indexed: 10/08/2023] Open
Abstract
Instances of parallel phenotypic evolution offer great opportunities to understand the evolutionary processes underlying phenotypic changes. However, confirming parallel phenotypic evolution and studying its causes requires a robust phylogenetic framework. One such example is the "black-and-white wagtails," a group of 5 species in the songbird genus Motacilla: 1 species, Motacilla alba, shows wide intra-specific plumage variation, while the 4 others form 2 pairs of very similar-looking species (M. aguimp + M. samveasnae and M. grandis + M. maderaspatensis, respectively). However, the 2 species in each of these pairs were not recovered as sisters in previous phylogenetic inferences. Their relationships varied depending on the markers used, suggesting that gene tree heterogeneity might have hampered accurate phylogenetic inference. Here, we use whole genome resequencing data to explore the phylogenetic relationships within this group, with a special emphasis on characterizing the extent of gene tree heterogeneity and its underlying causes. We first used multispecies coalescent methods to generate a "complete evidence" phylogenetic hypothesis based on genome-wide variants, while accounting for incomplete lineage sorting (ILS) and introgression. We then investigated the variation in phylogenetic signal across the genome to quantify the extent of discordance across genomic regions and test its underlying causes. We found that wagtail genomes are mosaics of regions supporting variable genealogies, because of ILS and inter-specific introgression. The most common topology across the genome, supporting M. alba and M. aguimp as sister species, appears to be influenced by ancient introgression. Additionally, we inferred another ancient introgression event, between M. alba and M. grandis. 
By combining results from multiple analyses, we propose a phylogenetic network for the black-and-white wagtails that confirms that similar phenotypes evolved in non-sister lineages, supporting parallel plumage evolution. Furthermore, the inferred reticulations do not connect species with similar plumage coloration, suggesting that introgression does not underlie parallel plumage evolution in this group. Our results demonstrate the importance of investigating genome-wide patterns of gene tree heterogeneity to help understand the mechanisms underlying phenotypic evolution. [Gene tree heterogeneity; incomplete lineage sorting; introgression; parallel evolution; phylogenomics; plumage evolution; wagtails.]
Affiliation(s)
- Loïs Rancilhac
  - Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, 752 36 Uppsala, Sweden
- Erik D Enbody
  - Department of Medical Biochemistry and Microbiology, Uppsala University, 751 23 Uppsala, Sweden
  - Biomolecular Engineering, University of California, 95064 Santa Cruz, CA, USA
- Rebecca Harris
  - Department of Biology, University of Washington, Seattle, WA 98105, USA
- Takema Saitoh
  - Yamashina Institute for Ornithology, 115 Konoyama, Abiko, Chiba 270-1145, Japan
- Martin Irestedt
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, P.O. Box 50007, 104 05 Stockholm, Sweden
- Yang Liu
  - State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Shenzhen 518107, China
- Fumin Lei
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 100101 Beijing, China
- Leif Andersson
  - Department of Medical Biochemistry and Microbiology, Uppsala University, 751 23 Uppsala, Sweden
  - Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Per Alström
  - Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, 752 36 Uppsala, Sweden
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 100101 Beijing, China
5. Rincón-Barrado M, Villaverde T, Perez MF, Sanmartín I, Riina R. The sweet tabaiba or there and back again: phylogeographical history of the Macaronesian Euphorbia balsamifera. Ann Bot 2024; 133:883-904. [PMID: 38197716 PMCID: PMC11082519 DOI: 10.1093/aob/mcae001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/09/2023] [Accepted: 03/01/2024] [Indexed: 01/11/2024]
Abstract
BACKGROUND AND AIMS Biogeographical relationships between the Canary Islands and north-west Africa are often explained by oceanic dispersal and geographical proximity. Sister-group relationships between Canarian and eastern African/Arabian taxa, the 'Rand Flora' pattern, are rare among plants and have been attributed to the extinction of north-western African populations. Euphorbia balsamifera is the only representative species of this pattern that is distributed in the Canary Islands and north-west Africa; it is also one of few species present in all seven islands. Previous studies placed African populations of E. balsamifera as sister to the Canarian populations, but this relationship was based on herbarium samples with highly degraded DNA. Here, we test the extinction hypothesis by sampling new continental populations; we also expand the Canarian sampling to examine the dynamics of island colonization and diversification. METHODS Using target enrichment with genome skimming, we reconstructed phylogenetic relationships within E. balsamifera and between this species and its disjunct relatives. A single nucleotide polymorphism dataset obtained from the target sequences was used to infer population genetic diversity patterns. We used convolutional neural networks to discriminate among alternative Canary Islands colonization scenarios. KEY RESULTS The results confirmed the Rand Flora sister-group relationship between western E. balsamifera and Euphorbia adenensis in the Eritreo-Arabian region and recovered an eastern-western geographical structure among E. balsamifera Canarian populations. Convolutional neural networks supported a scenario of east-to-west island colonization, followed by population extinctions in Lanzarote and Fuerteventura and recolonization from Tenerife and Gran Canaria; a signal of admixture between the eastern island and north-west African populations was recovered. 
CONCLUSIONS Our findings support the Surfing Syngameon Hypothesis for the colonization of the Canary Islands by E. balsamifera, but also a recent back-colonization to the continent. Populations of E. balsamifera from northwest Africa are not the remnants of an ancestral continental stock, but originated from migration events from Lanzarote and Fuerteventura. This is further evidence that oceanic archipelagos are not a sink for biodiversity, but may be a source of new genetic variability.
Affiliation(s)
- Mario Rincón-Barrado
  - Real Jardín Botánico (RJB), CSIC, Madrid, 28014, Spain
  - Centro Nacional de Biotecnología (CNB), CSIC, Madrid, 28049, Spain
- Tamara Villaverde
  - Universidad Rey Juan Carlos (URJC), Área de Biodiversidad y Conservación, Móstoles, 28933, Spain
- Manolo F Perez
  - Institut de Systématique, Evolution, Biodiversité (ISYEB – URM 7205 CNRS), Muséum National d’Histoire Naturelle, SU, EPHE & UA, Paris, France
- Ricarda Riina
  - Real Jardín Botánico (RJB), CSIC, Madrid, 28014, Spain
6. Balaban M, Jiang Y, Zhu Q, McDonald D, Knight R, Mirarab S. Generation of accurate, expandable phylogenomic trees with uDance. Nat Biotechnol 2024; 42:768-777. [PMID: 37500914 PMCID: PMC10818028 DOI: 10.1038/s41587-023-01868-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/19/2022] [Accepted: 06/20/2023] [Indexed: 07/29/2023]
Abstract
Phylogenetic trees provide a framework for organizing evolutionary histories across the tree of life and aid downstream comparative analyses such as metagenomic identification. Methods that rely on single-marker genes such as 16S rRNA have produced trees of limited accuracy with hundreds of thousands of organisms, whereas methods that use genome-wide data are not scalable to large numbers of genomes. We introduce updating trees using divide-and-conquer (uDance), a method that enables updatable genome-wide inference using a divide-and-conquer strategy that refines different parts of the tree independently and can build off of existing trees, with high accuracy and scalability. With uDance, we infer a species tree of roughly 200,000 genomes using 387 marker genes, totaling 42.5 billion amino acid residues.
Affiliation(s)
- Metin Balaban
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Yueyu Jiang
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Qiyun Zhu
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Daniel McDonald
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
7. Zaharias P, Kantor YI, Fedosov AE, Puillandre N. Coupling DNA barcodes and exon-capture to resolve the phylogeny of Turridae (Gastropoda, Conoidea). Mol Phylogenet Evol 2024; 191:107969. [PMID: 38007006 DOI: 10.1016/j.ympev.2023.107969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 11/05/2023] [Accepted: 11/20/2023] [Indexed: 11/27/2023]
Abstract
Taxon sampling in most phylogenomic studies is often based on known taxa and/or morphospecies, thus ignoring undescribed diversity and/or cryptic lineages. The family Turridae is a group of venomous snails within the hyperdiverse superfamily Conoidea that includes many undescribed and cryptic species. Therefore 'traditional' taxon sampling could constitute a strong risk of undersampling or oversampling Turridae lineages. To minimize potential biases, we establish a robust sampling strategy, from species delimitation to phylogenomics. More than 3,000 cox-1 "barcode" sequences were used to propose 201 primary species hypotheses, nearly half of them corresponding to species potentially new to science, including several cryptic species. A 110-taxa exon-capture tree, including species representatives of the diversity uncovered with the cox-1 dataset, was built using up to 4,178 loci. Our results show the polyphyly of the genus Gemmula, which is split into up to 10 separate lineages, of which half would not have been detected if the sampling strategy had been based only on described species. Our results strongly suggest that the use of blind, exploratory and intensive barcode sampling is necessary to avoid sampling biases in phylogenomic studies.
Affiliation(s)
- Paul Zaharias
  - Institut Systématique Evolution Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 43 rue Cuvier, CP 51, 75005 Paris, France
- Yuri I Kantor
  - Institut Systématique Evolution Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 43 rue Cuvier, CP 51, 75005 Paris, France; A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Leninski prospect 33, 119071 Moscow, Russian Federation
- Alexander E Fedosov
  - Institut Systématique Evolution Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 43 rue Cuvier, CP 51, 75005 Paris, France; Swedish Museum of Natural History, Box 50007, SE-104 05 Stockholm, Sweden
- Nicolas Puillandre
  - Institut Systématique Evolution Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 43 rue Cuvier, CP 51, 75005 Paris, France
8. Piñeiro C, Pichel JC. Efficient phylogenetic tree inference for massive taxonomic datasets: harnessing the power of a server to analyze 1 million taxa. Gigascience 2024; 13:giae055. [PMID: 39115958 PMCID: PMC11308190 DOI: 10.1093/gigascience/giae055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 04/17/2024] [Accepted: 07/11/2024] [Indexed: 08/10/2024] Open
Abstract
BACKGROUND Phylogenies play a crucial role in biological research. Unfortunately, the search for the optimal phylogenetic tree incurs significant computational costs, and most of the existing state-of-the-art tools cannot deal with extremely large datasets in reasonable times. RESULTS In this work, we introduce the new VeryFastTree code (version 4.0), which is able to construct a tree on 1 server using single-precision arithmetic from a massive 1 million alignment dataset in only 36 hours, which is 3 times and 3.2 times faster than its previous version and FastTree-2, respectively. This new version further boosts performance by parallelizing all tree traversal operations during the tree construction process, including subtree pruning and regrafting moves. Additionally, it introduces significant new features such as support for new and compressed file formats, enhanced compatibility across a broader range of operating systems, and the integration of disk computing functionality. The latter feature is particularly advantageous for users without access to high-end servers, as it allows them to manage very large datasets, albeit with an increase in computing time. CONCLUSIONS Experimental results establish VeryFastTree as the fastest tool in the state-of-the-art for maximum likelihood phylogeny estimation. It is publicly available at https://github.com/citiususc/veryfasttree. In addition, VeryFastTree is included as a package in Bioconda, MacPorts, and all Debian-based Linux distributions.
Affiliation(s)
- César Piñeiro
  - Information Retrieval Lab, CITIC, Universidade da Coruña, A Coruña 15008, Spain
- Juan C Pichel
  - CiTIUS, Universidade de Santiago de Compostela, Santiago de Compostela 15782, Spain
9. DeSalle R, Narechania A, Tessler M. Multiple Outgroups Can Cause Random Rooting in Phylogenomics. Mol Phylogenet Evol 2023; 184:107806. [PMID: 37172862 DOI: 10.1016/j.ympev.2023.107806] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/08/2022] [Revised: 02/06/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Outgroup selection has been a major challenge since the rise of phylogenetics, and it has remained so in the phylogenomic era. Our goal here is to use large phylogenomic animal datasets to examine the impact of outgroup selection on the final topology. The results of our analyses further solidify the fact that distant outgroups can cause random rooting, and that this holds for concatenated and coalescent-based methods. The results also indicate that the standard practice of using multiple outgroups often causes random rooting. Most researchers go out of their way to get multiple outgroups, as this has been standard practice for decades. Based on our findings, this practice should stop. Instead, our results suggest that a single (most closely) related relative should be selected as the outgroup, unless all outgroups are roughly equally closely related to the ingroup.
Affiliation(s)
- Rob DeSalle
  - Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Apurva Narechania
  - Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
- Michael Tessler
  - Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA; St. Francis College, Department of Biology, Brooklyn, NY 11201, USA
10. Zaharias P, Warnow T. Recent progress on methods for estimating and updating large phylogenies. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210244. [PMID: 35989607 PMCID: PMC9393559 DOI: 10.1098/rstb.2021.0244] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/12/2021] [Accepted: 01/07/2022] [Indexed: 12/20/2022] Open
Abstract
With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Affiliation(s)
- Paul Zaharias
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
11. Smith BT, Merwin J, Provost KL, Thom G, Brumfield RT, Ferreira M, Mauck WM III, Moyle RG, Wright T, Joseph L. Phylogenomic analysis of the parrots of the world distinguishes artifactual from biological sources of gene tree discordance. Syst Biol 2022; 72:228-241. [PMID: 35916751 DOI: 10.1093/sysbio/syac055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2021] [Revised: 02/22/2022] [Accepted: 07/22/2022] [Indexed: 11/14/2022] Open
Abstract
Gene tree discordance is expected in phylogenomic trees and biological processes are often invoked to explain it. However, heterogeneous levels of phylogenetic signal among individuals within datasets may cause artifactual sources of topological discordance. We examined how the information content in tips and subclades impacts topological discordance in the parrots (Order: Psittaciformes), a diverse and highly threatened clade of nearly 400 species. Using ultraconserved elements from 96% of the clade's species-level diversity, we estimated concatenated and species trees for 382 ingroup taxa. We found that discordance among tree topologies was most common at nodes dating between the late Miocene and Pliocene, and often at the taxonomic level of genus. Accordingly, we used two metrics to characterize information content in tips and assess the degree to which conflict between trees was being driven by lower quality samples. Most instances of topological conflict and non-monophyletic genera in the species tree could be objectively identified using these metrics. For subclades still discordant after tip-based filtering, we used a machine learning approach to determine whether phylogenetic signal or noise was the more important predictor of metrics supporting the alternative topologies. We found that when signal favored one of the topologies, noise was the most important variable in poorly performing models that favored the alternative topology. In sum, we show that artifactual sources of gene tree discordance, which are likely a common phenomenon in many datasets, can be distinguished from biological sources by quantifying the information content in each tip and modeling which factors support each topology.
Affiliation(s)
- Brian Tilston Smith
  - Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
- Jon Merwin
  - Department of Ornithology, Academy of Natural Sciences of Drexel University, 1900 Benjamin Franklin Parkway, Philadelphia, PA 19103, USA
  - Department of Biodiversity, Earth, and Environmental Science, Drexel University, Philadelphia, PA 19103, USA
- Kaiya L Provost
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 318 W. 12th Avenue, Columbus, OH 43210, USA
- Gregory Thom
  - Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Robb T Brumfield
  - Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Mateus Ferreira
  - Centro de Estudos da Biodiversidade, Universidade Federal de Roraima, Av. Cap. Ene Garcez, 2413, Boa Vista, RR, Brazil
- William M Mauck III
  - Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
- Robert G Moyle
  - Department of Ecology and Evolutionary Biology and Biodiversity Institute, University of Kansas, 1345 Jayhawk Blvd., Lawrence, KS 66045, USA
- Timothy Wright
  - Department of Biology, New Mexico State University, Las Cruces, NM, 88003, USA
- Leo Joseph
  - Australian National Wildlife Collection, National Research Collections Australia, CSIRO, GPO Box 1700, Canberra, ACT, 2601, Australia
12
|
Ufimov R, Gorospe JM, Fér T, Kandziora M, Salomon L, van Loo M, Schmickl R. Utilizing paralogs for phylogenetic reconstruction has the potential to increase species tree support and reduce gene tree discordance in target enrichment data. Mol Ecol Resour 2022; 22:3018-3034. [PMID: 35796729 DOI: 10.1111/1755-0998.13684] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/13/2022] [Revised: 05/28/2022] [Accepted: 06/22/2022] [Indexed: 11/30/2022]
Abstract
The analysis of target enrichment data in phylogenetics lacks optimization toward using paralogs for phylogenetic reconstruction. We developed a novel approach of detecting paralogs and utilizing them for phylogenetic tree inference, by retrieving both ortho- and paralogous copies and creating orthologous alignments, from which the gene trees are built. We implemented this approach in ParalogWizard and demonstrate its performance in plant groups that underwent a whole genome duplication relatively recently: the subtribe Malinae (family Rosaceae), using Angiosperms353 as well as Malinae481 probes, the genus Oritrophium (family Asteraceae), using Compositae1061 probes, and the genus Amomum (family Zingiberaceae), using Zingiberaceae1180 probes. Discriminating between orthologs and paralogs reduced gene tree discordance and increased the species tree support in the case of the Malinae, but not for Oritrophium and Amomum. This may relate to the difference in the proportion of paralogous loci between the datasets, which was highest for the Malinae. Overall, retrieving paralogs for phylogenetic reconstruction following ParalogWizard has the potential to increase the species tree support and reduce gene tree discordance in target enrichment data, particularly if the proportion of paralogous loci is high.
Affiliation(s)
- Roman Ufimov
  - Department of Forest Growth, Silviculture and Genetics, Austrian Research Centre for Forests, Seckendorff-Gudent-Weg 8, 1130, Vienna, Austria
  - Komarov Botanical Institute, Russian Academy of Sciences, ul. Prof. Popova 2, 197376, St. Petersburg, Russian Federation
- Juan Manuel Gorospe
  - Institute of Botany, The Czech Academy of Sciences, Zámek 1, 252 43, Průhonice, Czech Republic
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
- Tomáš Fér
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
- Martha Kandziora
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
- Luciana Salomon
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
- Marcela van Loo
  - Department of Forest Growth, Silviculture and Genetics, Austrian Research Centre for Forests, Seckendorff-Gudent-Weg 8, 1130, Vienna, Austria
- Roswitha Schmickl
  - Institute of Botany, The Czech Academy of Sciences, Zámek 1, 252 43, Průhonice, Czech Republic
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
13
|
Salter JF, Hosner PA, Tsai WLE, McCormack JE, Braun EL, Kimball RT, Brumfield RT, Faircloth BC. Historical specimens and the limits of subspecies phylogenomics in the New World quails (Odontophoridae). Mol Phylogenet Evol 2022; 175:107559. [PMID: 35803448 DOI: 10.1016/j.ympev.2022.107559] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/15/2021] [Revised: 05/19/2022] [Accepted: 05/26/2022] [Indexed: 01/22/2023]
Abstract
As phylogenomics focuses on comprehensive taxon sampling at the species and population/subspecies levels, incorporating genomic data from historical specimens has become increasingly common. While historical samples can fill critical gaps in our understanding of the evolutionary history of diverse groups, they also introduce additional sources of phylogenomic uncertainty, making it difficult to discern novel evolutionary relationships from artifacts caused by sample quality issues. These problems highlight the need for improved strategies to disentangle artifactual patterns from true biological signal as historical specimens become more prevalent in phylogenomic datasets. Here, we tested the limits of historical specimen-driven phylogenomics to resolve subspecies-level relationships within a highly polytypic family, the New World quails (Odontophoridae), using thousands of ultraconserved elements (UCEs). We found that relationships at and above the species-level were well-resolved and highly supported across all analyses, with the exception of discordant relationships within the two most polytypic genera which included many historical specimens. We examined the causes of discordance and found that inferring phylogenies from subsets of taxa resolved the disagreements, suggesting that analyzing subclades can help remove artifactual causes of discordance in datasets that include historical samples. At the subspecies-level, we found well-resolved geographic structure within the two most polytypic genera, including the most polytypic species in this family, Northern Bobwhites (Colinus virginianus), demonstrating that variable sites within UCEs are capable of resolving phylogenetic structure below the species level. 
Our results highlight the importance of complete taxonomic sampling for resolving relationships among polytypic species, often through the inclusion of historical specimens, and we propose an integrative strategy for understanding and addressing the uncertainty that historical samples sometimes introduce to phylogenetic analyses.
Affiliation(s)
- Jessie F Salter
  - Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Peter A Hosner
  - Natural History Museum of Denmark, Center for Global Mountain Biodiversity, and Center for Macroecology, Evolution, and Climate, University of Copenhagen, Copenhagen, Denmark
  - Department of Biology, University of Florida, Gainesville, FL, USA
- Whitney L E Tsai
  - Moore Laboratory of Biology, Occidental College, Los Angeles, CA, USA
- John E McCormack
  - Moore Laboratory of Biology, Occidental College, Los Angeles, CA, USA
  - Biology Department, Occidental College, Los Angeles, CA, USA
- Edward L Braun
  - Department of Biology, University of Florida, Gainesville, FL, USA
- Robb T Brumfield
  - Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Brant C Faircloth
  - Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
14
|
Gatesy J, Springer MS. Phylogenomic Coalescent Analyses of Avian Retroelements Infer Zero-Length Branches at the Base of Neoaves, Emergent Support for Controversial Clades, and Ancient Introgressive Hybridization in Afroaves. Genes (Basel) 2022; 13:1167. [PMID: 35885951 PMCID: PMC9324441 DOI: 10.3390/genes13071167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Revised: 06/20/2022] [Accepted: 06/21/2022] [Indexed: 01/25/2023] Open
Abstract
Retroelement insertions (RIs) are low-homoplasy characters that are ideal data for addressing deep evolutionary radiations, where gene tree reconstruction errors can severely hinder phylogenetic inference with DNA and protein sequence data. Phylogenomic studies of Neoaves, a large clade of birds (>9000 species) that first diversified near the Cretaceous−Paleogene boundary, have yielded an array of robustly supported, contradictory relationships among deep lineages. Here, we reanalyzed a large RI matrix for birds using recently proposed quartet-based coalescent methods that enable inference of large species trees including branch lengths in coalescent units, clade-support, statistical tests for gene flow, and combined analysis with DNA-sequence-based gene trees. Genome-scale coalescent analyses revealed extremely short branches at the base of Neoaves, meager branch support, and limited congruence with previous work at the most challenging nodes. Despite widespread topological conflicts with DNA-sequence-based trees, combined analyses of RIs with thousands of gene trees show emergent support for multiple higher-level clades (Columbea, Passerea, Columbimorphae, Otidimorphae, Phaethoquornithes). RIs express asymmetrical support for deep relationships within the subclade Afroaves that hints at ancient gene flow involving the owl lineage (Strigiformes). Because DNA-sequence data are challenged by gene tree-reconstruction error, analysis of RIs represents one approach for improving gene tree-based methods when divergences are deep, internodes are short, terminal branches are long, and introgressive hybridization further confounds species−tree inference.
Affiliation(s)
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S. Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
15
|
Shen C, Park M, Warnow T. WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment. J Comput Biol 2022; 29:782-801. [PMID: 35575747 DOI: 10.1089/cmb.2021.0585] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/19/2023] Open
Abstract
Accurate multiple sequence alignment is challenging on many data sets, including those that are large, evolve under high rates of evolution, or have sequence length heterogeneity. While substantial progress has been made over the last decade in addressing the first two challenges, sequence length heterogeneity remains a significant issue for many data sets. Sequence length heterogeneity occurs for biological and technological reasons, including large insertions or deletions (indels) that occurred in the evolutionary history relating the sequences, or the inclusion of sequences that are not fully assembled. Ultra-large alignments using Phylogeny-Aware Profiles (UPP) (Nguyen et al. 2015) is one of the most accurate approaches for aligning data sets that exhibit sequence length heterogeneity: it constructs an alignment on the subset of sequences it considers "full-length," represents this "backbone alignment" using an ensemble of hidden Markov models (HMMs), and then adds each remaining sequence into the backbone alignment based on an HMM selected for that sequence from the ensemble. Our new method, WeIghTed Consensus Hmm alignment (WITCH), improves on UPP in three important ways: first, it uses a statistically principled technique to weight and rank the HMMs; second, it uses k>1 HMMs from the ensemble rather than a single HMM; and third, it combines the alignments for each of the selected HMMs using a consensus algorithm that takes the weights into account. We show that this approach provides improved alignment accuracy compared with UPP and other leading alignment methods, as well as improved accuracy for maximum likelihood trees based on these alignments.
Affiliation(s)
- Chengze Shen
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Minhyuk Park
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
16
|
Tumescheit C, Firth AE, Brown K. CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments. PeerJ 2022; 10:e12983. [PMID: 35310163 PMCID: PMC8932311 DOI: 10.7717/peerj.12983] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/06/2021] [Accepted: 02/01/2022] [Indexed: 01/11/2023] Open
Abstract
Background: Throughout biology, multiple sequence alignments (MSAs) form the basis of much investigation into biological features and relationships. These alignments are at the heart of many bioinformatics analyses. However, sequences in MSAs are often incomplete or very divergent, which can lead to poor alignment and large gaps. This slows down computation and can impact conclusions without being biologically relevant. Cleaning the alignment by removing common issues such as gaps, divergent sequences, large insertions and deletions and poorly aligned sequence ends can substantially improve analyses. Manual editing of MSAs is very widespread but is time-consuming and difficult to reproduce. Results: We present a comprehensive, user-friendly MSA trimming tool with multiple visualisation options. Our highly customisable command line tool aims to give intervention power to the user by offering various options, and outputs graphical representations of the alignment before and after processing to give the user a clear overview of what has been removed. The main functionalities of the tool include removing regions of low coverage due to insertions, removing gaps, cropping poorly aligned sequence ends and removing sequences that are too divergent or too short. The thresholds for each function can be specified by the user and parameters can be adjusted to each individual MSA. CIAlign is designed with an emphasis on solving specific and common alignment problems and on providing transparency to the user. Conclusion: CIAlign effectively removes problematic regions and sequences from MSAs and provides novel visualisation options. This tool can be used to fine-tune alignments for further analysis and processing. The tool is aimed at anyone who wishes to automatically clean up parts of an MSA and those requiring a new, accessible way of visualising large MSAs.
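Two of the cleaning steps named in this abstract, removing gap-heavy columns and removing sequences that are too short, can be illustrated with a minimal Python sketch. This is not CIAlign's implementation or its command-line interface: the function names `drop_gappy_columns` and `drop_short_sequences`, the dictionary representation of the alignment, and the threshold values are all assumptions chosen for illustration.

```python
from typing import Dict

GAP = "-"  # assumed gap symbol; real alignments may also use "." or "?"


def drop_gappy_columns(alignment: Dict[str, str],
                       max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Keep only columns whose fraction of gaps is <= max_gap_fraction."""
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_seqs = len(alignment)
    n_cols = lengths.pop()
    keep = [
        col for col in range(n_cols)
        if sum(seq[col] == GAP for seq in alignment.values()) / n_seqs
        <= max_gap_fraction
    ]
    return {name: "".join(seq[c] for c in keep)
            for name, seq in alignment.items()}


def drop_short_sequences(alignment: Dict[str, str],
                         min_residues: int = 3) -> Dict[str, str]:
    """Remove sequences with fewer than min_residues non-gap characters."""
    return {name: seq for name, seq in alignment.items()
            if len(seq) - seq.count(GAP) >= min_residues}


# Hypothetical toy alignment (taxon names and sequences are made up).
example = {
    "taxon_a": "ACG-TA",
    "taxon_b": "AC--TA",
    "taxon_c": "A---T-",
}
trimmed = drop_gappy_columns(example, max_gap_fraction=0.5)
cleaned = drop_short_sequences(trimmed, min_residues=3)
print(cleaned)
```

Filtering columns before sequences is a design choice in this sketch; applying the steps in the other order can give a different result, which is one reason user-specified thresholds and before/after visualisation matter.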
Affiliation(s)
- Andrew E. Firth
  - Department of Pathology, University of Cambridge, Cambridge, United Kingdom
- Katherine Brown
  - Department of Pathology, University of Cambridge, Cambridge, United Kingdom
17
|
Mai U, Mirarab S. Completing gene trees without species trees in sub-quadratic time. Bioinformatics 2022; 38:1532-1541. [PMID: 34978565 DOI: 10.1093/bioinformatics/btab875] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/23/2021] [Revised: 11/27/2021] [Accepted: 12/30/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: As genome-wide reconstruction of phylogenetic trees becomes more widespread, limitations of available data are being appreciated more than ever before. One issue is that phylogenomic datasets are riddled with missing data, and gene trees, in particular, almost always lack representatives from some species otherwise available in the dataset. Since many downstream applications of gene trees require or can benefit from access to complete gene trees, it will be beneficial to algorithmically complete gene trees. Also, gene trees are often unrooted, and rooting them is useful for downstream applications. While completing and rooting a gene tree with respect to a given species tree has been studied, those problems are not studied in depth when we lack such a reference species tree. RESULTS: We study completion of gene trees without a need for a reference species tree. We formulate an optimization problem to complete the gene trees while minimizing their quartet distance to the given set of gene trees. We extend a seminal algorithm by Brodal et al. to solve this problem in quasi-linear time. In simulated studies and on a large empirical data, we show that completion of gene trees using other gene trees is relatively accurate and, unlike the case where a species tree is available, is unbiased. AVAILABILITY AND IMPLEMENTATION: Our method, tripVote, is available at https://github.com/uym2/tripVote. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Uyen Mai
  - Department of Computer Science and Engineering, University of California San Diego, San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
18
|
Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022; 2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]
Abstract
Phylogenomics is the inference of phylogenetic trees based on multiple marker genes sampled in the genomes of interest. An important challenge in phylogenomics is the potential incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms due to the prevalence of horizontal gene transfer. This protocol introduces the procedures for building a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. The protocol highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for the topological conflicts among individual gene trees. The pipeline described in this protocol is scalable to tens of thousands of genomes while retaining high accuracy. We discussed multiple software tools, libraries, and scripts to enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
Affiliation(s)
- Qiyun Zhu
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
19
|
Kohli M, Letsch H, Greve C, Béthoux O, Deregnaucourt I, Liu S, Zhou X, Donath A, Mayer C, Podsiadlowski L, Gunkel S, Machida R, Niehuis O, Rust J, Wappler T, Yu X, Misof B, Ware J. Evolutionary history and divergence times of Odonata (dragonflies and damselflies) revealed through transcriptomics. iScience 2021; 24:103324. [PMID: 34805787 PMCID: PMC8586788 DOI: 10.1016/j.isci.2021.103324] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/02/2020] [Revised: 05/14/2021] [Accepted: 10/19/2021] [Indexed: 11/22/2022] Open
Abstract
Dragonflies and damselflies are among the earliest flying insects with extant representatives. However, unraveling details of their long evolutionary history, such as egg laying (oviposition) strategies, is impeded by unresolved phylogenetic relationships, particularly in damselflies. Here we present a transcriptome-based phylogenetic reconstruction of Odonata, analyzing 2,980 protein-coding genes in 105 species representing nearly all the order's families. All damselfly and most dragonfly families are recovered as monophyletic. Our data suggest a sister relationship between dragonfly families of Gomphidae and Petaluridae. According to our divergence time estimates, both crown-Zygoptera and -Anisoptera arose during the late Triassic. Egg-laying with a reduced ovipositor apparently evolved in dragonflies during the late Jurassic/early Cretaceous. Lastly, we also test the impact of fossil choice and placement, particularly, of the extinct fossil species, †Triassolestodes asiaticus, and †Proterogomphus renateae on divergence time estimates. We find placement of †Proterogomphus renateae to be much more impactful than †Triassolestodes asiaticus.
Affiliation(s)
- Manpreet Kohli
  - Department of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Harald Letsch
  - Department for Animal Biodiversity, Universität Wien, Vienna, Austria
- Carola Greve
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt am Main, Germany
- Olivier Béthoux
  - CR2P (Centre de Recherche en Paléontologie – Paris), MNHN – CNRS – Sorbonne Université, Paris, France
- Isabelle Deregnaucourt
  - CR2P (Centre de Recherche en Paléontologie – Paris), MNHN – CNRS – Sorbonne Université, Paris, France
- Shanlin Liu
  - Department of Entomology, China Agricultural University, Beijing 100193, People’s Republic of China
- Xin Zhou
  - Department of Entomology, China Agricultural University, Beijing 100193, People’s Republic of China
- Alexander Donath
  - Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Christoph Mayer
  - Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Lars Podsiadlowski
  - Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Simon Gunkel
  - Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Ryuichiro Machida
  - Sugadaira Research Station, Mountain Research Center, University of Tsukuba, Sugadaira Kogen, Ueda, Nagano, Japan
- Oliver Niehuis
  - Department of Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert Ludwig University, Freiburg, Germany
- Jes Rust
  - Palaeontology Section, Institute of Geosciences, Rheinische Friedrich-Wilhelms Universität Bonn, Bonn 53115, Germany
- Torsten Wappler
  - Palaeontology Section, Institute of Geosciences, Rheinische Friedrich-Wilhelms Universität Bonn, Bonn 53115, Germany
- Xin Yu
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Bernhard Misof
  - Leibniz Institute for the Analysis of Biodiversity Change, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Jessica Ware
  - Department of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
20
|
How challenging RADseq data turned out to favor coalescent-based species tree inference. A case study in Aichryson (Crassulaceae). Mol Phylogenet Evol 2021; 167:107342. [PMID: 34785384 DOI: 10.1016/j.ympev.2021.107342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/03/2020] [Revised: 07/05/2021] [Accepted: 10/29/2021] [Indexed: 12/24/2022]
Abstract
Analysing multiple genomic regions while incorporating detection and qualification of discordance among regions has become standard for understanding phylogenetic relationships. In plants, which usually have comparatively large genomes, this is feasible by the combination of reduced-representation library (RRL) methods and high-throughput sequencing enabling the cost effective acquisition of genomic data for thousands of loci from hundreds of samples. One popular RRL method is RADseq. A major disadvantage of established RADseq approaches is the rather short fragment and sequencing range, leading to loci of little individual phylogenetic information. This issue hampers the application of coalescent-based species tree inference. The modified RADseq protocol presented here targets ca. 5,000 loci of 300-600nt length, sequenced with the latest short-read-sequencing (SRS) technology, has the potential to overcome this drawback. To illustrate the advantages of this approach we use the study group Aichryson Webb & Berthelott (Crassulaceae), a plant genus that diversified on the Canary Islands. The data analysis approach used here aims at a careful quality control of the long loci dataset. It involves an informed selection of thresholds for accurate clustering, a thorough exploration of locus properties, such as locus length, coverage and variability, to identify potential biased data and a comparative phylogenetic inference of filtered datasets, accompanied by an evaluation of resulting BS support, gene and site concordance factor values, to improve overall resolution of the resulting phylogenetic trees. The final dataset contains variable loci with an average length of 373nt and facilitates species tree estimation using a coalescent-based summary approach. Additional improvements brought by the approach are critically discussed.
21
|
Borowiec ML, Cover SP, Rabeling C. The evolution of social parasitism in Formica ants revealed by a global phylogeny. Proc Natl Acad Sci U S A 2021; 118:e2026029118. [PMID: 34535549 PMCID: PMC8463886 DOI: 10.1073/pnas.2026029118] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Accepted: 07/29/2021] [Indexed: 02/07/2023] Open
Abstract
Studying the behavioral and life history transitions from a cooperative, eusocial life history to exploitative social parasitism allows for deciphering the conditions under which changes in behavior and social organization lead to diversification. The Holarctic ant genus Formica is ideally suited for studying the evolution of social parasitism because half of its 172 species are confirmed or suspected social parasites, which includes all three major classes of social parasitism known in ants. However, the life history transitions associated with the evolution of social parasitism in this genus are largely unexplored. To test competing hypotheses regarding the origins and evolution of social parasitism, we reconstructed a global phylogeny of Formica ants. The genus originated in the Old World ∼30 Ma ago and dispersed multiple times to the New World and back. Within Formica, obligate dependent colony-founding behavior arose once from a facultatively polygynous common ancestor practicing independent and facultative dependent colony foundation. Temporary social parasitism likely preceded or arose concurrently with obligate dependent colony founding, and dulotic social parasitism evolved once within the obligate dependent colony-founding clade. Permanent social parasitism evolved twice from temporary social parasitic ancestors that rarely practiced colony budding, demonstrating that obligate social parasitism can originate from a facultative parasitic background in socially polymorphic organisms. In contrast to permanently socially parasitic ants in other genera, the high parasite diversity in Formica likely originated via allopatric speciation, highlighting the diversity of convergent evolutionary trajectories resulting in nearly identical parasitic life history syndromes.
Affiliation(s)
- Marek L Borowiec
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287
  - Department of Entomology, Plant Pathology, and Nematology, University of Idaho, Moscow, ID 83844
  - Institute of Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID 83844
- Stefan P Cover
  - Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138
22
|
Forthman M, Braun EL, Kimball RT. Gene tree quality affects empirical coalescent branch length estimation. Zool Scr 2021. [DOI: 10.1111/zsc.12512] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Affiliation(s)
- Michael Forthman
  - Department of Entomology & Nematology, University of Florida, Gainesville, FL, USA
  - California State Collection of Arthropods, Plant Pest Diagnostics Branch, California Department of Food & Agriculture, Sacramento, CA, USA
- Edward L. Braun
  - Department of Biology, University of Florida, Gainesville, FL, USA
23
|
Zhang C, Zhao Y, Braun EL, Mirarab S. TAPER: Pinpointing errors in multiple sequence alignments despite varying rates of evolution. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13696] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022]
Affiliation(s)
- Chao Zhang
  - Bioinformatics and Systems Biology Program, University of California, San Diego, CA, USA
- Yiming Zhao
  - Electrical and Computer Engineering Department, University of California, San Diego, CA, USA
- Edward L. Braun
  - Department of Biology and Genetics Institute, University of Florida, Gainesville, FL, USA
- Siavash Mirarab
  - Electrical and Computer Engineering Department, University of California, San Diego, CA, USA
24
|
Ferrer Obiol J, James HF, Chesser RT, Bretagnolle V, González-Solís J, Rozas J, Riutort M, Welch AJ. Integrating Sequence Capture and Restriction Site-Associated DNA Sequencing to Resolve Recent Radiations of Pelagic Seabirds. Syst Biol 2021; 70:976-996. [PMID: 33512506 PMCID: PMC8357341 DOI: 10.1093/sysbio/syaa101] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/12/2020] [Revised: 11/13/2020] [Accepted: 12/15/2020] [Indexed: 01/01/2023] Open
Abstract
The diversification of modern birds has been shaped by a number of radiations. Rapid diversification events make reconstructing the evolutionary relationships among taxa challenging due to the convoluted effects of incomplete lineage sorting (ILS) and introgression. Phylogenomic data sets have the potential to detect patterns of phylogenetic incongruence, and to address their causes. However, the footprints of ILS and introgression on sequence data can vary between different phylogenomic markers at different phylogenetic scales depending on factors such as their evolutionary rates or their selection pressures. We show that combining phylogenomic markers that evolve at different rates, such as paired-end double-digest restriction site-associated DNA (PE-ddRAD) and ultraconserved elements (UCEs), allows a comprehensive exploration of the causes of phylogenetic discordance associated with short internodes at different timescales. We used thousands of UCE and PE-ddRAD markers to produce the first well-resolved phylogeny of shearwaters, a group of medium-sized pelagic seabirds that are among the most phylogenetically controversial and endangered bird groups. We found that phylogenomic conflict was mainly derived from high levels of ILS due to rapid speciation events. We also documented a case of introgression, despite the high philopatry of shearwaters to their breeding sites, which typically limits gene flow. We integrated state-of-the-art concatenated and coalescent-based approaches to expand on previous comparisons of UCE and RAD-Seq data sets for phylogenetics, divergence time estimation, and inference of introgression, and we propose a strategy to optimize RAD-Seq data for phylogenetic analyses. Our results highlight the usefulness of combining phylogenomic markers evolving at different rates to understand the causes of phylogenetic discordance at different timescales. 
[Aves; incomplete lineage sorting; introgression; PE-ddRAD-Seq; phylogenomics; radiations; shearwaters; UCEs.].
Affiliation(s)
- Joan Ferrer Obiol
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Barcelona, Catalonia, Spain
| | - Helen F James
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
| | - R Terry Chesser
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- U.S. Geological Survey, Patuxent Wildlife Research Center, Laurel, MD, USA
| | - Vincent Bretagnolle
- Centre d’Études Biologiques de Chizé, CNRS & La Rochelle Université, 79360, Villiers en Bois, France
| | - Jacob González-Solís
- Institut de Recerca de la Biodiversitat (IRBio), Barcelona, Catalonia, Spain
- Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
| | - Julio Rozas
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Barcelona, Catalonia, Spain
| | - Marta Riutort
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Barcelona, Catalonia, Spain
| | | |
|
25
|
Shah T, Schneider JV, Zizka G, Maurin O, Baker W, Forest F, Brewer GE, Savolainen V, Darbyshire I, Larridon I. Joining forces in Ochnaceae phylogenomics: a tale of two targeted sequencing probe kits. Am J Bot 2021; 108:1201-1216. [PMID: 34180046 DOI: 10.1002/ajb2.1682] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 09/29/2020] [Accepted: 02/23/2021] [Indexed: 05/10/2023]
Abstract
PREMISE Both universal and family-specific targeted sequencing probe kits are becoming widely used for reconstruction of phylogenetic relationships in angiosperms. Within the pantropical Ochnaceae, we show that with careful data filtering, universal kits are as capable of resolving intergeneric relationships as custom probe kits. Furthermore, we show the strength in combining data from both kits to mitigate bias and provide a more robust result to resolve evolutionary relationships. METHODS We sampled 23 Ochnaceae genera and used targeted sequencing with two probe kits, the universal Angiosperms353 kit and a family-specific kit. We used maximum likelihood inference with a concatenated matrix of loci and multispecies-coalescence approaches to infer relationships in the family. We explored phylogenetic informativeness and the impact of missing data on resolution and tree support. RESULTS For the Angiosperms353 data set, the concatenation approach provided results more congruent with those of the Ochnaceae-specific data set. Filtering missing data was most impactful on the Angiosperms353 data set, with a relaxed threshold being the optimum scenario. The Ochnaceae-specific data set resolved consistent topologies using both inference methods, and no major improvements were obtained after data filtering. Merging of data obtained with the two kits resulted in a well-supported phylogenetic tree. CONCLUSIONS The Angiosperms353 data set improved upon data filtering, and missing data played an important role in phylogenetic reconstruction. The Angiosperms353 data set resolved the phylogenetic backbone of Ochnaceae as well as the family-specific data set. All analyses indicated that both Sauvagesia L. and Campylospermum Tiegh. as currently circumscribed are polyphyletic and require revised delimitation.
Affiliation(s)
- Toral Shah
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
- Department of Life Sciences, Imperial College, Silwood Park Campus, Ascot, Berks, SL5 7PY, UK
| | - Julio V Schneider
- Department of Botany and Molecular Evolution, Senckenberg Research Institute and Natural History Museum Frankfurt, Senckenberganlage 25, Frankfurt am Main, D-60325, Germany
| | - Georg Zizka
- Department of Botany and Molecular Evolution, Senckenberg Research Institute and Natural History Museum Frankfurt, Senckenberganlage 25, Frankfurt am Main, D-60325, Germany
- Institute of Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Str. 13, Frankfurt am Main, 60438, Germany
| | - Olivier Maurin
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
| | - William Baker
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
| | - Félix Forest
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
| | - Grace E Brewer
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
| | - Vincent Savolainen
- Department of Life Sciences, Imperial College, Silwood Park Campus, Ascot, Berks, SL5 7PY, UK
| | | | - Isabel Larridon
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
- Systematic and Evolutionary Botany Lab, Department of Biology, Ghent University, K.L., Ledeganckstraat 35, Gent, 9000, Belgium
| |
|
26
|
Abstract
The estimation of phylogenetic trees for individual genes or multi-locus datasets is a basic part of considerable biological research. In order to enable large trees to be computed, Disjoint Tree Mergers (DTMs) have been developed; these methods operate by dividing the input sequence dataset into disjoint sets, constructing trees on each subset, and then combining the subset trees (using auxiliary information) into a tree on the full dataset. DTMs have been used to advantage for multi-locus species tree estimation, enabling highly accurate species trees at reduced computational effort, compared to leading species tree estimation methods. Here, we evaluate the feasibility of using DTMs to improve the scalability of maximum likelihood (ML) gene tree estimation to large numbers of input sequences. Our study shows distinct differences between the three selected ML codes—RAxML-NG, IQ-TREE 2, and FastTree 2—and shows that good DTM pipeline design can provide advantages over these ML codes on large datasets.
|
27
|
Phylogenomic and ecological analyses reveal the spatiotemporal evolution of global pines. Proc Natl Acad Sci U S A 2021; 118:2022302118. [PMID: 33941644 PMCID: PMC8157994 DOI: 10.1073/pnas.2022302118] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Indexed: 01/11/2023]
Abstract
How coniferous forests evolved in the Northern Hemisphere remains largely unknown. Unlike most groups of organisms that generally follow a latitudinal diversity gradient, most conifer species in the Northern Hemisphere are distributed in mountainous areas at middle latitudes. It is of great interest to know whether the midlatitude region has been an evolutionary cradle or museum for conifers and how evolutionary and ecological factors have driven their spatiotemporal evolution. Here, we investigated the macroevolution of Pinus, the largest conifer genus and characteristic of northern temperate coniferous forests, based on nearly complete species sampling. Using 1,662 genes from transcriptome sequences, we reconstructed a robust species phylogeny and reestimated divergence times of global pines. We found that ∼90% of extant pine species originated in the Miocene in sharp contrast to the ancient origin of Pinus, indicating a Neogene rediversification. Surprisingly, species at middle latitudes are much older than those at other latitudes. This finding, coupled with net diversification rate analysis, indicates that the midlatitude region has provided an evolutionary museum for global pines. Analyses of 31 environmental variables, together with a comparison of evolutionary rates of niche and phenotypic traits with a net diversification rate, found that topography played a primary role in pine diversification, and the aridity index was decisive for the niche rate shift. Moreover, fire has forced diversification and adaptive evolution of Pinus. Our study highlights the importance of integrating phylogenomic and ecological approaches to address evolution of biological groups at the global scale.
|
28
|
Minh BQ, Dang CC, Vinh LS, Lanfear R. QMaker: Fast and accurate method to estimate empirical models of protein evolution. Syst Biol 2021; 70:1046-1060. [PMID: 33616668 PMCID: PMC8357343 DOI: 10.1093/sysbio/syab010] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 08/24/2020] [Revised: 12/25/2020] [Accepted: 02/10/2021] [Indexed: 11/29/2022]
Abstract
Amino acid substitution models play a crucial role in phylogenetic analyses. Maximum likelihood (ML) methods have been proposed to estimate amino acid substitution models; however, they are typically complicated and slow. In this article, we propose QMaker, a new ML method to estimate a general time-reversible $Q$ matrix from a large protein data set consisting of multiple sequence alignments. QMaker combines an efficient ML tree search algorithm, a model selection for handling the model heterogeneity among alignments, and the consideration of rate mixture models among sites. We provide QMaker as a user-friendly function in the IQ-TREE software package (http://www.iqtree.org) supporting the use of multiple CPU cores so that biologists can easily estimate amino acid substitution models from their own protein alignments. We used QMaker to estimate new empirical general amino acid substitution models from the current Pfam database as well as five clade-specific models for mammals, birds, insects, yeasts, and plants. Our results show that the new models considerably improve the fit between model and data and in some cases influence the inference of phylogenetic tree topologies. [Amino acid replacement matrices; amino acid substitution models; maximum likelihood estimation; phylogenetic inferences.]
Affiliation(s)
- Bui Quang Minh
- School of Computing, Australian National University, 145 Science Road, Acton, ACT 2601, Canberra, Australia
- Department of Ecology and Evolution, Research School of Biology, Australian National University, 145 Science Road, Acton, ACT 2601, Canberra, Australia
| | - Cuong Cao Dang
- Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, 10000 Hanoi, Vietnam
- Bui Quang Minh and Cuong Cao Dang contributed equally to this article
| | - Le Sy Vinh
- Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, 10000 Hanoi, Vietnam
- Correspondence to be sent to: University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, 10000 Hanoi, Vietnam; and Department of Ecology and Evolution, Research School of Biology, Australian National University, 145 Science Road, Acton, ACT 2601, Canberra, Australia
| | - Robert Lanfear
- Department of Ecology and Evolution, Research School of Biology, Australian National University, 145 Science Road, Acton, ACT 2601, Canberra, Australia
- Correspondence to be sent to: University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, 10000 Hanoi, Vietnam; and Department of Ecology and Evolution, Research School of Biology, Australian National University, 145 Science Road, Acton, ACT 2601, Canberra, Australia
| |
|
29
|
Bayless KM, Trautwein MD, Meusemann K, Shin S, Petersen M, Donath A, Podsiadlowski L, Mayer C, Niehuis O, Peters RS, Meier R, Kutty SN, Liu S, Zhou X, Misof B, Yeates DK, Wiegmann BM. Beyond Drosophila: resolving the rapid radiation of schizophoran flies with phylotranscriptomics. BMC Biol 2021; 19:23. [PMID: 33557827 PMCID: PMC7871583 DOI: 10.1186/s12915-020-00944-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/20/2020] [Accepted: 12/17/2020] [Indexed: 01/15/2023]
Abstract
BACKGROUND The most species-rich radiation of animal life in the 66 million years following the Cretaceous extinction event is that of schizophoran flies: a third of fly diversity including Drosophila fruit fly model organisms, house flies, forensic blow flies, agricultural pest flies, and many other well and poorly known true flies. Rapid diversification has hindered previous attempts to elucidate the phylogenetic relationships among major schizophoran clades. A robust phylogenetic hypothesis for the major lineages containing these 55,000 described species would be critical to understand the processes that contributed to the diversity of these flies. We use protein encoding sequence data from transcriptomes, including 3145 genes from 70 species, representing all superfamilies, to improve the resolution of this previously intractable phylogenetic challenge. RESULTS Our results support a paraphyletic acalyptrate grade including a monophyletic Calyptratae and the monophyly of half of the acalyptrate superfamilies. The primary branching framework of Schizophora is well supported for the first time, revealing the primarily parasitic Pipunculidae and Sciomyzoidea stat. rev. as successive sister groups to the remaining Schizophora. Ephydroidea, Drosophila's superfamily, is the sister group of Calyptratae. Sphaeroceroidea has modest support as the sister to all non-sciomyzoid Schizophora. We define two novel lineages corroborated by morphological traits, the 'Modified Oviscapt Clade' containing Tephritoidea, Nerioidea, and other families, and the 'Cleft Pedicel Clade' containing Calyptratae, Ephydroidea, and other families. Support values remain low among a challenging subset of lineages, including Diopsidae. The placement of these families remained uncertain in both concatenated maximum likelihood and multispecies coalescent approaches. 
Rogue taxon removal was effective in increasing support values compared with strategies that maximise gene coverage or minimise missing data. CONCLUSIONS Dividing most acalyptrate fly groups into four major lineages is supported consistently across analyses. Understanding the fundamental branching patterns of schizophoran flies provides a foundation for future comparative research on the genetics, ecology, and biocontrol.
Affiliation(s)
- Keith M Bayless
- Australian National Insect Collection, CSIRO National Research Collections Australia (NRCA), Acton, Canberra, ACT, Australia.
- Department of Entomology, California Academy of Sciences, San Francisco, CA, USA.
- Department of Entomology & Plant Pathology, North Carolina State University, Raleigh, NC, USA.
| | - Michelle D Trautwein
- Department of Entomology, California Academy of Sciences, San Francisco, CA, USA
| | - Karen Meusemann
- Australian National Insect Collection, CSIRO National Research Collections Australia (NRCA), Acton, Canberra, ACT, Australia
- Centre for Molecular Biodiversity Research (ZMB), Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Bonn, Germany
- Department of Evolutionary Biology & Ecology, Institute of Biology I, Albert Ludwig University of Freiburg, Hauptstraße 1, Freiburg i. Br., Germany
| | - Seunggwan Shin
- Department of Entomology & Plant Pathology, North Carolina State University, Raleigh, NC, USA
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Malte Petersen
- Max-Planck-Institut of Immunobiology and Epigenetics, Freiburg, Germany
| | - Alexander Donath
- Centre for Molecular Biodiversity Research (ZMB), Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Bonn, Germany
| | - Lars Podsiadlowski
- Centre for Molecular Biodiversity Research (ZMB), Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Bonn, Germany
| | - Christoph Mayer
- Centre for Molecular Biodiversity Research (ZMB), Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Bonn, Germany
| | - Oliver Niehuis
- Department of Evolutionary Biology & Ecology, Institute of Biology I, Albert Ludwig University of Freiburg, Hauptstraße 1, Freiburg i. Br., Germany
| | - Ralph S Peters
- Centre of Taxonomy and Evolutionary Research, Arthropoda Department, Zoological Research Museum Alexander Koenig, Bonn, Germany
| | - Rudolf Meier
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Lee Kong Chian Natural History Museum, National University of Singapore, Singapore, Singapore
| | - Sujatha Narayanan Kutty
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Tropical Marine Science Institute, National University of Singapore, Singapore, Singapore
| | - Shanlin Liu
- Department of Entomology, China Agricultural University, Beijing, People's Republic of China
| | - Xin Zhou
- Department of Entomology, China Agricultural University, Beijing, People's Republic of China
| | - Bernhard Misof
- Zoological Research Museum Alexander Koenig (ZFMK), Bonn, Germany
| | - David K Yeates
- Australian National Insect Collection, CSIRO National Research Collections Australia (NRCA), Acton, Canberra, ACT, Australia
| | - Brian M Wiegmann
- Department of Entomology & Plant Pathology, North Carolina State University, Raleigh, NC, USA
| |
|
30
|
Warnow T, Mirarab S. Multiple Sequence Alignment for Large Heterogeneous Datasets Using SATé, PASTA, and UPP. Methods Mol Biol 2021; 2231:99-119. [PMID: 33289889 DOI: 10.1007/978-1-0716-1036-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
The estimation of very large multiple sequence alignments is a challenging problem that requires special techniques in order to achieve high accuracy. Here we describe two software packages, PASTA and UPP, for constructing alignments on large and ultra-large datasets. Both methods have been able to produce highly accurate alignments on 1,000,000 sequences, and trees computed on these alignments are also highly accurate. PASTA provides the best tree accuracy when the input sequences are all full-length, but UPP provides improved accuracy compared to PASTA and other methods when the input contains a large number of fragmentary sequences. Both methods are available in open source form on GitHub.
Affiliation(s)
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
| | - Siavash Mirarab
- Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA, USA
| |
|
31
|
Bossert S, Murray EA, Pauly A, Chernyshov K, Brady SG, Danforth BN. Gene Tree Estimation Error with Ultraconserved Elements: An Empirical Study on Pseudapis Bees. Syst Biol 2020; 70:803-821. [PMID: 33367855 DOI: 10.1093/sysbio/syaa097] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/05/2019] [Revised: 11/18/2020] [Accepted: 12/02/2020] [Indexed: 11/12/2022]
Abstract
Summarizing individual gene trees to species phylogenies using two-step coalescent methods is now a standard strategy in the field of phylogenomics. However, practical implementations of summary methods suffer from gene tree estimation error, which is caused by various biological and analytical factors. Greatly understudied is the choice of gene tree inference method and downstream effects on species tree estimation for empirical data sets. To better understand the impact of this method choice on gene and species tree accuracy, we compare gene trees estimated through four widely used programs under different model-selection criteria: PhyloBayes, MrBayes, IQ-Tree, and RAxML. We study their performance in the phylogenomic framework of >800 ultraconserved elements from the bee subfamily Nomiinae (Halictidae). Our taxon sampling focuses on the genus Pseudapis, a distinct lineage with diverse morphological features, but contentious morphology-based taxonomic classifications and no molecular phylogenetic guidance. We approximate topological accuracy of gene trees by assessing their ability to recover two uncontroversial, monophyletic groups, and compare branch lengths of individual trees using the stemminess metric (the relative length of internal branches). We further examine different strategies of removing uninformative loci and the collapsing of weakly supported nodes into polytomies. We then summarize gene trees with ASTRAL and compare resulting species phylogenies, including comparisons to concatenation-based estimates. Gene trees obtained with the reversible jump model search in MrBayes were most concordant on average and all Bayesian methods yielded gene trees with better stemminess values. The only gene tree estimation approach whose ASTRAL summary trees consistently produced the most likely correct topology, however, was IQ-Tree with automated model designation (ModelFinder program). 
We discuss these findings and provide practical advice on gene tree estimation for summary methods. Lastly, we establish the first phylogeny-informed classification for Pseudapis s. l. and map the distribution of distinct morphological features of the group. [ASTRAL; Bees; concordance; gene tree estimation error; IQ-Tree; MrBayes, Nomiinae; PhyloBayes; RAxML; phylogenomics; stemminess].
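The abstract above describes stemminess only as "the relative length of internal branches". As a minimal sketch, assuming that this means the summed length of internal branches divided by the total tree length (the paper may use a different formulation), the calculation could look like the following. The nested-tuple tree representation and the function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the tree encoding and function names are assumptions,
# not taken from the cited study.
# A node is (branch_length, children); a leaf has an empty children list.
# The root's own branch length is ignored.

def branch_sums(node, is_root=True):
    """Return (internal_length, total_length) summed over the subtree at node."""
    length, children = node
    internal = 0.0
    total = 0.0
    if not is_root:
        total += length
        if children:  # a non-root node with children sits on an internal branch
            internal += length
    for child in children:
        child_internal, child_total = branch_sums(child, is_root=False)
        internal += child_internal
        total += child_total
    return internal, total


def stemminess(tree):
    """Assumed definition: internal branch length as a fraction of total tree length."""
    internal, total = branch_sums(tree)
    if total == 0:
        raise ValueError("tree has zero total branch length")
    return internal / total


# Example tree: ((A:0.1, B:0.1):0.2, C:0.3)
example_tree = (0.0, [(0.2, [(0.1, []), (0.1, [])]), (0.3, [])])
print(stemminess(example_tree))
```

In the example tree the only internal branch has length 0.2 and the total tree length is 0.7, so under this assumed definition a higher value indicates a tree whose length is concentrated in internal rather than terminal branches.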
Affiliation(s)
- Silas Bossert
- Department of Entomology, Cornell University, Comstock Hall, Ithaca, NY 14853, USA
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- Department of Entomology, Washington State University, Pullman, Washington 99164, USA
| | - Elizabeth A Murray
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- Department of Entomology, Washington State University, Pullman, Washington 99164, USA
| | - Alain Pauly
- O.D. Taxonomy and Phylogeny, Royal Belgian Institute of Natural Sciences, Rue Vautier 29, 1000 Brussels, Belgium
| | - Kyrylo Chernyshov
- College of Arts and Sciences, Cornell University, Ithaca, NY 14853, USA
| | - Seán G Brady
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
| | - Bryan N Danforth
- Department of Entomology, Cornell University, Comstock Hall, Ithaca, NY 14853, USA
| |
|
32
|
Gardner EM, Johnson MG, Pereira JT, Puad ASA, Arifiani D, Sahromi, Wickett NJ, Zerega NJC. Paralogs and off-target sequences improve phylogenetic resolution in a densely-sampled study of the breadfruit genus (Artocarpus, Moraceae). Syst Biol 2020; 70:syaa073. [PMID: 32970819 PMCID: PMC8048387 DOI: 10.1093/sysbio/syaa073] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 11/18/2019] [Revised: 08/31/2020] [Accepted: 09/08/2020] [Indexed: 12/21/2022]
Abstract
We present a 517-gene phylogenetic framework for the breadfruit genus Artocarpus (ca. 70 spp., Moraceae), making use of silica-dried leaves from recent fieldwork and herbarium specimens (some up to 106 years old) to achieve 96% taxon sampling. We explore issues relating to assembly, paralogous loci, partitions, and analysis method to reconstruct a phylogeny that is robust to variation in data and available tools. While codon partitioning did not result in any substantial topological differences, the inclusion of flanking non-coding sequence in analyses significantly increased the resolution of gene trees. We also found that increasing the size of datasets increased convergence between analysis methods but did not reduce gene tree conflict. We optimized the HybPiper targeted-enrichment sequence assembly pipeline for short sequences derived from degraded DNA extracted from museum specimens. While the subgenera of Artocarpus were monophyletic, revision is required at finer scales, particularly with respect to widespread species. We expect our results to provide a basis for further studies in Artocarpus and provide guidelines for future analyses of datasets based on target enrichment data, particularly those using sequences from both fresh and museum material, counseling careful attention to the potential of off-target sequences to improve resolution.
Affiliation(s)
- Elliot M Gardner
- Chicago Botanic Garden, Negaunee Institute for Plant Conservation Science and Action, 1000 Lake Cook Road, Glencoe, IL 60022, USA
- Northwestern University, Plant Biology and Conservation Program, 2205 Tech Dr., Evanston, IL 60208, USA
- The Morton Arboretum, 4100 IL-53, Lisle, IL 60532, USA
- Singapore Botanic Gardens, National Parks Board, 1 Cluny Road, 259569, Singapore
- Florida International University, Institute of Environment, 11200 SW 8th Street, OE 148 Miami, Florida 33199, USA
| | - Matthew G Johnson
- Chicago Botanic Garden, Negaunee Institute for Plant Conservation Science and Action, 1000 Lake Cook Road, Glencoe, IL 60022, USA
- Texas Tech University, Department of Biological Sciences, 2901 Main Street, Lubbock, TX 79409-3131, USA
| | - Joan T Pereira
- Forest Research Centre, Sabah Forestry Department, P.O. Box 1407, 90715 Sandakan, Sabah, Malaysia
| | - Aida Shafreena Ahmad Puad
- Faculty of Resource Science & Technology, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak 94300, Malaysia
| | - Deby Arifiani
- Herbarium Bogoriense, Research Center for Biology, Indonesian Institute of Sciences, Cibinong, Jawa Barat, Indonesia
| | - Sahromi
- Center for Plant Conservation Botanic Gardens, Indonesian Institute of Sciences, Bogor, Jawa Barat, Indonesia
- Elliot M. Gardner and Matthew G. Johnson are co-first authors
| | - Norman J Wickett
- Chicago Botanic Garden, Negaunee Institute for Plant Conservation Science and Action, 1000 Lake Cook Road, Glencoe, IL 60022, USA
- Northwestern University, Plant Biology and Conservation Program, 2205 Tech Dr., Evanston, IL 60208, USA
| | - Nyree J C Zerega
- Chicago Botanic Garden, Negaunee Institute for Plant Conservation Science and Action, 1000 Lake Cook Road, Glencoe, IL 60022, USA
- Northwestern University, Plant Biology and Conservation Program, 2205 Tech Dr., Evanston, IL 60208, USA
| |
|
33
|
Smirnov V, Warnow T. Phylogeny Estimation Given Sequence Length Heterogeneity. Syst Biol 2020; 70:268-282. [PMID: 32692823 PMCID: PMC7875441 DOI: 10.1093/sysbio/syaa058] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/10/2019] [Revised: 07/14/2020] [Accepted: 07/15/2020] [Indexed: 12/21/2022]
Abstract
Phylogeny estimation is a major step in many biological studies, and has many well known challenges. With the dropping cost of sequencing technologies, biologists now have increasingly large datasets available for use in phylogeny estimation. Here we address the challenge of estimating a tree given large datasets with a combination of full-length sequences and fragmentary sequences, which can arise due to a variety of reasons, including sample collection, sequencing technologies, and analytical pipelines. We compare two basic approaches: (1) computing an alignment on the full dataset and then computing a maximum likelihood tree on the alignment, or (2) constructing an alignment and tree on the full length sequences and then using phylogenetic placement to add the remaining sequences (which will generally be fragmentary) into the tree. We explore these two approaches on a range of simulated datasets, each with 1000 sequences and varying in rates of evolution, and two biological datasets. Our study shows some striking performance differences between methods, especially when there is substantial sequence length heterogeneity and high rates of evolution. We find in particular that using UPP to align sequences and RAxML to compute a tree on the alignment provides the best accuracy, substantially outperforming trees computed using phylogenetic placement methods. We also find that FastTree has poor accuracy on alignments containing fragmentary sequences. Overall, our study provides insights into the literature comparing different methods and pipelines for phylogenetic estimation, and suggests directions for future method development. [Phylogeny estimation, sequence length heterogeneity, phylogenetic placement.]
Affiliation(s)
- Vladimir Smirnov
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| |
|
34
|
Yin J, Zhang C, Mirarab S. ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization. Bioinformatics 2020; 35:3961-3969. [PMID: 30903685 DOI: 10.1093/bioinformatics/btz211] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 12/01/2018] [Revised: 03/12/2019] [Accepted: 03/21/2019] [Indexed: 01/11/2023]
Abstract
MOTIVATION: Evolutionary histories can change from one part of the genome to another. The potential for discordance between the gene trees has motivated the development of summary methods that reconstruct a species tree from an input collection of gene trees. ASTRAL is a widely used summary method and has been able to scale to relatively large datasets. However, the size of genomic datasets is quickly growing. Despite its relative efficiency, the current single-threaded implementation of ASTRAL is falling behind the data growth trends and is not able to analyze the largest available datasets in a reasonable time. RESULTS: ASTRAL uses dynamic programming and is not trivially parallel. In this paper, we introduce ASTRAL-MP, the first version of ASTRAL that can exploit parallelism and also uses randomization techniques to speed up some of its steps. Importantly, ASTRAL-MP can take advantage of not just multiple CPU cores but also one or several graphics processing units (GPUs). The ASTRAL-MP code scales very well with increasing CPU cores, and its GPU version, implemented in OpenCL, can have up to 158× speedups compared to ASTRAL-III. Using GPUs and multiple cores, ASTRAL-MP is able to analyze datasets with 10 000 species or datasets with more than 100 000 genes in <2 days. AVAILABILITY AND IMPLEMENTATION: ASTRAL-MP is available at https://github.com/smirarab/ASTRAL/tree/MP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- John Yin
  - Department of Mathematics, University of California at San Diego, La Jolla, CA, USA
- Chao Zhang
  - Bioinformatics and Systems Biology, University of California at San Diego, La Jolla, CA, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA, USA
|
35
|
Vasilikopoulos A, Misof B, Meusemann K, Lieberz D, Flouri T, Beutel RG, Niehuis O, Wappler T, Rust J, Peters RS, Donath A, Podsiadlowski L, Mayer C, Bartel D, Böhm A, Liu S, Kapli P, Greve C, Jepson JE, Liu X, Zhou X, Aspöck H, Aspöck U. An integrative phylogenomic approach to elucidate the evolutionary history and divergence times of Neuropterida (Insecta: Holometabola). BMC Evol Biol 2020; 20:64. [PMID: 32493355 PMCID: PMC7268685 DOI: 10.1186/s12862-020-01631-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/22/2019] [Accepted: 05/19/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND: The latest advancements in DNA sequencing technologies have facilitated the resolution of the phylogeny of insects, yet parts of the tree of Holometabola remain unresolved. The phylogeny of Neuropterida has been extensively studied, but no strong consensus exists concerning the phylogenetic relationships within the order Neuroptera. Here, we assembled a novel transcriptomic dataset to address previously unresolved issues in the phylogeny of Neuropterida and to infer divergence times within the group. We tested the robustness of our phylogenetic estimates by comparing summary coalescent and concatenation-based phylogenetic approaches and by employing different quartet-based measures of phylogenomic incongruence, combined with data permutations. RESULTS: Our results suggest that the order Raphidioptera is sister to Neuroptera + Megaloptera. Coniopterygidae is inferred as sister to all remaining neuropteran families suggesting that larval cryptonephry could be a ground plan feature of Neuroptera. A clade that includes Nevrorthidae, Osmylidae, and Sisyridae (i.e. Osmyloidea) is inferred as sister to all other Neuroptera except Coniopterygidae, and Dilaridae is placed as sister to all remaining neuropteran families. Ithonidae is inferred as the sister group of monophyletic Myrmeleontiformia. The phylogenetic affinities of Chrysopidae and Hemerobiidae were dependent on the data type analyzed, and quartet-based analyses showed only weak support for the placement of Hemerobiidae as sister to Ithonidae + Myrmeleontiformia. Our molecular dating analyses suggest that most families of Neuropterida started to diversify in the Jurassic and our ancestral character state reconstructions suggest a primarily terrestrial environment of the larvae of Neuropterida and Neuroptera.
CONCLUSION: Our extensive phylogenomic analyses consolidate several key aspects in the backbone phylogeny of Neuropterida, such as the basal placement of Coniopterygidae within Neuroptera and the monophyly of Osmyloidea. Furthermore, they provide new insights into the timing of diversification of Neuropterida. Despite the vast amount of analyzed molecular data, we found that certain nodes in the tree of Neuroptera are not robustly resolved. Therefore, we emphasize the importance of integrating the results of morphological analyses with those of sequence-based phylogenomics. We also suggest that comparative analyses of genomic meta-characters should be incorporated into future phylogenomic studies of Neuropterida.
Affiliation(s)
- Alexandros Vasilikopoulos
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Bernhard Misof
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Karen Meusemann
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
  - Department of Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert-Ludwigs-Universität Freiburg, 79104, Freiburg, Germany
  - Australian National Insect Collection, National Research Collections Australia, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, ACT 2601, Australia
- Doria Lieberz
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- Rolf G Beutel
  - Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, 07743, Jena, Germany
- Oliver Niehuis
  - Department of Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert-Ludwigs-Universität Freiburg, 79104, Freiburg, Germany
- Torsten Wappler
  - Natural History Department, Hessisches Landesmuseum Darmstadt, 64283, Darmstadt, Germany
- Jes Rust
  - Steinmann-Institut für Geologie, Mineralogie und Paläontologie, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115, Bonn, Germany
- Ralph S Peters
  - Centre for Taxonomy and Evolutionary Research, Arthropoda Department, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Alexander Donath
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Lars Podsiadlowski
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Christoph Mayer
  - Centre for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113, Bonn, Germany
- Daniela Bartel
  - Department of Evolutionary Biology, University of Vienna, 1090, Vienna, Austria
- Alexander Böhm
  - Department of Evolutionary Biology, University of Vienna, 1090, Vienna, Austria
- Shanlin Liu
  - Department of Entomology, China Agricultural University, 100193, Beijing, People's Republic of China
- Paschalia Kapli
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- Carola Greve
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), 60325, Frankfurt, Germany
- James E Jepson
  - School of Biological, Earth and Environmental Sciences, University College Cork, Distillery Fields, North Mall, T23 N73K, Cork, Ireland
- Xingyue Liu
  - Department of Entomology, China Agricultural University, 100193, Beijing, People's Republic of China
- Xin Zhou
  - Department of Entomology, China Agricultural University, 100193, Beijing, People's Republic of China
- Horst Aspöck
  - Institute of Specific Prophylaxis and Tropical Medicine, Medical Parasitology, Medical University of Vienna (MUW), 1090, Vienna, Austria
- Ulrike Aspöck
  - Department of Evolutionary Biology, University of Vienna, 1090, Vienna, Austria
  - Zoological Department II, Natural History Museum of Vienna, 1010, Vienna, Austria
|
36
|
Rabiee M, Mirarab S. INSTRAL: Discordance-Aware Phylogenetic Placement Using Quartet Scores. Syst Biol 2020; 69:384-391. [PMID: 31290974 DOI: 10.1093/sysbio/syz045] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/04/2018] [Accepted: 07/02/2019] [Indexed: 11/13/2022]
Abstract
Phylogenomic analyses have increasingly adopted species tree reconstruction using methods that account for gene tree discordance using pipelines that require both human effort and computational resources. As the number of available genomes continues to increase, a new problem is facing researchers. Once more species become available, they have to repeat the whole process from the beginning because updating species trees is currently not possible. However, the de novo inference can be prohibitively costly in human effort or machine time. In this article, we introduce INSTRAL, a method that extends ASTRAL to enable phylogenetic placement. INSTRAL is designed to place a new species on an existing species tree after sequences from the new species have already been added to gene trees; thus, INSTRAL is complementary to existing placement methods that update gene trees. [ASTRAL; ILS; phylogenetic placement; species tree reconstruction.].
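The idea of quartet-score placement described in this abstract can be illustrated with a minimal brute-force sketch in Python. This is not INSTRAL's actual algorithm or API; it is only an illustration under stated assumptions: trees are nested tuples of taxon names, the new taxon is already present in the gene trees, and every function name (`clades`, `quartet_topology`, `quartet_score`, `insertions`, `place`) is hypothetical. The sketch tries attaching the new taxon to each branch of a backbone tree and keeps the attachment whose quartets agree most often with the gene trees.

```python
from itertools import combinations

def clades(tree):
    """Return (leaf set, leaf sets of all internal nodes) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found

def quartet_topology(clade_list, quartet):
    """Return the split (e.g. ab|cd) the tree induces on four taxa, or None if unresolved."""
    a, b, c, d = quartet
    for left, right in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        for clade in clade_list:
            inside = clade.intersection(quartet)
            # A clade holding exactly one pair separates it from the other pair.
            if inside == frozenset(left) or inside == frozenset(right):
                return frozenset((frozenset(left), frozenset(right)))
    return None

def quartet_score(species_tree, gene_trees):
    """Count quartets, summed over gene trees, resolved identically in both trees."""
    sp_leaves, sp_clades = clades(species_tree)
    score = 0
    for gene_tree in gene_trees:
        gt_leaves, gt_clades = clades(gene_tree)
        for quartet in combinations(sorted(sp_leaves & gt_leaves), 4):
            topology = quartet_topology(sp_clades, quartet)
            if topology is not None and topology == quartet_topology(gt_clades, quartet):
                score += 1
    return score

def insertions(tree, new_taxon):
    """Yield every tree obtained by attaching new_taxon to one branch of tree."""
    yield (tree, new_taxon)  # attach on the branch above this node
    if not isinstance(tree, str):
        for i, child in enumerate(tree):
            for alternative in insertions(child, new_taxon):
                yield tree[:i] + (alternative,) + tree[i + 1:]

def place(backbone, new_taxon, gene_trees):
    """Brute force: pick the attachment with the highest quartet score."""
    return max(insertions(backbone, new_taxon),
               key=lambda candidate: quartet_score(candidate, gene_trees))

# Toy data (invented for illustration): two gene trees pair E with A, one with B.
backbone = (("A", "B"), ("C", "D"))
gene_trees = [((("A", "E"), "B"), ("C", "D")),
              ((("A", "E"), "B"), ("C", "D")),
              (("A", ("B", "E")), ("C", "D"))]
best = place(backbone, "E", gene_trees)
print(best)  # -> ((('A', 'E'), 'B'), ('C', 'D'))
```

The exhaustive search rescoring every quartet for every candidate branch is only practical for toy inputs; it is shown here because it makes the optimization criterion explicit.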
Affiliation(s)
- Maryam Rabiee
  - Department of Computer Science and Engineering, UC San Diego, La Jolla, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
|
37
|
Wong GKS, Soltis DE, Leebens-Mack J, Wickett NJ, Barker MS, Van de Peer Y, Graham SW, Melkonian M. Sequencing and Analyzing the Transcriptomes of a Thousand Species Across the Tree of Life for Green Plants. Annu Rev Plant Biol 2020; 71:741-765. [PMID: 31851546 DOI: 10.1146/annurev-arplant-042916-041040] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 05/23/2023]
Abstract
The 1,000 Plants (1KP) initiative was the first large-scale effort to collect next-generation sequencing (NGS) data across a phylogenetically representative sampling of species for a major clade of life, in this case the Viridiplantae, or green plants. As an international multidisciplinary consortium, we focused on plant evolution and its practical implications. Among the major outcomes were the inference of a reference species tree for green plants by phylotranscriptomic analysis of low-copy genes, a survey of paleopolyploidy (whole-genome duplications) across the Viridiplantae, the inferred evolutionary histories for many gene families and biological processes, the discovery of novel light-sensitive proteins for optogenetic studies in mammalian neuroscience, and elucidation of the genetic network for a complex trait (C4 photosynthesis). Altogether, 1KP demonstrated how value can be extracted from a phylodiverse sequencing data set, providing a template for future projects that aim to generate even more data, including complete de novo genomes, across the tree of life.
Affiliation(s)
- Gane Ka-Shu Wong
  - Department of Biological Sciences and Department of Medicine, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
  - BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Douglas E Soltis
  - Florida Museum of Natural History, Gainesville, Florida 32611, USA
  - Department of Biology, University of Florida, Gainesville, Florida 32611, USA
- Jim Leebens-Mack
  - Department of Plant Biology, University of Georgia, Athens, Georgia 30602, USA
- Norman J Wickett
  - Negaunee Institute for Plant Conservation Science and Action, Chicago Botanic Garden, Glencoe, Illinois 60022, USA
- Michael S Barker
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA
- Yves Van de Peer
  - Department of Plant Biotechnology and Bioinformatics, VIB Center for Plant Systems Biology, Ghent University, 9052 Ghent, Belgium
  - Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa
- Sean W Graham
  - Department of Botany, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Michael Melkonian
  - Faculty of Biology, University of Duisburg-Essen, D-45141 Essen, Germany
|
38
|
Abstract
Background: Phylogeny estimation is an important part of much biological research, but large-scale tree estimation is infeasible using standard methods due to computational issues. Recently, an approach to large-scale phylogeny has been proposed that divides a set of species into disjoint subsets, computes trees on the subsets, and then merges the trees together using a computed matrix of pairwise distances between the species. The novel component of these approaches is the last step: Disjoint Tree Merger (DTM) methods. Results: We present GTM (Guide Tree Merger), a polynomial time DTM method that adds edges to connect the subset trees, so as to provably minimize the topological distance to a computed guide tree. Thus, GTM performs unblended mergers, unlike the previous DTM methods. Yet, despite the potential limitation, our study shows that GTM has excellent accuracy, generally matching or improving on two previous DTMs, and is much faster than both. Conclusions: The proposed GTM approach to the DTM problem is a useful new tool for large-scale phylogenomic analysis, and shows the surprising potential for unblended DTM methods.
Affiliation(s)
- Vladimir Smirnov
  - Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, 61801, IL, US
- Tandy Warnow
  - Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, 61801, IL, US
|
39
|
Murphy B, Forest F, Barraclough T, Rosindell J, Bellot S, Cowan R, Golos M, Jebb M, Cheek M. A phylogenomic analysis of Nepenthes (Nepenthaceae). Mol Phylogenet Evol 2020; 144:106668. [DOI: 10.1016/j.ympev.2019.106668] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 06/24/2019] [Revised: 10/28/2019] [Accepted: 10/29/2019] [Indexed: 10/25/2022]
|
40
|
Springer MS, Molloy EK, Sloan DB, Simmons MP, Gatesy J. ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets. J Hered 2019; 111:147-168. [DOI: 10.1093/jhered/esz076] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 07/13/2019] [Accepted: 12/12/2019] [Indexed: 12/20/2022]
Abstract
DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the “no intralocus-recombination” assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.
Affiliation(s)
- Mark S Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA
- Erin K Molloy
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
- Daniel B Sloan
  - Department of Biology, Colorado State University, Fort Collins, CO
- Mark P Simmons
  - Department of Biology, Colorado State University, Fort Collins, CO
- John Gatesy
  - Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
|
41
|
Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, Belda-Ferre P, Al-Ghalith GA, Kopylova E, McDonald D, Kosciolek T, Yin JB, Huang S, Salam N, Jiao JY, Wu Z, Xu ZZ, Cantrell K, Yang Y, Sayyari E, Rabiee M, Morton JT, Podell S, Knights D, Li WJ, Huttenhower C, Segata N, Smarr L, Mirarab S, Knight R. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat Commun 2019; 10:5477. [PMID: 31792218 PMCID: PMC6889312 DOI: 10.1038/s41467-019-13443-4] [Citation(s) in RCA: 171] [Impact Index Per Article: 28.5] [Received: 06/08/2019] [Accepted: 11/06/2019] [Indexed: 11/10/2022]
Abstract
Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes. Here we build a reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, using multiple strategies. Our trees indicate remarkably closer evolutionary proximity between Archaea and Bacteria than previous estimates that were limited to fewer "core" genes, such as the ribosomal proteins. The robustness of the results was tested with respect to several variables, including taxon and site sampling, amino acid substitution heterogeneity and saturation, non-vertical evolution, and the impact of exclusion of candidate phyla radiation (CPR) taxa. Our results provide an updated view of domain-level relationships.
Affiliation(s)
- Qiyun Zhu
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Uyen Mai
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Wayne Pfeiffer
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
- Stefan Janssen
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Algorithmic Bioinformatics, Department of Biology and Chemistry, Justus Liebig University Gießen, Giessen, Germany
- Jon G Sanders
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Pedro Belda-Ferre
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Gabriel A Al-Ghalith
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Evguenia Kopylova
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Daniel McDonald
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Tomasz Kosciolek
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- John B Yin
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Mathematics, University of California San Diego, La Jolla, CA, USA
- Shi Huang
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Single-Cell Center, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- Nimaichand Salam
  - State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Jian-Yu Jiao
  - State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Zijun Wu
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Zhenjiang Z Xu
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Kalen Cantrell
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Yimeng Yang
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Erfan Sayyari
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Maryam Rabiee
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- James T Morton
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Sheila Podell
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Dan Knights
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Wen-Jun Li
  - State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Curtis Huttenhower
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
  - The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicola Segata
  - Department CIBIO, University of Trento, Trento, Italy
- Larry Smarr
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
  - California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, CA, USA
- Siavash Mirarab
  - Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
|
42
|
Gatesy J, Sloan DB, Warren JM, Baker RH, Simmons MP, Springer MS. Partitioned coalescence support reveals biases in species-tree methods and detects gene trees that determine phylogenomic conflicts. Mol Phylogenet Evol 2019; 139:106539. [DOI: 10.1016/j.ympev.2019.106539] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/03/2018] [Revised: 06/10/2019] [Accepted: 06/17/2019] [Indexed: 12/26/2022]
|
43
|
Abstract
Green plants (Viridiplantae) include around 450,000-500,000 species of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida), including green plants (Viridiplantae), glaucophytes (Glaucophyta) and red algae (Rhodophyta). Our analysis provides a robust phylogenomic framework for examining the evolution of green plants. Most inferred species relationships are well supported across multiple species tree and supermatrix analyses, but discordance among plastid and nuclear gene trees at a few important nodes highlights the complexity of plant genome evolution, including polyploidy, periods of rapid speciation, and extinction. Incomplete sorting of ancestral variation, polyploidization and massive expansions of gene families punctuate the evolutionary history of green plants. Notably, we find that large expansions of gene families preceded the origins of green plants, land plants and vascular plants, whereas whole-genome duplications are inferred to have occurred repeatedly throughout the evolution of flowering plants and ferns. The increasing availability of high-quality plant genome sequences and advances in functional genomics are enabling research on genome evolution across the green tree of life.
|
44
|
Ciezarek AG, Osborne OG, Shipley ON, Brooks EJ, Tracey SR, McAllister JD, Gardner LD, Sternberg MJE, Block B, Savolainen V. Phylotranscriptomic Insights into the Diversification of Endothermic Thunnus Tunas. Mol Biol Evol 2019; 36:84-96. [PMID: 30364966 PMCID: PMC6340463 DOI: 10.1093/molbev/msy198] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/17/2022]
Abstract
Birds, mammals, and certain fishes, including tunas, opahs and lamnid sharks, are endothermic, conserving internally generated, metabolic heat to maintain body or tissue temperatures above that of the environment. Bluefin tunas are commercially important fishes worldwide, and some populations are threatened. They are renowned for their endothermy, maintaining elevated temperatures of the oxidative locomotor muscle, viscera, brain and eyes, and occupying cold, productive high-latitude waters. Less cold-tolerant tunas, such as yellowfin tuna, by contrast, remain in warm-temperate to tropical waters year-round, reproducing more rapidly than most temperate bluefin tuna populations, providing resiliency in the face of large-scale industrial fisheries. Despite the importance of these traits to not only fisheries but also habitat utilization and responses to climate change, little is known of the genetic processes underlying the diversification of tunas. In collecting and analyzing sequence data across 29,556 genes, we found that parallel selection on standing genetic variation is associated with the evolution of endothermy in bluefin tunas. This includes two shared substitutions in genes encoding glycerol-3 phosphate dehydrogenase, an enzyme that contributes to thermogenesis in bumblebees and mammals, as well as four genes involved in the Krebs cycle, oxidative phosphorylation, β-oxidation, and superoxide removal. Using phylogenetic techniques, we further illustrate that the eight Thunnus species are genetically distinct, but found evidence of mitochondrial genome introgression across two species. Phylogeny-based metrics highlight conservation needs for some of these species.
Affiliation(s)
- Adam G Ciezarek
  - Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, United Kingdom
- Owen G Osborne
  - Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, United Kingdom
- Oliver N Shipley
  - Shark Research and Conservation Program, The Cape Eleuthera Institute, Rock Sound, Eleuthera, The Bahamas
  - School of Marine and Atmospheric Science, Stony Brook University, Stony Brook, NY
- Edward J Brooks
  - Shark Research and Conservation Program, The Cape Eleuthera Institute, Rock Sound, Eleuthera, The Bahamas
- Sean R Tracey
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart, TAS, Australia
- Jaime D McAllister
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart, TAS, Australia
- Luke D Gardner
  - Department of Biology, Hopkins Marine Station, Stanford University, Pacific Grove, CA
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, Kensington, London, United Kingdom
- Barbara Block
  - Department of Biology, Hopkins Marine Station, Stanford University, Pacific Grove, CA
- Vincent Savolainen
  - Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, United Kingdom
|
45
|
Boutte J, Fishbein M, Liston A, Straub SCK. NGS-Indel Coder: A pipeline to code indel characters in phylogenomic data with an example of its application in milkweeds (Asclepias). Mol Phylogenet Evol 2019; 139:106534. [PMID: 31212081 DOI: 10.1016/j.ympev.2019.106534] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/11/2019] [Revised: 05/12/2019] [Accepted: 06/13/2019] [Indexed: 12/30/2022]
Abstract
Targeted genome sequencing approaches allow characterization of evolutionary relationships using a considerable number of nuclear genes and informative characters. However, most phylogenomic analyses only utilize single nucleotide polymorphisms (SNPs). Studies at the species level, especially in groups that have recently radiated, often recover low amounts of phylogenetically informative variation in coding regions, and require non-coding sequences, which are richer in indels, to resolve gene trees. Here, NGS-Indel Coder, a pipeline to detect and omit false positive indels inferred from assemblies of short read sequence data, was developed to resolve the relationships among and within major clades of the American milkweeds (Asclepias), which are the result of a rapid and recent evolutionary radiation, and whose phylogeny has been difficult to resolve. This pipeline was applied to a Hyb-Seq data set of 768 loci including targeted exons and flanking intron regions from 33 milkweed species. Robust species tree inference was improved by excluding small alignment partitions (<100 bp) that increased gene tree ambiguity and incongruence. To further investigate the robustness of indel coding, data sets that included small and large indels were explored, and species trees derived from concatenated loci versus coalescent methods based on gene trees were compared. The phylogeny of Asclepias obtained using nuclear data was well resolved, and phylogenetic information from indels improved resolution of specific nodes. The Temperate North American, Mexican Highland, and Incarnatae clades were well supported as monophyletic. Asclepias coulteri, which has been considered part of the Sonoran Desert clade based on plastome analyses, was placed as sister to all the other milkweed species studied here, rather than as a member of that clade. Two groups within the Temperate North American and Mexican clades were not resolved, and the inferred relationships strongly conflicted when comparing results based on data sets that did or did not include indel characters. This new pipeline represents a step forward in making maximal use of the information content in phylogenomic data sets.
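As a rough illustration of two ideas mentioned in this abstract, the Python sketch below codes gap runs as binary presence/absence characters and drops alignment partitions shorter than 100 bp. It is not the NGS-Indel Coder implementation: the function names, the dictionary-based alignment format, and the simple interval-based coding scheme are assumptions made for illustration, and the pipeline's detection of false positive indels is not modeled.

```python
import re


def gap_intervals(seq):
    """Return half-open (start, end) intervals for each run of '-' in an aligned sequence."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", seq)]


def code_indels(alignment):
    """Code each distinct gap interval as one binary character (1 = this taxon has that gap).

    `alignment` is a hypothetical {taxon_name: aligned_sequence} dictionary.
    """
    per_taxon = {name: set(gap_intervals(seq)) for name, seq in alignment.items()}
    characters = sorted(set().union(*per_taxon.values()))
    matrix = {
        name: [int(interval in gaps) for interval in characters]
        for name, gaps in per_taxon.items()
    }
    return characters, matrix


def drop_short_partitions(partitions, min_len=100):
    """Keep partitions whose aligned length is at least `min_len` columns.

    `partitions` is a hypothetical {partition_id: alignment} dictionary in which
    every alignment holds at least one sequence.
    """
    return {
        pid: aln
        for pid, aln in partitions.items()
        if len(next(iter(aln.values()))) >= min_len
    }


if __name__ == "__main__":
    toy = {"sp1": "ACGT--ACGT", "sp2": "ACGTTTACGT", "sp3": "AC----ACGT"}
    print(code_indels(toy))
```

Each distinct gap interval becomes its own character, so overlapping gaps of different lengths are treated as separate events. That is one common simplification and may differ from the coding the authors used.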
Affiliation(s)
- Julien Boutte
  - Department of Biology, Hobart and William Smith Colleges, Geneva, NY, USA
- Mark Fishbein
  - Department of Plant Biology, Ecology and Evolution, Oklahoma State University, Stillwater, OK, USA
- Aaron Liston
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Shannon C K Straub
  - Department of Biology, Hobart and William Smith Colleges, Geneva, NY, USA
|
46
|
Vasilikopoulos A, Balke M, Beutel RG, Donath A, Podsiadlowski L, Pflug JM, Waterhouse RM, Meusemann K, Peters RS, Escalona HE, Mayer C, Liu S, Hendrich L, Alarie Y, Bilton DT, Jia F, Zhou X, Maddison DR, Niehuis O, Misof B. Phylogenomics of the superfamily Dytiscoidea (Coleoptera: Adephaga) with an evaluation of phylogenetic conflict and systematic error. Mol Phylogenet Evol 2019; 135:270-285. [DOI: 10.1016/j.ympev.2019.02.022] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/10/2018] [Revised: 02/22/2019] [Accepted: 02/25/2019] [Indexed: 02/07/2023]
|
47
|
Piližota I, Train CM, Altenhoff A, Redestig H, Dessimoz C. Phylogenetic approaches to identifying fragments of the same gene, with application to the wheat genome. Bioinformatics 2019; 35:1159-1166. [PMID: 30184069 PMCID: PMC6449756 DOI: 10.1093/bioinformatics/bty772] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2017] [Revised: 07/30/2018] [Accepted: 08/31/2018] [Indexed: 11/12/2022]
Abstract
MOTIVATION: As the time and cost of sequencing decrease, the number of available genomes and transcriptomes rapidly increases. Yet the quality of the assemblies and the gene annotations varies considerably and often remains poor, affecting downstream analyses. This is particularly true when fragments of the same gene are annotated as distinct genes, which may cause them to be mistaken as paralogs. RESULTS: In this study, we introduce two novel phylogenetic tests to infer non-overlapping or partially overlapping genes that are in fact parts of the same gene. One approach collapses branches with low bootstrap support and the other computes a likelihood ratio test. We extensively validated these methods by (i) introducing and recovering fragmentation on the bread wheat, Triticum aestivum cv. Chinese Spring, chromosome 3B; (ii) applying the methods to the low-quality 3B assembly and validating predictions against the high-quality 3B assembly; and (iii) comparing the performance of the proposed methods to the performance of existing methods, namely Ensembl Compara and ESPRIT. Application of this combination to a draft shotgun assembly of the entire bread wheat genome revealed 1221 pairs of genes that are highly likely to be fragments of the same gene. Our approach demonstrates the power of fine-grained evolutionary inferences across multiple species to improve genome assemblies and annotations. AVAILABILITY AND IMPLEMENTATION: An open source software tool is available at https://github.com/DessimozLab/esprit2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
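The branch-collapsing step described in this abstract can be pictured with a small Python sketch. It assumes a hypothetical tree encoding in which a leaf is a string and an internal node is a `(bootstrap_support, [children])` tuple. The threshold of 70 is an arbitrary illustrative value, and none of this is taken from the esprit2 code base.

```python
def collapse_low_support(node, threshold):
    """Collapse internal branches whose bootstrap support is below `threshold`.

    A leaf is a string; an internal node is a (support, [children]) tuple.
    Children of a collapsed node are attached directly to its parent, which
    turns weakly supported resolutions into polytomies. The support value of
    the root itself is never tested, since the root has no parent branch.
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse_low_support(child, threshold)
        if not isinstance(child, str) and child[0] < threshold:
            new_children.extend(child[1])  # splice grandchildren into this node
        else:
            new_children.append(child)
    return (support, new_children)


if __name__ == "__main__":
    # Hypothetical gene tree: two candidate fragments joined by a weak branch.
    tree = (100, [(40, ["fragA", "fragB"]), (95, ["gene1", "gene2"])])
    print(collapse_low_support(tree, threshold=70))
```

How the collapsed tree is then used to decide whether two candidates are fragments of the same gene is not specified in the abstract, so that decision rule is left out here.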
Affiliation(s)
- Ivana Piližota
  - Department of Genetics Evolution & Environment, University College London, UK
  - Department of Computer Science, University College London, UK
- Clément-Marie Train
  - Department of Computational Biology, Lausanne, Switzerland
  - Center for Integrative Genomics University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Biophore Building, Lausanne, Switzerland
- Adrian Altenhoff
  - Swiss Institute of Bioinformatics, Biophore Building, Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Genetics Evolution & Environment, University College London, UK
  - Department of Computer Science, University College London, UK
  - Department of Computational Biology, Lausanne, Switzerland
  - Center for Integrative Genomics University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Biophore Building, Lausanne, Switzerland
|
48
|
Couvreur TLP, Helmstetter AJ, Koenen EJM, Bethune K, Brandão RD, Little SA, Sauquet H, Erkens RHJ. Phylogenomics of the Major Tropical Plant Family Annonaceae Using Targeted Enrichment of Nuclear Genes. Front Plant Sci 2019; 9:1941. [PMID: 30687347 PMCID: PMC6334231 DOI: 10.3389/fpls.2018.01941] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 10/19/2018] [Accepted: 12/13/2018] [Indexed: 05/19/2023]
Abstract
Targeted enrichment and sequencing of hundreds of nuclear loci for phylogenetic reconstruction is becoming an important tool for plant systematics and evolution. Annonaceae is a major pantropical plant family with 110 genera and ca. 2,450 species, occurring across all major and minor tropical forests of the world. Baits were designed by sequencing the transcriptomes of five species from two of the largest Annonaceae subfamilies. Orthologous loci were identified. The resulting baiting kit was used to reconstruct phylogenetic relationships at two different levels using concatenated and gene tree approaches: a family-wide Annonaceae analysis sampling 65 genera and a species-level analysis of tribe Piptostigmateae sampling 29 species with multiple individuals per species. DNA extraction was undertaken mainly on silica gel-dried leaves, with two samples from herbarium-dried leaves. Our kit targets 469 exons (364,653 bp of sequence data), successfully capturing sequences from across Annonaceae. Silica gel-dried and herbarium DNA worked equally well. We present for the first time a nuclear gene-based phylogenetic tree at the generic level based on 317 supercontigs. Results mainly confirm previous chloroplast-based studies. However, several new relationships are found and discussed. We show significant differences in branch lengths between the two large subfamilies Annonoideae and Malmeoideae. A new tribe, Annickieae, is erected containing a single African genus Annickia. We also reconstructed a well-resolved species-level phylogenetic tree of the Piptostigmateae tribe. Our baiting kit is useful for reconstructing well-supported phylogenetic relationships within Annonaceae at different taxonomic levels. The nuclear genome is mainly concordant with plastome information with a few exceptions. Moreover, we find that substitution rate heterogeneity between the two subfamilies is also found within the nuclear compartment, and not just plastomes and ribosomal DNA as previously shown. Our results have implications for understanding the biogeography, molecular dating and evolution of Annonaceae.
Affiliation(s)
- Erik J. M. Koenen
  - Institute of Systematic Botany, University of Zurich, Zurich, Switzerland
- Kevin Bethune
  - IRD, UMR DIADE, Univ. Montpellier, Montpellier, France
- Rita D. Brandão
  - Maastricht Science Programme, Maastricht University, Maastricht, Netherlands
- Stefan A. Little
  - Ecologie Systématique Evolution, Univ. Paris-Sud, CNRS, AgroParisTech, Université-Paris Saclay, Orsay, France
- Hervé Sauquet
  - Ecologie Systématique Evolution, Univ. Paris-Sud, CNRS, AgroParisTech, Université-Paris Saclay, Orsay, France
  - National Herbarium of New South Wales (NSW), Royal Botanic Gardens and Domain Trust, Sydney, NSW, Australia
- Roy H. J. Erkens
  - Maastricht Science Programme, Maastricht University, Maastricht, Netherlands
|
49
|
Villaverde T, Pokorny L, Olsson S, Rincón-Barrado M, Johnson MG, Gardner EM, Wickett NJ, Molero J, Riina R, Sanmartín I. Bridging the micro- and macroevolutionary levels in phylogenomics: Hyb-Seq solves relationships from populations to species and above. New Phytol 2018; 220:636-650. [PMID: 30016546 DOI: 10.1111/nph.15312] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 03/01/2018] [Accepted: 06/04/2018] [Indexed: 05/20/2023]
Abstract
Reconstructing phylogenetic relationships at the micro- and macroevolutionary levels within the same tree is problematic because of the need to use different data types and analytical frameworks. We test the power of target enrichment to provide phylogenetic resolution based on DNA sequences from above species to within populations, using a large herbarium sampling and Euphorbia balsamifera (Euphorbiaceae) as a case study. Target enrichment with custom probes was combined with genome skimming (Hyb-Seq) to sequence 431 low-copy nuclear genes and partial plastome DNA. We used supermatrix, multispecies-coalescent approaches, and Bayesian dating to estimate phylogenetic relationships and divergence times. Euphorbia balsamifera, with a disjunct Rand Flora-type distribution at opposite sides of Africa, comprises three well-supported subspecies: western Sahelian sepium is sister to eastern African-southern Arabian adenensis and Macaronesian-southwest Moroccan balsamifera. Lineage divergence times support Late Miocene to Pleistocene diversification and climate-driven vicariance to explain the Rand Flora pattern. We show that probes designed using genomic resources from taxa not directly related to the focal group are effective in providing phylogenetic resolution at deep and shallow evolutionary levels. Low capture efficiency in herbarium samples increased the proportion of missing data but did not bias estimation of phylogenetic relationships or branch lengths.
Affiliation(s)
- Tamara Villaverde
  - Real Jardín Botánico (RJB-CSIC), Plaza de Murillo 2, 28014, Madrid, Spain
- Lisa Pokorny
  - Comparative Plant and Fungal Biology Department, Royal Botanic Gardens, Kew, Richmond, TW9 3DS, UK
- Sanna Olsson
  - Department of Forest Ecology and Genetics, INIA Forest Research Centre (INIA-CIFOR), Ctra. de la Coruña km. 7.5, 28040, Madrid, Spain
- Matthew G Johnson
  - Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX, 79409-43131, USA
  - Department of Plant Science and Conservation, Chicago Botanical Garden, 1000 Lake Cook Road, Glencoe, IL, 60022, USA
- Norman J Wickett
  - Department of Plant Science and Conservation, Chicago Botanical Garden, 1000 Lake Cook Road, Glencoe, IL, 60022, USA
  - Program in Plant Biology and Conservation, Northwestern University, 2205 Tech Drive, Evanston, IL, 60208, USA
- Julià Molero
  - Laboratori de Botànica, Departament de Biologia, Sanitat i Medi Ambient, Facultat de Farmàcia, Universitat de Barcelona, 08028, Barcelona, Spain
- Ricarda Riina
  - Real Jardín Botánico (RJB-CSIC), Plaza de Murillo 2, 28014, Madrid, Spain
- Isabel Sanmartín
  - Real Jardín Botánico (RJB-CSIC), Plaza de Murillo 2, 28014, Madrid, Spain
|
50
|
Abstract
BACKGROUND: Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. RESULTS: We propose an automatic method to detect such errors. We build a phylogeny including all the data and then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. CONCLUSIONS: TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink.
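To make the k-shrink problem stated in this abstract concrete, the Python sketch below solves it by brute force on a toy tree. This is deliberately not the polynomial-time algorithm the authors describe, and it is not the TreeShrink code. It enumerates every set of k leaves, so it is only practical for very small trees. The edge-list tree format and all function names are assumptions made for illustration.

```python
from itertools import combinations


def leaf_distances(edges):
    """Return (leaves, dist), where dist[(a, b)] is the path length between leaves a and b.

    `edges` is a list of (node_u, node_v, branch_length) tuples describing a tree.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    dist = {}
    for src in leaves:
        stack = [(src, None, 0.0)]  # depth-first walk away from the source leaf
        while stack:
            node, parent, d = stack.pop()
            if node != src and len(adj[node]) == 1:
                dist[(src, node)] = d
            for nxt, w in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, d + w))
    return leaves, dist


def diameter(leaves, dist):
    """Largest pairwise path length among the given leaves (0.0 if fewer than two)."""
    return max((dist[(a, b)] for a, b in combinations(leaves, 2)), default=0.0)


def k_shrink_bruteforce(edges, k):
    """Return (removed_leaves, new_diameter) for the best set of k leaves to remove."""
    leaves, dist = leaf_distances(edges)
    best = None
    for removed in combinations(sorted(leaves), k):
        kept = [leaf for leaf in leaves if leaf not in removed]
        d = diameter(kept, dist)
        if best is None or d < best[1]:
            best = (removed, d)
    return best


if __name__ == "__main__":
    # Toy tree with one suspiciously long terminal branch leading to D.
    toy = [("A", "x", 1), ("B", "x", 1), ("x", "y", 1), ("C", "y", 1), ("D", "y", 10)]
    print(k_shrink_bruteforce(toy, 1))
```

Pruning a leaf does not change the path length between any two remaining leaves, which is why the diameter after removal can be read from the precomputed leaf-to-leaf distances. The statistical tests that TreeShrink applies afterwards to decide which removals count as outliers are not covered by this sketch.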
Affiliation(s)
- Uyen Mai
  - Computer Science and Engineering, University of California at San Diego, San Diego, CA 92093, USA
- Siavash Mirarab
  - Electrical and Computer Engineering, University of California at San Diego, San Diego, CA 92093, USA
|