1
Jin Y, Du X, Jiang C, Ji W, Yang P. Disentangling sources of gene tree discordance for Hordeum species via target-enriched sequencing assays. Mol Phylogenet Evol 2024; 199:108160. [PMID: 39019201 DOI: 10.1016/j.ympev.2024.108160] [Received: 05/18/2024] [Revised: 07/04/2024] [Accepted: 07/14/2024] [Indexed: 07/19/2024]
Abstract
Hordeum is an economically and evolutionarily important genus within the Triticeae tribe of the family Poaceae. It contains 33 widely distributed and diverse species that cytologically represent four subgenomes (H, Xa, Xu and I). These wild species (except Hordeum spontaneum, which is the primary gene pool of barley) are secondary or tertiary gene-pool germplasms for barley and wheat improvement, and uncovering their complicated evolutionary relationships would benefit future breeding programs. Here, we developed a complexity-reduced pipeline that captures genome-wide distributed fragments using two novel target-enriched assays (HorCap v1.0 and BarPlex v1.0) in conjunction with high-throughput sequencing of the enrichments. Both assays were tested by genotyping 82 samples representing 67 accessions of 40 species from three genera (Hordeum, Triticum, and Aegilops). Each assay genotyped efficiently on its own, while integrating both assays significantly improved the robustness and resolution of the Hordeum phylogenetic trees. Interestingly, incomplete lineage sorting (ILS) was inferred for the first time as the major factor causing phylogenetic discordance among the four subgenomes, whereas post-speciation introgression events were revealed in the New World species (carrying the I genome). By revising the evolutionary relationships of the Hordeum species based on ancestral state reconstruction for the diploids and parental donor inference for the polyploids, our results raise new questions about the Hordeum phylogeny. Moreover, both newly developed assays are applicable to genotyping and phylogenetic analysis of Hordeum and other Triticeae wild species.
Affiliation(s)
- Yanlong Jin
- State Key Laboratory of Crop Gene Resources and Breeding, Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization (MARA), Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China; State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, Yangling 712100, China
- Xin Du
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, Yangling 712100, China
- Congcong Jiang
- State Key Laboratory of Crop Gene Resources and Breeding, Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization (MARA), Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Wanquan Ji
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Agronomy, Northwest A&F University, Yangling 712100, China
- Ping Yang
- State Key Laboratory of Crop Gene Resources and Breeding, Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization (MARA), Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
2
Springer MS, Gatesy J. A new phylogeny for Aves is compromised by pervasive misalignment and homology problems. Proc Natl Acad Sci U S A 2024; 121:e2406494121. [PMID: 38976728 PMCID: PMC11260159 DOI: 10.1073/pnas.2406494121] [Indexed: 07/10/2024]
Affiliation(s)
- Mark S. Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024
3
Gupta A, Mirarab S, Turakhia Y. Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. bioRxiv 2024:2024.05.27.596098. [PMID: 38854139 PMCID: PMC11160643 DOI: 10.1101/2024.05.27.596098] [Indexed: 06/11/2024]
Abstract
Inference of species trees plays a crucial role in advancing our understanding of evolutionary relationships and has immense significance for diverse biological and medical applications. Extensive genome sequencing efforts are currently in progress across a broad spectrum of life forms, holding the potential to unravel the intricate branching patterns within the tree of life. However, estimating species trees starting from raw genome sequences is quite challenging, and the current cutting-edge methodologies require a series of error-prone steps that are neither entirely automated nor standardized. In this paper, we present ROADIES, a novel pipeline for species tree inference from raw genome assemblies that is fully automated, easy to use, scalable, free from reference bias, and provides flexibility to adjust the tradeoff between accuracy and runtime. The ROADIES pipeline eliminates the need to align whole genomes, choose a single reference species, or pre-select loci such as functional genes found using cumbersome annotation steps. Moreover, it leverages recent advances in phylogenetic inference to allow multi-copy genes, eliminating the need to detect orthology. Using the genomic datasets released from large-scale sequencing consortia across three diverse life forms (placental mammals, pomace flies, and birds), we show that ROADIES infers species trees that are comparable in quality with the state-of-the-art approaches but in a fraction of the time. By incorporating optimal approaches and automating all steps from assembled genomes to species and gene trees, ROADIES is poised to improve the accuracy, scalability, and reproducibility of phylogenomic analyses.
Affiliation(s)
- Anshu Gupta
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
4
Lähteenaro M, Benda D, Straka J, Nylander JAA, Bergsten J. Phylogenomic analysis of Stylops reveals the evolutionary history of a Holarctic Strepsiptera radiation parasitizing wild bees. Mol Phylogenet Evol 2024; 195:108068. [PMID: 38554985 DOI: 10.1016/j.ympev.2024.108068] [Received: 01/08/2024] [Revised: 03/07/2024] [Accepted: 03/24/2024] [Indexed: 04/02/2024]
Abstract
Holarctic Stylops is the largest genus of the enigmatic insect order Strepsiptera, the twisted-wing parasites. Members of Stylops are obligate endoparasites of Andrena mining bees and exhibit the extreme sexual dimorphism typical of Strepsiptera. So far, molecular studies on Stylops have focused on questions of species delimitation. Here, we use whole genome sequencing to infer the phylogeny of this morphologically challenging genus from thousands of loci. We use a species tree method, concatenated maximum likelihood analysis, and Bayesian analysis with a relaxed clock model to reconstruct the phylogeny of 46 Stylops species, estimate divergence times, evaluate topological consistency across methods, and infer the root position. Furthermore, we assess the biogeographical history and coevolutionary patterns with host species. All methods recovered a well-resolved topology with nearly all nodes maximally supported and only a handful of minor topological variations. Based on this result, the included species can be divided into 12 species groups: seven including only Palaearctic species, three including only Nearctic species, and two geographically mixed. We find a strongly supported root position between a clade formed by the spreta, thwaitesi and gwynanae species groups and the remaining species, and find that the sister group of Stylops is Eurystylops or Eurystylops + Kinzelbachus. Our results indicate that Stylops originated in the Western Palaearctic, or in the Western Palaearctic and Nearctic, in the early Neogene or late Paleogene, with four independent dispersal events to the Nearctic. Cophylogenetic analyses indicate that the diversification of Stylops has been shaped both by significant coevolution with its mining bee hosts and by host shifting. This well-resolved and strongly supported phylogeny will provide a valuable basis for further studies of the fascinating world of Strepsiptera.
Affiliation(s)
- Meri Lähteenaro
- Department of Zoology, Swedish Museum of Natural History, P. O. Box 50007, SE-104 05 Stockholm, Sweden; Department of Zoology, Faculty of Science, Stockholm University, SE-106 91 Stockholm, Sweden.
- Daniel Benda
- Department of Zoology, Faculty of Science, Charles University, Vinicna 7, CZ-128 44, Prague 2, Czech Republic; Department of Entomology, National Museum of the Czech Republic, Cirkusová 1740, CZ-19300 Prague 9, Czech Republic.
- Jakub Straka
- Department of Zoology, Faculty of Science, Charles University, Vinicna 7, CZ-128 44, Prague 2, Czech Republic.
- Johan A A Nylander
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, P.O. Box 50007, SE-106 91 Stockholm, Sweden.
- Johannes Bergsten
- Department of Zoology, Swedish Museum of Natural History, P. O. Box 50007, SE-104 05 Stockholm, Sweden; Department of Zoology, Faculty of Science, Stockholm University, SE-106 91 Stockholm, Sweden.
5
Wicke K, Haque MR, Kubatko L. Implications of gene tree heterogeneity on downstream phylogenetic analyses: A case study employing the Fair Proportion index. PLoS One 2024; 19:e0300900. [PMID: 38662751 PMCID: PMC11045071 DOI: 10.1371/journal.pone.0300900] [Received: 12/16/2023] [Accepted: 03/01/2024] [Indexed: 04/28/2024]
Abstract
Many questions in evolutionary biology require the specification of a phylogeny for downstream phylogenetic analyses. However, with the increasingly widespread availability of genomic data, phylogenetic studies are often confronted with conflicting signal in the form of genomic heterogeneity and incongruence between gene trees and the species tree. This raises the question of determining what data and phylogeny should be used in downstream analyses, and to what extent the choice of phylogeny (e.g., gene trees versus species trees) impacts the analyses and their outcomes. In this paper, we study this question in the realm of phylogenetic diversity indices, which provide ways to prioritize species for conservation based on their relative evolutionary isolation on a phylogeny, and are thus one example of downstream phylogenetic analyses. We use the Fair Proportion (FP) index, also known as the evolutionary distinctiveness score, and explore the variability in species rankings based on gene trees as compared to the species tree for several empirical data sets. Our results indicate that prioritization rankings among species vary greatly depending on the underlying phylogeny, suggesting that the choice of phylogeny is a major influence in assessing phylogenetic diversity in a conservation setting. While we use phylogenetic diversity conservation as an example, we suspect that other types of downstream phylogenetic analyses such as ancestral state reconstruction are similarly affected by genomic heterogeneity and incongruence. Our aim is thus to raise awareness of this issue and inspire new research on which evolutionary information (species trees, gene trees, or a combination of both) should form the basis for analyses in these settings.
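The Fair Proportion index mentioned above can be sketched in a few lines. Under its usual definition, each branch length of a rooted tree is divided equally among the leaves that descend from that branch, and a species' score is the sum of these shares along its path from the root. The nested-list tree encoding and the function names below are hypothetical choices made for illustration, not code from Wicke et al. The toy example shows how replacing a species tree with a discordant gene tree of identical branch lengths can change which species ranks highest.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tree encoding: a leaf is its species name (str); an internal
# node is a list of (child, branch_length) pairs. The outer list is the root.
Tree = Union[str, List[Tuple["Tree", float]]]


def leaves(node: Tree) -> List[str]:
    """Return the leaf names at or below a node."""
    if isinstance(node, str):
        return [node]
    names: List[str] = []
    for child, _length in node:
        names.extend(leaves(child))
    return names


def fair_proportion(root: Tree) -> Dict[str, float]:
    """Fair Proportion (evolutionary distinctiveness) score for every leaf.

    Each branch length is split equally among the leaves descending from it,
    and each leaf accumulates its shares along the path from the root.
    """
    scores = {name: 0.0 for name in leaves(root)}

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        for child, length in node:
            below = leaves(child)
            for name in below:
                scores[name] += length / len(below)
            visit(child)

    visit(root)
    return scores


# Toy example: species tree ((A:1,B:1):2,C:3) and a discordant gene tree
# ((A:1,C:1):2,B:3) with the same branch lengths.
species_tree: Tree = [([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)]
gene_tree: Tree = [([("A", 1.0), ("C", 1.0)], 2.0), ("B", 3.0)]

fp_species = fair_proportion(species_tree)  # {'A': 2.0, 'B': 2.0, 'C': 3.0}
fp_gene = fair_proportion(gene_tree)        # {'A': 2.0, 'C': 2.0, 'B': 3.0}
print(max(fp_species, key=fp_species.get))  # C
print(max(fp_gene, key=fp_gene.get))        # B
```

The scores always sum to the total tree length (7 here), so the index redistributes, rather than creates, phylogenetic diversity; only the allocation among species shifts when the topology changes.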
Affiliation(s)
- Kristina Wicke
- Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ, United States of America
- Md. Rejuan Haque
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, United States of America
- Laura Kubatko
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, United States of America
- Department of Statistics, The Ohio State University, Columbus, OH, United States of America
6
Dietz L, Mayer C, Stolle E, Eberle J, Misof B, Podsiadlowski L, Niehuis O, Ahrens D. Metazoa-level USCOs as markers in species delimitation and classification. Mol Ecol Resour 2024; 24:e13921. [PMID: 38146909 DOI: 10.1111/1755-0998.13921] [Received: 10/15/2023] [Revised: 12/06/2023] [Accepted: 12/13/2023] [Indexed: 12/27/2023]
Abstract
Metazoa-level universal single-copy orthologs (mzl-USCOs) are universally applicable markers for DNA taxonomy in animals that can replace or supplement single-gene barcodes. Previously, mzl-USCOs from target enrichment data were shown to reliably distinguish species. Here, we tested whether USCOs are an evenly distributed, representative sample of a given metazoan genome and therefore able to cope with past hybridization events and incomplete lineage sorting. This is relevant for coalescent-based species delimitation approaches, which critically depend on the assumption that the investigated loci do not exhibit autocorrelation due to physical linkage. Based on 239 chromosome-level assembled genomes, we confirmed that mzl-USCOs are genetically unlinked for practical purposes and a representative sample of a genome in terms of reciprocal distances between USCOs on a chromosome and of distribution across chromosomes. We tested the suitability of mzl-USCOs extracted from genomes for species delimitation and phylogeny in four case studies: Anopheles mosquitoes, Drosophila fruit flies, Heliconius butterflies and Darwin's finches. In almost all instances, USCOs allowed delineating species and yielded phylogenies that corresponded to those generated from whole genome data. Our phylogenetic analyses demonstrate that USCOs may complement single-gene DNA barcodes and provide more accurate taxonomic inferences. Combining USCOs from sources that used different versions of ortholog reference libraries to infer marker orthology may be challenging and, at times, impact taxonomic conclusions. However, we expect this problem to become less severe as the rapidly growing number of reference genomes provides a better representation of the number and diversity of organismal lineages.
Affiliation(s)
- Lars Dietz
- Museum A. Koenig, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, Germany
- Christoph Mayer
- Museum A. Koenig, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, Germany
- Eckart Stolle
- Museum A. Koenig, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, Germany
- Jonas Eberle
- Museum A. Koenig, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, Germany
- Paris-Lodron-University, Salzburg, Austria
- Bernhard Misof
- Museum A. Koenig, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, Germany
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Lars Podsiadlowski
- Museum A. Koenig, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, Germany
- Oliver Niehuis
- Abt. Evolutionsbiologie und Ökologie, Institut für Biologie I, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
- Dirk Ahrens
- Museum A. Koenig, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, Germany
7
Li X, Breinholt JW, Martinez JI, Keegan K, Ellis EA, Homziak NT, Zwick A, Storer CG, McKenna D, Kawahara AY. Large-scale genomic data reveal the phylogeny and evolution of owlet moths (Noctuoidea). Cladistics 2024; 40:21-33. [PMID: 37787424 DOI: 10.1111/cla.12559] [Received: 04/21/2023] [Revised: 08/24/2023] [Accepted: 08/28/2023] [Indexed: 10/04/2023]
Abstract
The owlet moths (Noctuoidea; ~43,000-45,000 described species) are one of the most ecologically diverse and speciose superfamilies of animals. Moreover, they include some of the world's most notorious pests of agriculture and forestry. Despite their contributions to terrestrial biodiversity and impacts on ecosystems and economies, the evolutionary history of Noctuoidea remains unclear because the superfamily lacks a statistically robust phylogenetic and temporal framework. We reconstructed the phylogeny of Noctuoidea using data from 1234 genes (946.4 kb of nucleotide sequence) obtained from the genome and transcriptome sequences of 76 species. The relationships among the six families of Noctuoidea were well resolved and consistently recovered by both concatenation and gene coalescence approaches, supporting the following relationships: Oenosandridae + (Notodontidae + (Erebidae + (Nolidae + (Euteliidae + Noctuidae)))). Marginal-likelihood estimation identified a Yule tree prior with three unlinked molecular clocks as the preferred BEAST analysis. The crown age of Noctuoidea was estimated at 74.5 Ma, with most families originating before the end of the Paleogene (23 Ma). Our study provides the first statistically robust phylogenetic and temporal framework for Noctuoidea, including all families of owlet moths, based on large-scale genomic data.
Affiliation(s)
- Xuankun Li
- Department of Entomology, College of Plant Protection, China Agricultural University, Beijing, 100193, China
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Department of Biological Sciences, University of Memphis, Memphis, TN, 38152, USA
- Center for Biodiversity Research, University of Memphis, Memphis, TN, 38152, USA
- Jesse W Breinholt
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Precision Genomics, Intermountain Healthcare, St George, UT, 84790, USA
- Jose I Martinez
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Entomology and Nematology Department, University of Florida, Gainesville, FL, 32608, USA
- Kevin Keegan
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, 06268, USA
- Section of Invertebrate Zoology, Carnegie Museum of Natural History, 4400 Forbes Ave, Pittsburgh, PA, 15213-4080, USA
- Emily A Ellis
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Nicholas T Homziak
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Andreas Zwick
- Australian National Insect Collection, CSIRO National Research Collections Australia, Canberra, ACT, 2601, Australia
- Caroline G Storer
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Duane McKenna
- Department of Biological Sciences, University of Memphis, Memphis, TN, 38152, USA
- Center for Biodiversity Research, University of Memphis, Memphis, TN, 38152, USA
- Akito Y Kawahara
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Entomology and Nematology Department, University of Florida, Gainesville, FL, 32608, USA
8
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11] [Indexed: 06/01/2024]
Abstract
Phylogenomics aims to reconstruct the evolutionary histories of organisms using whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods for both gene tree and species tree reconstruction, addresses common pitfalls, and provides references to relevant computer programs. A practical example of a phylogenomic analysis of bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute Hospital das Clínicas - Faculdade de Medicina da Universidade de São Paulo São Paulo, São Paulo, SP, Brazil
- Joaquim Martins
- Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil.
9
Steenwyk JL, Li Y, Zhou X, Shen XX, Rokas A. Incongruence in the phylogenomics era. Nat Rev Genet 2023; 24:834-850. [PMID: 37369847 DOI: 10.1038/s41576-023-00620-x] [Accepted: 05/19/2023] [Indexed: 06/29/2023]
Abstract
Genome-scale data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence - the inference of conflicting evolutionary histories - remains pervasive in phylogenomic data, hampering our ability to reconstruct and interpret the tree of life. Biological factors, such as incomplete lineage sorting, horizontal gene transfer, hybridization, introgression, recombination and convergent molecular evolution, can lead to gene phylogenies that differ from the species tree. In addition, analytical factors, including stochastic, systematic and treatment errors, can drive incongruence. Here, we review these factors, discuss methodological advances to identify and handle incongruence, and highlight avenues for future research.
Affiliation(s)
- Jacob L Steenwyk
- Howard Hughes Medical Institute and the Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA
- Yuanning Li
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Xiaofan Zhou
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
- Xing-Xing Shen
- Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA.
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN, USA.
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
10
Han Y, Molloy EK. Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model. Algorithms Mol Biol 2023; 18:19. [PMID: 38041123 PMCID: PMC10691101 DOI: 10.1186/s13015-023-00248-w] [Received: 10/31/2023] [Accepted: 11/19/2023] [Indexed: 12/03/2023]
Abstract
Cancer progression and treatment can be informed by reconstructing a tumor's evolutionary history from its cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree that maximizes the number of quartets shared between it and the input mutations. We prove that an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.
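The link between mutations and quartets described in this abstract can be illustrated for the four-cell case. In the sketch below, a mutation implies the quartet ab|cd only when it is present in exactly two cells and absent from the other two; a mutation with any missing value implies nothing. Picking the quartet implied by the most mutations is then an empirical counterpart of the "most probable quartet" result. The data encoding (1 = present, 0 = absent, None = missing) and the function names are hypothetical; this is not the authors' estimator, which maximizes quartet agreement over whole trees on many cells.

```python
from typing import Dict, FrozenSet, Optional, Sequence

# A quartet ab|cd is stored as an unordered pair of unordered pairs of cells.
Quartet = FrozenSet[FrozenSet[str]]


def implied_quartet(cells: Sequence[str],
                    mutation: Sequence[Optional[int]]) -> Optional[Quartet]:
    """Quartet implied by one mutation on four cells, or None.

    `mutation` holds 1 (present), 0 (absent) or None (missing) per cell.
    Only a mutation present in exactly two cells and absent from the other
    two implies a quartet.
    """
    if len(cells) != 4 or len(mutation) != 4:
        raise ValueError("expected exactly four cells")
    if any(state is None for state in mutation):
        return None
    present = frozenset(c for c, s in zip(cells, mutation) if s == 1)
    absent = frozenset(c for c, s in zip(cells, mutation) if s == 0)
    if len(present) == 2 and len(absent) == 2:
        return frozenset({present, absent})
    return None


def most_frequent_quartet(
        cells: Sequence[str],
        mutations: Sequence[Sequence[Optional[int]]]) -> Optional[Quartet]:
    """Quartet implied by the most mutations (ties go to the first seen)."""
    counts: Dict[Quartet, int] = {}
    for mutation in mutations:
        quartet = implied_quartet(cells, mutation)
        if quartet is not None:
            counts[quartet] = counts.get(quartet, 0) + 1
    if not counts:
        return None
    return max(counts, key=lambda q: counts[q])


cells = ("a", "b", "c", "d")
mutations = [
    (1, 1, 0, 0),     # supports ab|cd
    (1, 1, 0, 0),     # supports ab|cd
    (0, 0, 1, 1),     # also ab|cd (the split is unordered)
    (1, 0, 1, 0),     # a conflicting split ac|bd, e.g. from an error
    (1, 1, None, 0),  # missing value: implies nothing
    (1, 1, 1, 0),     # present in three cells: implies nothing
]
best = most_frequent_quartet(cells, mutations)
print(sorted(sorted(pair) for pair in best))  # [['a', 'b'], ['c', 'd']]
```

Because errors in the model are unbiased, they scatter support across the two wrong quartets, so with enough mutations the correct split tends to dominate the count.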
Affiliation(s)
- Yunheng Han
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA.
- University of Maryland Institute for Advanced Computer Studies, College Park, MD, USA.
11
Simmons MP, Goloboff PA, Stöver BC, Springer MS, Gatesy J. Quantification of congruence among gene trees with polytomies using overall success of resolution for phylogenomic coalescent analyses. Cladistics 2023; 39:418-436. [PMID: 37096985 DOI: 10.1111/cla.12540] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/24/2023] [Indexed: 04/26/2023]
Abstract
Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
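The difference between standard Robinson-Foulds (RF) distances and contradiction-only counts can be shown with rooted trees represented as sets of clades, where collapsing a branch simply removes a clade. Two rooted clades are treated as compatible when one contains the other or they are disjoint. The helpers below are hypothetical illustrations: the clade-based RF distance shown is the rooted variant (unrooted RF compares bipartitions instead), and `contradicted` is one plausible reading of "RF contradictions" that may differ in detail from the counting in the authors' congsort TNT script. OSR is not implemented here.

```python
from typing import FrozenSet, Set

Clade = FrozenSet[str]


def clades(*groups: str) -> Set[Clade]:
    """Build a tree's clade set from strings of one-letter taxon names."""
    return {frozenset(group) for group in groups}


def compatible(x: Clade, y: Clade) -> bool:
    """Rooted clades are compatible if they are nested or disjoint."""
    return x <= y or y <= x or not (x & y)


def rf_distance(a: Set[Clade], b: Set[Clade]) -> int:
    """Rooted Robinson-Foulds distance: clades present in only one tree."""
    return len(a ^ b)


def contradicted(a: Set[Clade], b: Set[Clade]) -> int:
    """Number of clades in `a` that conflict with at least one clade of `b`."""
    return sum(1 for x in a if any(not compatible(x, y) for y in b))


# Non-trivial clades for taxa A-E (singletons and the root clade omitted).
resolved = clades("AB", "ABC", "DE")     # (((A,B),C),(D,E))
collapsed = clades("ABC", "DE")          # ((A,B,C),(D,E)): (A,B) collapsed
conflicting = clades("AC", "ABC", "DE")  # (((A,C),B),(D,E))

print(rf_distance(resolved, collapsed), contradicted(resolved, collapsed))      # 1 0
print(rf_distance(resolved, conflicting), contradicted(resolved, conflicting))  # 2 1
```

In this toy case, collapsing the (A,B) clade into a polytomy costs one unit of standard RF distance but contributes no contradictions, whereas a gene tree that groups A with C genuinely conflicts with the resolved tree. That asymmetry is why measures that ignore missing clades do not penalize collapsed, dubiously resolved branches.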
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO, 80523, USA
- Pablo A Goloboff
- CONICET, INSUE, Fundación Miguel Lillo, Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
- Ben C Stöver
- Institute for Evolution and Biodiversity, WWU Münster, 48149, Münster, Germany
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, 92521, USA
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
12
DeSalle R, Narechania A, Tessler M. Multiple Outgroups Can Cause Random Rooting in Phylogenomics. Mol Phylogenet Evol 2023; 184:107806. [PMID: 37172862 DOI: 10.1016/j.ympev.2023.107806] [Received: 09/08/2022] [Revised: 02/06/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Outgroup selection has been a major challenge since the rise of phylogenetics, and it has remained so in the phylogenomic era. Our goal here is to use large phylogenomic animal datasets to examine the impact of outgroup selection on the final topology. The results of our analyses further confirm that distant outgroups can cause random rooting, and that this holds for both concatenated and coalescent-based methods. The results also indicate that the standard practice of using multiple outgroups often causes random rooting. Most researchers go out of their way to obtain multiple outgroups, as this has been standard practice for decades. Based on our findings, this practice should stop. Instead, our results suggest that the single most closely related relative should be selected as the outgroup, unless all outgroups are roughly equally closely related to the ingroup.
Affiliation(s)
- Rob DeSalle
- Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Apurva Narechania
- Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
- Michael Tessler
- Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA; St. Francis College, Department of Biology, Brooklyn, NY 11201, USA
13
Yi H, Dong S, Yang L, Wang J, Kidner C, Kang M. Genome-wide data reveal cryptic diversity and hybridization in a group of tree ferns. Mol Phylogenet Evol 2023; 184:107801. [PMID: 37088242 DOI: 10.1016/j.ympev.2023.107801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 04/07/2023] [Accepted: 04/18/2023] [Indexed: 04/25/2023]
Abstract
Discovery of cryptic diversity is essential to understanding both the process of speciation and the conservation of species. Determining species boundaries in fern lineages is a major challenge due to the lack of morphologically diagnostic characters and to frequent hybridization. Genomic data have substantially enhanced our understanding of the speciation process, increased the resolution of species delimitation studies, and led to the discovery of cryptic diversity. Here, we employed restriction-site-associated DNA sequencing (RAD-seq) and integrated phylogenomic and population genomic analyses to investigate the phylogenetic relationships and evolutionary history of 16 tree ferns with marginate scales (Cyatheaceae) from China and Vietnam. We conducted multiple species delimitation analyses using the multispecies coalescent (MSC) model and novel approaches based on the genealogical divergence index (gdi) and isolation by distance (IBD). In addition, we inferred species trees using concatenation and several coalescent-based methods, and assessed hybridization patterns and rates of gene flow across the phylogeny. We obtained highly supported and generally congruent phylogenies from concatenated and summary-coalescent methods, and the monophyly of all currently recognized species was strongly supported. Our results revealed substantial evidence of cryptic diversity in three widely distributed Gymnosphaera species, each of which comprised two highly structured lineages that may correspond to cryptic species. We found that hybridization was fairly common not only between closely related species but also between distantly related species. Collectively, these results suggest that scaly tree ferns may contain cryptic diversity and that hybridization has played an important role throughout the evolutionary history of this group.
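For two populations without gene flow, the genealogical divergence index is commonly computed from a divergence time τ and a population-size parameter θ (both in expected substitutions per site) as gdi = 1 - exp(-2τ/θ). This is the probability that two sequences sampled from the focal population coalesce before reaching the split. The Python sketch below assumes exactly that closed form. It is not the pipeline used in this study, which estimated τ and θ under the MSC, and the function name `gdi` and the example values are illustrative only.

```python
import math


def gdi(tau: float, theta: float) -> float:
    """Genealogical divergence index assuming no gene flow: 1 - exp(-2 * tau / theta).

    tau:   divergence time of the two populations (expected substitutions per site)
    theta: population-size parameter of the focal population, same units as tau
    """
    if tau < 0 or theta <= 0:
        raise ValueError("require tau >= 0 and theta > 0")
    # Two lineages coalesce at rate 2/theta, so this is P(coalescence before tau).
    return 1.0 - math.exp(-2.0 * tau / theta)


print(round(gdi(0.01, 0.01), 4))  # 0.8647
```

Larger values mean that the two gene copies from a population usually coalesce within it before the split, which is the behavior expected of a distinct lineage. Values near zero mean that lineages from both populations typically reach the split still unsorted.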
Affiliation(s)
- Huiqin Yi
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China
- Shiying Dong
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China
- Lihua Yang
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China
- Jing Wang
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China
- Catherine Kidner
- Institute of Molecular Plant Sciences, University of Edinburgh, Daniel Rutherford Building, Max Born Crescent, The King's Buildings, Edinburgh EH9 3BF, UK; Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh EH3 5LR, UK
- Ming Kang
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China
14
Yan Z, Ogilvie HA, Nakhleh L. Comparing inference under the multispecies coalescent with and without recombination. Mol Phylogenet Evol 2023; 181:107724. [PMID: 36720421 DOI: 10.1016/j.ympev.2023.107724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 01/17/2023] [Accepted: 01/24/2023] [Indexed: 01/29/2023]
Abstract
Accurate inference of population parameters plays a pivotal role in unravelling evolutionary histories. While recombination has been universally accepted as a fundamental process in the evolution of sexually reproducing organisms, it remains challenging to model it exactly. Thus, existing coalescent-based approaches make different assumptions or approximations to facilitate phylogenetic inference, which can potentially bias estimates of evolutionary parameters when recombination is present. In this article, we evaluate the performance of population parameter estimation using three methods (StarBEAST2, SNAPP, and diCal2) that represent three different types of inference. We performed whole-genome simulations in which recombination rates, mutation rates, and levels of incomplete lineage sorting were varied. We show that StarBEAST2 using short or medium-sized loci is robust to realistic rates of recombination, which is in agreement with previous studies. SNAPP, as expected, is generally unaffected by recombination events. Most surprisingly, diCal2, a method that is designed to explicitly account for recombination, performs considerably worse than the other methods under comparison.
Affiliation(s)
- Zhi Yan
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Huw A Ogilvie
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
15
Wilson D, Rogers JD. Evaluating Compression-Based Phylogeny Estimation in the Presence of Incomplete Lineage Sorting. J Comput Biol 2023; 30:250-260. [PMID: 36848254 DOI: 10.1089/cmb.2022.0197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023] Open
Abstract
This study assesses characteristics of the normalized compression distance (NCD) technique for building phylogenetic trees from molecular data. We examined results from a mammalian biological data set as well as a collection of simulated data with varying levels of incomplete lineage sorting. The implementation of NCD we analyze is a concatenation-based, distance-based, alignment-free, and model-free phylogeny estimation method, which takes concatenated unaligned sequence data as input and outputs a matrix of distances. We compare the NCD phylogeny estimation method with various other methods, including coalescent- and concatenation-based methods.
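Normalized compression distance is usually defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length of a string and xy is the concatenation of x and y. The Python sketch below assumes zlib as the compressor C and builds the pairwise distance matrix that a distance-based tree method (for example, neighbor joining) would take as input. The function names are hypothetical, and this is not the implementation evaluated by Wilson and Rogers.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression (stand-in for C(.))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two unaligned byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(seqs: dict) -> dict:
    """Pairwise NCD for every ordered pair of named sequences."""
    names = sorted(seqs)
    return {(a, b): ncd(seqs[a], seqs[b]) for a in names for b in names}


if __name__ == "__main__":
    random.seed(1)
    base = bytes(random.choice(b"ACGT") for _ in range(2000))
    close = bytearray(base)
    for i in range(0, 2000, 100):  # introduce 20 point substitutions
        close[i] = ord("T") if close[i] != ord("T") else ord("A")
    far = bytes(random.choice(b"ACGT") for _ in range(2000))
    d = ncd_matrix({"base": base, "close": bytes(close), "far": far})
    print(d[("base", "close")] < d[("base", "far")])  # True
```

zlib only looks back 32 KB when searching for repeats. For long concatenated loci, a compressor with a larger window (such as Python's `lzma`) would be needed, or similarity between distant copies goes undetected.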
Affiliation(s)
- Deangelo Wilson
- School of Computing, DePaul University, Chicago, Illinois, USA
- John D Rogers
- School of Computing, DePaul University, Chicago, Illinois, USA
16
Cognato AI, Taft W, Osborn RK, Rubinoff D. Multi-gene phylogeny of North American clear-winged moths (Lepidoptera: Sesiidae): a foundation for future evolutionary study of a speciose mimicry complex. Cladistics 2023; 39:1-17. [PMID: 35944148 DOI: 10.1111/cla.12515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Revised: 06/06/2022] [Accepted: 07/11/2022] [Indexed: 01/13/2023] Open
Abstract
Sesiids are a diverse group of predominantly diurnal moths, many of which are Batesian mimics of Hymenoptera. However, their diversity and relationships are poorly understood. A multi-gene phylogenetic analysis of 48 North American sesiid species confirmed the traditional taxonomic tribal ranks, demonstrated the paraphyly of Carmenta and Synanthedon with respect to several other genera and ultimately provided minimal phylogenetic resolution within and between North American and European groups. Character support from each gene suggested inconsistency between the phylogenetic signal of the CAD gene and that of the other four genes. However, removal of CAD from subsequent phylogenetic analyses did not substantially change the initial phylogenetic results or return Carmenta and Synanthedon as reciprocally monophyletic, suggesting that it was not impacting the overall phylogenetic signal. The lack of resolution using genes that are typically informative at the species level for other lepidopterans suggests a surprisingly rapid radiation of species in Carmenta/Synanthedon. This group also exhibits a wide range of mimicry strategies and hostplant usage, which could be fertile ground for future study.
Affiliation(s)
- Anthony I Cognato
- Department of Entomology, Michigan State University, 288 Farm Lane, room 243, East Lansing, Michigan, 48824, USA
- William Taft
- Department of Entomology, Michigan State University, 288 Farm Lane, room 243, East Lansing, Michigan, 48824, USA
- Rachel K Osborn
- Department of Entomology, Michigan State University, 288 Farm Lane, room 243, East Lansing, Michigan, 48824, USA
- Daniel Rubinoff
- Department of Plant and Environmental Protection Sciences, University of Hawaii, 310 Gilmore Hall, 3050 Maile Way, Honolulu, Hawaii, 96822, USA
17
On the effects of selection and mutation on species tree inference. Mol Phylogenet Evol 2023; 179:107650. [PMID: 36441104 DOI: 10.1016/j.ympev.2022.107650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 10/17/2022] [Accepted: 10/18/2022] [Indexed: 11/24/2022]
Abstract
The effect of selection acting on regions of the genome on the accuracy of species-level phylogenetic inference using methods that do not explicitly model selection is an open question that is relevant to most, if not all, phylogenomic studies. To address this, we derive a mathematical approximation to the Wright-Fisher model with mutation and selection in the limit as the population size becomes large. In contrast to previous approximations based on diffusion processes, our approximation can be used to study the distribution of coalescent times for an arbitrary number of lineages, allowing calculation of the probability distribution of gene genealogies under the coalescent model. We use these calculations to show that direct selection at strengths typically encountered in practice has only a small effect on the distribution of coalescent times, and hence on the distribution of gene trees. This implies that many coalescent-based methods for estimating the species tree topology will be robust to the presence of selection in a subset of the underlying genes. Selection will, however, bias the estimation of speciation times, causing them to underestimate the true speciation times. Our model captures the effects of selection on the genealogies that generate the observed sequence data, but does not model selective pressures that act only on the subsequent sequences or that negatively impact gene tree estimation.
18
Wicke K, Fischer M, Kubatko L. Effects of discordance between species and gene trees on phylogenetic diversity conservation. J Math Biol 2022; 86:13. [PMID: 36482146 DOI: 10.1007/s00285-022-01845-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Revised: 11/16/2022] [Accepted: 11/22/2022] [Indexed: 12/13/2022]
Abstract
Phylogenetic diversity indices such as the Fair Proportion (FP) index are frequently discussed as prioritization criteria in biodiversity conservation. They rank species according to their contribution to overall diversity by taking into account the unique and shared evolutionary history of each species as indicated by its placement in an underlying phylogenetic tree. Traditionally, phylogenetic trees were inferred from single genes and the resulting gene trees were assumed to be a valid estimate for the species tree, i.e., the "true" evolutionary history of the species under consideration. However, it is now common to sequence hundreds or thousands of genes, or whole genomes, and conflicting genealogical histories often exist in different genes throughout the genome, resulting in discordance between individual gene trees and the species tree. Here, we analyze the effects of gene and species tree discordance on prioritization decisions based on the FP index. In particular, we consider the ranking order of taxa induced by (i) the FP index on a species tree, and (ii) the expected FP index across all gene tree histories associated with the species tree. On the one hand, we show that for particular tree shapes, the two rankings always coincide. On the other hand, we show that for all leaf numbers greater than or equal to five, there exist species trees for which the two rankings differ. Finally, we illustrate the variability in the rankings obtained from the FP index across different gene tree and species tree estimates for an empirical multilocus mammal data set.
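For a rooted tree with branch lengths, the Fair Proportion index of a leaf i is commonly written FP(i) = Σ λ_e / n_e. The sum runs over the edges e on the path from the root to i, λ_e is the length of e, and n_e is the number of leaves below e. The scores therefore sum to the total tree length. The Python sketch below assumes that definition and uses a hypothetical `Node` class. The `expected_fp` helper illustrates ranking (ii) only under the assumption that the caller already has a list of gene trees with their probabilities. Obtaining that list is the hard part the paper addresses, and this sketch does not do it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical rooted-tree node; `length` is the edge leading into the node."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def leaf_count(node: Node) -> int:
    return 1 if not node.children else sum(leaf_count(c) for c in node.children)


def fair_proportion(root: Node) -> dict:
    """FP(i): sum over root-to-leaf edges of edge length / leaves below that edge."""
    scores = {}

    def walk(node: Node, acc: float) -> None:
        acc += node.length / leaf_count(node)  # share this edge among its descendant leaves
        if not node.children:
            scores[node.name] = acc
        for child in node.children:
            walk(child, acc)

    walk(root, 0.0)
    return scores


def expected_fp(weighted_trees) -> dict:
    """Probability-weighted FP over (tree, probability) pairs supplied by the caller."""
    totals = {}
    for root, prob in weighted_trees:
        for leaf, score in fair_proportion(root).items():
            totals[leaf] = totals.get(leaf, 0.0) + prob * score
    return totals


# Species tree ((A:1,B:1):2,C:3)
tree = Node(children=[
    Node(length=2.0, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node("C", 3.0),
])
print(fair_proportion(tree))  # {'A': 2.0, 'B': 2.0, 'C': 3.0}
```

Sorting taxa by these scores gives ranking (i). Sorting by `expected_fp` gives a version of ranking (ii), and the paper's result is that the two orders can disagree.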
Affiliation(s)
- Kristina Wicke
- Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ, USA; Department of Mathematics, The Ohio State University, Columbus, OH, USA
- Mareike Fischer
- Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Germany
- Laura Kubatko
- Department of Statistics, Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, USA
19
Liu Y, Qin A, Wang Y, Nie W, Tan C, An S, Wang J, Chang E, Jiang Z, Jia Z. Interspecific Gene Flow and Selective Sweeps in Picea wilsonii, P. neoveitchii and P. likiangensis. Plants (Basel) 2022; 11:2993. [PMID: 36365446 PMCID: PMC9658573 DOI: 10.3390/plants11212993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 10/31/2022] [Accepted: 11/03/2022] [Indexed: 06/16/2023]
Abstract
Genome-wide single nucleotide polymorphism (SNP) markers were obtained by genotyping-by-sequencing (GBS) to study the genetic relationships, population structure, gene flow and selective sweeps during species differentiation of Picea wilsonii, P. neoveitchii and P. likiangensis from a genome-wide perspective. Using P. jezoensis and P. pungens as outgroups, we recovered three evolutionary branches: P. likiangensis was located on one branch, two P. wilsonii populations were grouped onto a second branch, and two P. neoveitchii populations were grouped onto a third branch. P. wilsonii was more closely related to P. likiangensis than to P. neoveitchii. ABBA-BABA analysis revealed that gene flow between P. neoveitchii and P. wilsonii was greater than that between P. neoveitchii and P. likiangensis. Compared with the background population of P. neoveitchii, the genes under selection in the P. wilsonii population were mainly related to plant stress resistance, stomatal regulation, plant morphology and flowering. The genes under selection in the P. likiangensis population were mainly related to plant stress resistance, leaf morphology and flowering. Selective sweeps were beneficial for improving the adaptability of spruce species to different habitats as well as for accelerating species differentiation. Frequent gene flow between spruce species makes their evolutionary relationships complicated. Insight into gene flow and selection pressure in spruce species will help us further understand their phylogenetic relationships and provide a scientific basis for their introduction, domestication and genetic improvement.
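ABBA-BABA analysis rests on the D statistic, D = (n_ABBA - n_BABA) / (n_ABBA + n_BABA). It is computed over biallelic sites of a four-taxon alignment ((P1, P2), P3, O) in which the outgroup O is taken to carry the ancestral allele A. Under incomplete lineage sorting alone, the two discordant patterns are expected in equal numbers. A D clearly above zero therefore suggests excess allele sharing between P2 and P3, and a D below zero suggests excess sharing between P1 and P3. The Python sketch below assumes one haploid sequence per taxon and skips sites with non-ACGT characters. Population-level implementations use allele frequencies and block-jackknife standard errors, which are omitted here, and the function name is hypothetical.

```python
def d_statistic(p1: str, p2: str, p3: str, outgroup: str):
    """Count ABBA/BABA sites and return (n_abba, n_baba, D).

    Each argument is one aligned sequence (one allele per site) for that taxon.
    The outgroup allele is treated as ancestral ("A"); only biallelic sites count.
    """
    if not len(p1) == len(p2) == len(p3) == len(outgroup):
        raise ValueError("sequences must be aligned to equal length")
    n_abba = n_baba = 0
    for a1, a2, a3, o in zip(p1, p2, p3, outgroup):
        alleles = {a1, a2, a3, o}
        if not alleles <= set("ACGT") or len(alleles) != 2:
            continue  # skip gaps/ambiguities, invariant and multi-allelic sites
        if a1 == o and a2 != o and a3 == a2:
            n_abba += 1  # P2 and P3 share the derived allele
        elif a2 == o and a1 != o and a3 == a1:
            n_baba += 1  # P1 and P3 share the derived allele
    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else 0.0
    return n_abba, n_baba, d


print(d_statistic("AAGAC", "GGAAC", "GGGAT", "AAAAT"))  # (2, 1, 0.3333333333333333)
```

In the toy alignment, sites 1 and 2 are ABBA, site 3 is BABA, site 4 is invariant, and site 5 is a BBAA pattern that the statistic ignores.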
Affiliation(s)
- Yifu Liu
- Key Laboratory of Forest Ecology and Environment of National Forestry and Grassland Administration, Ecology and Nature Conservation Institute, Chinese Academy of Forestry, Beijing 100091, China
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100091, China
- Aili Qin
- Key Laboratory of Forest Ecology and Environment of National Forestry and Grassland Administration, Ecology and Nature Conservation Institute, Chinese Academy of Forestry, Beijing 100091, China
- Ya Wang
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100091, China
- Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
- Wen Nie
- Key Laboratory of Forest Ecology and Environment of National Forestry and Grassland Administration, Ecology and Nature Conservation Institute, Chinese Academy of Forestry, Beijing 100091, China
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100091, China
- Cancan Tan
- Key Laboratory of Forest Ecology and Environment of National Forestry and Grassland Administration, Ecology and Nature Conservation Institute, Chinese Academy of Forestry, Beijing 100091, China
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100091, China
- Sanping An
- Research Institute of Forestry of Xiaolong Mountain, Gansu Provincial Key Laboratory of Secondary Forest Cultivation, Tianshui 741002, China
- Junhui Wang
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100091, China
- Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
- Ermei Chang
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100091, China
- Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
- Zeping Jiang
- Key Laboratory of Forest Ecology and Environment of National Forestry and Grassland Administration, Ecology and Nature Conservation Institute, Chinese Academy of Forestry, Beijing 100091, China
- Zirui Jia
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing 100091, China
- Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
20
Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022; 39:6750035. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022] [Indexed: 01/07/2023] Open
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
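The contrast between contracting low-support branches and weighting can be illustrated on a single four-taxon quartet. In the toy Python sketch below, each gene tree contributes one quartet topology with a support value in [0, 1]. Unweighted counting, contraction at a threshold, and support weighting can each pick a different winner. This is only a simplified illustration with hypothetical function names and made-up data. Weighted ASTRAL's actual weights also account for terminal branch lengths and are computed for every quartet of a multi-taxon gene tree.

```python
from collections import defaultdict


def unweighted(obs):
    """Each gene tree casts one vote for its quartet topology."""
    scores = defaultdict(float)
    for topo, _support in obs:
        scores[topo] += 1.0
    return dict(scores)


def contracted(obs, threshold):
    """Drop quartets whose resolving branch has support below `threshold`."""
    scores = defaultdict(float)
    for topo, support in obs:
        if support >= threshold:
            scores[topo] += 1.0
    return dict(scores)


def weighted(obs):
    """Each vote counts in proportion to its support; no threshold is needed."""
    scores = defaultdict(float)
    for topo, support in obs:
        scores[topo] += support
    return dict(scores)


def best(scores):
    return max(scores, key=scores.get)


# Hypothetical data: 3 well-supported vs. 4 poorly supported gene trees.
obs = [("AB|CD", 0.95)] * 3 + [("AC|BD", 0.40)] * 4
print(best(unweighted(obs)), best(contracted(obs, 0.5)), best(weighted(obs)))
# AC|BD AB|CD AB|CD
```

Lowering the contraction threshold to 0.3 restores the unweighted answer. This sensitivity to an arbitrary cutoff is what threshold-free weighting avoids.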
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA
21
Lozano-Fernandez J. A Practical Guide to Design and Assess a Phylogenomic Study. Genome Biol Evol 2022; 14:evac129. [PMID: 35946263 PMCID: PMC9452790 DOI: 10.1093/gbe/evac129] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Accepted: 08/03/2022] [Indexed: 11/13/2022] Open
Abstract
Over the last decade, molecular systematics has undergone a paradigm shift as high-throughput sequencing now makes it possible to reconstruct evolutionary relationships using genome-scale datasets. The advent of "big data" molecular phylogenetics provided a battery of new tools for biologists but simultaneously brought new methodological challenges. The increase in analytical complexity comes at the price of highly specific training in computational biology and molecular phylogenetics, very often resulting in a polarized accumulation of knowledge (technical on one side and biological on the other). Interpreting the robustness of genome-scale phylogenetic studies is not straightforward, particularly as new methodological developments have consistently shown that the general belief of "more genes, more robustness" often does not apply, and because there is a range of systematic errors that plague phylogenomic investigations. This is particularly problematic because phylogenomic studies are highly heterogeneous in their methodology, and best practices are often not clearly defined. The main aim of this article is to present what I consider the ten most important points to take into consideration when planning a well-thought-out phylogenomic study and when evaluating the quality of published papers. The goal is to provide a practical step-by-step guide that can be easily followed by nonexperts and phylogenomic novices in order to assess the technical robustness of phylogenomic studies or improve the experimental design of a project.
Affiliation(s)
- Jesus Lozano-Fernandez
- Department of Genetics, Microbiology and Statistics, Biodiversity Research Institute (IRBio), University of Barcelona, Avd. Diagonal 643, 08028 Barcelona, Spain
- Institute of Evolutionary Biology (CSIC – Universitat Pompeu Fabra), Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona, Spain
22
Hill M, Roch S. Inconsistency of Triplet-Based and Quartet-Based Species Tree Estimation under Intralocus Recombination. J Comput Biol 2022; 29:1173-1197. [PMID: 36048557 DOI: 10.1089/cmb.2022.0265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
We consider species tree estimation from multiple loci subject to intralocus recombination. We focus on R∗, a summary coalescent-based method using rooted triplets, as well as a related quartet-based inference method. We demonstrate analytically that in both cases, intralocus recombination gives rise to an inconsistency zone, in which correct inference is not assured even in the limit of an infinite amount of data. In addition, we validate and characterize this inconsistency zone through a simulation study, which suggests that differential rates of recombination between closely related taxa can amplify the effect of incomplete lineage sorting and contribute to inconsistency.
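The first step of R∗, as usually described, is to find, for every set of three taxa, the rooted triplet (ab|c, ac|b, or bc|a) displayed by the largest number of gene trees. The species tree is then assembled from these majority triplets. The Python sketch below covers only that counting step. It assumes rooted gene trees written as nested tuples of leaf names and breaks ties arbitrarily, and its function names are hypothetical. Each locus is assumed to follow a single tree, which is precisely the assumption that intralocus recombination violates.

```python
from collections import Counter
from itertools import combinations


def clusters(tree):
    """Return (leaf set, clusters) for a rooted tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found.extend(child_clusters)
    found.append(leaves)  # the cluster below this internal node
    return leaves, found


def displayed_triplet(cluster_list, a, b, c):
    """Return ({x, y}, z) if the tree displays xy|z, or None if unresolved."""
    for x, y, z in ((a, b, c), (a, c, b), (b, c, a)):
        if any(x in s and y in s and z not in s for s in cluster_list):
            return frozenset((x, y)), z
    return None


def majority_triplets(gene_trees):
    """For every 3-taxon set, the rooted triplet displayed by the most gene trees."""
    parsed = [clusters(t) for t in gene_trees]
    taxa = sorted(set().union(*(leaves for leaves, _ in parsed)))
    result = {}
    for triple in combinations(taxa, 3):
        votes = Counter()
        for leaves, cluster_list in parsed:
            if set(triple) <= leaves:  # gene tree must contain all three taxa
                triplet = displayed_triplet(cluster_list, *triple)
                if triplet is not None:
                    votes[triplet] += 1
        if votes:
            result[triple] = votes.most_common(1)[0][0]  # ties broken arbitrarily
    return result


genes = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
print(majority_triplets(genes))
```

With the three toy gene trees, AB|C wins two votes to one. A polytomy such as ("A", "B", "C") displays no triplet and so casts no vote.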
Affiliation(s)
- Max Hill
- Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Sebastien Roch
- Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA
23
Smith BT, Merwin J, Provost KL, Thom G, Brumfield RT, Ferreira M, Mauck III WM, Moyle RG, Wright T, Joseph L. Phylogenomic analysis of the parrots of the world distinguishes artifactual from biological sources of gene tree discordance. Syst Biol 2022; 72:228-241. [PMID: 35916751 DOI: 10.1093/sysbio/syac055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2021] [Revised: 02/22/2022] [Accepted: 07/22/2022] [Indexed: 11/14/2022] Open
Abstract
Gene tree discordance is expected in phylogenomic trees and biological processes are often invoked to explain it. However, heterogeneous levels of phylogenetic signal among individuals within datasets may cause artifactual sources of topological discordance. We examined how the information content in tips and subclades impacts topological discordance in the parrots (Order: Psittaciformes), a diverse and highly threatened clade of nearly 400 species. Using ultraconserved elements from 96% of the clade's species-level diversity, we estimated concatenated and species trees for 382 ingroup taxa. We found that discordance among tree topologies was most common at nodes dating between the late Miocene and Pliocene, and often at the taxonomic level of genus. Accordingly, we used two metrics to characterize information content in tips and assess the degree to which conflict between trees was being driven by lower quality samples. Most instances of topological conflict and non-monophyletic genera in the species tree could be objectively identified using these metrics. For subclades still discordant after tip-based filtering, we used a machine learning approach to determine whether phylogenetic signal or noise was the more important predictor of metrics supporting the alternative topologies. We found that when signal favored one of the topologies, noise was the most important variable in poorly performing models that favored the alternative topology. In sum, we show that artifactual sources of gene tree discordance, which are likely a common phenomenon in many datasets, can be distinguished from biological sources by quantifying the information content in each tip and modeling which factors support each topology.
Affiliation(s)
- Brian Tilston Smith
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
- Jon Merwin
- Department of Ornithology, Academy of Natural Sciences of Drexel University, 1900 Benjamin Franklin Parkway, Philadelphia, PA 19103, USA; Department of Biodiversity, Earth, and Environmental Science, Drexel University, Philadelphia, PA 19103, USA
- Kaiya L Provost
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 318 W. 12th Avenue, Columbus, OH 43210, USA
- Gregory Thom
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Robb T Brumfield
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Mateus Ferreira
- Centro de Estudos da Biodiversidade, Universidade Federal de Roraima, Av. Cap. Ene Garcez, 2413, Boa Vista, RR, Brazil
- William M Mauck III
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
- Robert G Moyle
- Department of Ecology and Evolutionary Biology and Biodiversity Institute, University of Kansas, 1345 Jayhawk Blvd., Lawrence, KS 66045, USA
- Timothy Wright
- Department of Biology, New Mexico State University, Las Cruces, NM, 88003, USA
- Leo Joseph
- Australian National Wildlife Collection, National Research Collections Australia, CSIRO, GPO Box 1700, Canberra, ACT, 2601, Australia
24
Brower AVZ. Intersubjective Corroboration. Cladistics 2022. [DOI: 10.1111/cla.12509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Affiliation(s)
- Andrew V. Z. Brower
- USDA-APHIS National Identification Services, Riverdale, MD 20737, USA
- Division of Invertebrates, American Museum of Natural History, New York, NY 10024, USA
- Department of Entomology, U.S. National Museum of Natural History, Washington, DC 20560, USA
25
Gatesy J, Springer MS. Phylogenomic Coalescent Analyses of Avian Retroelements Infer Zero-Length Branches at the Base of Neoaves, Emergent Support for Controversial Clades, and Ancient Introgressive Hybridization in Afroaves. Genes (Basel) 2022; 13:1167. [PMID: 35885951 PMCID: PMC9324441 DOI: 10.3390/genes13071167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Revised: 06/20/2022] [Accepted: 06/21/2022] [Indexed: 01/25/2023] Open
Abstract
Retroelement insertions (RIs) are low-homoplasy characters that are ideal data for addressing deep evolutionary radiations, where gene tree reconstruction errors can severely hinder phylogenetic inference with DNA and protein sequence data. Phylogenomic studies of Neoaves, a large clade of birds (>9000 species) that first diversified near the Cretaceous–Paleogene boundary, have yielded an array of robustly supported, contradictory relationships among deep lineages. Here, we reanalyzed a large RI matrix for birds using recently proposed quartet-based coalescent methods that enable inference of large species trees including branch lengths in coalescent units, clade-support, statistical tests for gene flow, and combined analysis with DNA-sequence-based gene trees. Genome-scale coalescent analyses revealed extremely short branches at the base of Neoaves, meager branch support, and limited congruence with previous work at the most challenging nodes. Despite widespread topological conflicts with DNA-sequence-based trees, combined analyses of RIs with thousands of gene trees show emergent support for multiple higher-level clades (Columbea, Passerea, Columbimorphae, Otidimorphae, Phaethoquornithes). RIs express asymmetrical support for deep relationships within the subclade Afroaves that hints at ancient gene flow involving the owl lineage (Strigiformes). Because DNA-sequence data are challenged by gene tree-reconstruction error, analysis of RIs represents one approach for improving gene tree-based methods when divergences are deep, internodes are short, terminal branches are long, and introgressive hybridization further confounds species-tree inference.
Affiliation(s)
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S. Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
26
Wang N, Braun EL, Liang B, Cracraft J, Smith SA. Categorical edge-based analyses of phylogenomic data reveal conflicting signals for difficult relationships in the avian tree. Mol Phylogenet Evol 2022; 174:107550. [PMID: 35691570 DOI: 10.1016/j.ympev.2022.107550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2020] [Revised: 05/13/2022] [Accepted: 06/02/2022] [Indexed: 11/28/2022]
Abstract
Phylogenetic analyses fail to yield a satisfactory resolution of some relationships in the tree of life even with genome-scale datasets, so the failure is unlikely to reflect limitations in the amount of data. Gene tree conflicts are particularly notable in studies focused on these contentious nodes, and taxon sampling, different analytical methods, and/or data type effects can further confound analyses. Although many efforts have been made to incorporate biological conflicts, few studies have curated individual genes for their efficiency in phylogenomic studies. Here, we conduct an edge-based analysis of Neoavian evolution, examining the phylogenetic efficacy of two recent phylogenomic bird datasets and three data types (ultraconserved elements [UCEs], introns, and coding regions). We assess the potential causes of biases in signal resolution for three difficult nodes: the earliest divergence of Neoaves, the position of the enigmatic Hoatzin (Opisthocomus hoazin), and the position of owls (Strigiformes). We observed extensive conflict among genes for all data types and datasets even after meticulous curation. Edge-based analyses (EBA) increased congruence and provided information about the impact of data type, GC content variation (GCCV), and outlier genes on each of the nodes we examined. First, outlier gene signals appeared to drive different patterns of support for the relationships among the earliest-diverging Neoaves. Second, the placement of Hoatzin was highly variable, although our EBA did reveal a previously unappreciated data type effect with an impact on its position. It also revealed that the resolution with the most support here was Hoatzin + shorebirds. Finally, GCCV, rather than data type (i.e., coding vs non-coding) per se, was correlated with a signal that supports monophyly of owls + Accipitriformes (hawks, eagles, and vultures). Eliminating high-GCCV loci increased the signal for owls + mousebirds.
Categorical EBA was able to reveal the nature of each edge and provide a way to highlight especially problematic branches that warrant a further examination. The current study increases our understanding about the contentious parts of the avian tree, which show even greater conflicts than appreciated previously.
Affiliation(s)
- Ning Wang
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China; Department of Ecology & Evolutionary Biology, University of Michigan, 1105 N University Ave, Ann Arbor, MI 48109-1048, USA; Department of Ornithology, American Museum of Natural History, New York, NY 10024, USA.
- Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL 32607, USA
- Bin Liang
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China; Department of Ecology & Evolutionary Biology, University of Michigan, 1105 N University Ave, Ann Arbor, MI 48109-1048, USA
- Joel Cracraft
- Department of Ornithology, American Museum of Natural History, New York, NY 10024, USA
- Stephen A Smith
- Department of Ecology & Evolutionary Biology, University of Michigan, 1105 N University Ave, Ann Arbor, MI 48109-1048, USA
27
Annotation-free delineation of prokaryotic homology groups. PLoS Comput Biol 2022; 18:e1010216. [PMID: 35675326 PMCID: PMC9212150 DOI: 10.1371/journal.pcbi.1010216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Revised: 06/21/2022] [Accepted: 05/16/2022] [Indexed: 11/19/2022] [Open access]
Abstract
Phylogenomic studies of prokaryotic taxa often assume conserved marker genes are homologous across their length. However, processes such as horizontal gene transfer or gene duplication and loss may disrupt this homology by recombining only parts of genes, causing gene fission or fusion. We show using simulation that it is necessary to delineate homology groups in a set of bacterial genomes without relying on gene annotations to define the boundaries of homologous regions. To solve this problem, we have developed a graph-based algorithm to partition a set of bacterial genomes into Maximal Homologous Groups of sequences (MHGs), where each MHG is a maximal set of maximum-length sequences that are homologous across the entire sequence alignment. We applied our algorithm to a dataset of 19 Enterobacteriaceae species and found that MHGs cover much greater proportions of genomes than markers and, relatedly, are less biased in terms of the functions of the genes they cover. We examined the correlation between each individual marker and its overlapping MHGs and found that few phylogenetic splits supported by the markers are supported by the MHGs, while many marker-supported splits are contradicted by the MHGs. A comparison of the species tree inferred from marker genes with the species tree inferred from MHGs suggests that the increased bias and lack of genome coverage of markers cause incorrect inferences about the overall relationships between bacterial taxa. Assuming genes to be the basic evolutionary unit has been commonplace in bacterial genomics. For example, when quantifying the extent of horizontal gene transfer it is common to infer gene trees and reconcile them against a species tree to account for recombination-based processes.
We have developed a new method that challenges this assumption by identifying contiguous regions of true homology without regard to gene boundaries, and we applied it to Enterobacteriaceae, a family of bacteria containing several important human pathogens. Our results show that genes are composed of distinct homologous regions with conflicting phylogenetic histories. We further demonstrate that failing to take account of this conflict, together with the functional biases we show exist among single-copy marker genes, significantly changes the consensus evolutionary tree of Enterobacteriaceae.
28
Hancock ZB, Lehmberg ES, Blackmon H. Phylogenetics in Space: How Continuous Spatial Structure Impacts Tree Inference. Mol Phylogenet Evol 2022; 173:107505. [PMID: 35577296 DOI: 10.1016/j.ympev.2022.107505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2021] [Revised: 04/08/2022] [Accepted: 05/06/2022] [Indexed: 11/26/2022]
Abstract
The tendency to discretize biology permeates taxonomy and systematics, leading to models that simplify the often continuous nature of populations. Even when the assumption of panmixia is relaxed, most models still assume some degree of discrete structure. The multispecies coalescent has emerged as a powerful model in phylogenetics, but its common implementation is entirely space-independent, a gap we call the "missing z-axis". In this article, we review the many lines of evidence for how continuous spatial structure can impact phylogenetic inference. We illustrate and expand on these by using complex continuous-space demographic models that include distinct modes of speciation. We find that the impact of spatial structure permeates all aspects of phylogenetic inference, including gene tree stoichiometry, topological and branch-length variance, network estimation, and species delimitation. We conclude by utilizing our results to suggest how researchers can identify spatial structure in phylogenetic datasets.
29
Doronina L, Hughes GM, Moreno-Santillan D, Lawless C, Lonergan T, Ryan L, Jebb D, Kirilenko BM, Korstian JM, Dávalos LM, Vernes SC, Myers EW, Teeling EC, Hiller M, Jermiin LS, Schmitz J, Springer MS, Ray DA. Contradictory Phylogenetic Signals in the Laurasiatheria Anomaly Zone. Genes (Basel) 2022; 13:766. [PMID: 35627151 PMCID: PMC9141728 DOI: 10.3390/genes13050766] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/29/2022] [Revised: 04/12/2022] [Accepted: 04/21/2022] [Indexed: 02/04/2023] [Open access]
Abstract
Relationships among laurasiatherian clades represent one of the most highly disputed topics in mammalian phylogeny. In this study, we attempt to disentangle laurasiatherian interordinal relationships using two independent genome-level approaches: (1) quantifying retrotransposon presence/absence patterns, and (2) comparisons of exon datasets at the levels of nucleotides and amino acids. The two approaches revealed contradictory phylogenetic signals, possibly due to a high level of ancestral incomplete lineage sorting. The positions of Eulipotyphla and Chiroptera as the first and second earliest divergences were consistent across the approaches. However, the phylogenetic relationships of Perissodactyla, Cetartiodactyla, and Ferae were contradictory. While retrotransposon insertion analyses suggest a clade of Cetartiodactyla and Ferae, the exon dataset favoured a clade of Cetartiodactyla and Perissodactyla. Future analyses of hitherto unsampled laurasiatherian lineages and synergistic analyses of retrotransposon insertions, exon and conserved intron/intergenic sequences might unravel the conflicting patterns of relationships in this major mammalian clade.
Affiliation(s)
- Liliya Doronina
- Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
- Graham M. Hughes
- School of Biology and Environmental Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
- Diana Moreno-Santillan
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Department of Integrative Biology, University of California, Berkeley, CA 92697, USA
- Colleen Lawless
- School of Biology and Environmental Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
- Tadhg Lonergan
- School of Biology and Environmental Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
- Louise Ryan
- School of Biology and Environmental Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
- David Jebb
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- Bogdan M. Kirilenko
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- Jennifer M. Korstian
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Liliana M. Dávalos
- Department of Ecology and Evolution and Consortium for Inter-Disciplinary Environmental Research, Stony Brook University, Stony Brook, NY 11794, USA
- Sonja C. Vernes
- School of Biology, The University of St Andrews, St Andrews KY16 9ST, UK
- Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, 6525 Nijmegen, The Netherlands
- Eugene W. Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Faculty of Computer Science, Technical University Dresden, 01307 Dresden, Germany
- The Okinawa Institute of Science and Technology, Okinawa 904-0495, Japan
- Emma C. Teeling
- School of Biology and Environmental Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
- Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- Lars S. Jermiin
- School of Biology and Environmental Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Earth Institute, University College Dublin, D04 V1W8 Dublin, Ireland
- Jürgen Schmitz
- Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
- Mark S. Springer
- Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, CA 92521, USA
- David A. Ray
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
30
Schull JK, Turakhia Y, Hemker JA, Dally WJ, Bejerano G. Champagne: Automated Whole-Genome Phylogenomic Character Matrix Method Using Large Genomic Indels for Homoplasy-Free Inference. Genome Biol Evol 2022; 14:evac013. [PMID: 35171243 PMCID: PMC8920512 DOI: 10.1093/gbe/evac013] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 01/10/2022] [Indexed: 11/14/2022] [Open access]
Abstract
We present Champagne, a whole-genome method for generating character matrices for phylogenomic analysis using large genomic indel events. By rigorously picking orthologous genes and locating large insertion and deletion events, Champagne delivers a character matrix that considerably reduces homoplasy compared with morphological and nucleotide-based matrices, on both established phylogenies and difficult-to-resolve nodes in the mammalian tree. Champagne provides ample evidence in the form of genomic structural variation to support incomplete lineage sorting and possible introgression in Paenungulata and human-chimp-gorilla which were previously inferred primarily through matrices composed of aligned single-nucleotide characters. Champagne also offers further evidence for Myomorpha as sister to Sciuridae and Hystricomorpha in the rodent tree. Champagne harbors distinct theoretical advantages as an automated method that produces nearly homoplasy-free character matrices on the whole-genome scale.
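Low homoplasy is what makes presence/absence characters such as large indels or retroelement insertions attractive. A simple, hedged way to screen such a binary matrix for conflicting characters (a textbook check, not Champagne's own pipeline) is the four-gamete compatibility test: two binary characters can both be mapped onto one tree without homoplasy only if the taxa do not display all four state combinations. The matrix and character names below are hypothetical.

```python
from itertools import combinations


def compatible(char_a: list[int], char_b: list[int]) -> bool:
    """Four-gamete test for two binary (0/1) characters scored on the same taxa."""
    observed_patterns = set(zip(char_a, char_b))
    return len(observed_patterns) < 4


def incompatible_pairs(matrix: dict[str, list[int]]) -> list[tuple[str, str]]:
    """Return every pair of characters that cannot share a homoplasy-free tree."""
    return [
        (a, b)
        for a, b in combinations(sorted(matrix), 2)
        if not compatible(matrix[a], matrix[b])
    ]


# Hypothetical presence (1) / absence (0) scores for five taxa per indel character.
indels = {
    "indel1": [1, 1, 0, 0, 0],
    "indel2": [1, 1, 1, 0, 0],
    "indel3": [0, 1, 1, 0, 0],
}
print(incompatible_pairs(indels))
```

Only `indel1` and `indel3` conflict here, because the taxa show all of 1/0, 1/1, 0/1 and 0/0. Such a conflict need not mean true homoplasy: as the abstract notes, incomplete lineage sorting or introgression can also produce incompatible insertion patterns.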
Affiliation(s)
- James K Schull
- Department of Computer Science, Stanford University, USA
- Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California San Diego, USA
- James A Hemker
- Department of Computer Science, Stanford University, USA
- William J Dally
- Department of Computer Science, Stanford University, USA
- NVIDIA, Santa Clara, California, USA
- Department of Electrical Engineering, Stanford University, USA
- Gill Bejerano
- Department of Computer Science, Stanford University, USA
- Department of Developmental Biology, Stanford University, USA
- Department of Biomedical Data Science, Stanford University, USA
- Department of Pediatrics, Stanford University, USA
31
Site Pattern Probabilities Under the Multispecies Coalescent and a Relaxed Molecular Clock: Theory and Applications. J Theor Biol 2022; 542:111078. [DOI: 10.1016/j.jtbi.2022.111078] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/04/2021] [Revised: 02/24/2022] [Accepted: 02/27/2022] [Indexed: 11/22/2022]
32
Matschiner M. Species Tree Inference with SNP Data. Methods Mol Biol 2022; 2512:23-44. [PMID: 35817997 DOI: 10.1007/978-1-0716-2429-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
While the inference of species trees from molecular sequences has become a common type of analysis in studies of species diversification, few programs so far allow for the use of single-nucleotide polymorphisms (SNPs) for the same purpose. In this book chapter, I discuss the use of the Bayesian program SNAPP, which infers the species tree by mathematically integrating over all possible genealogies at each SNP. In particular, I focus on a molecular clock model developed for SNAPP, allowing the inference of divergence times together with the species tree topology and the population size, directly from SNP datasets in variant call format. With the growing availability of SNP datasets for multiple closely related species, this approach is becoming increasingly relevant for the reconstruction of the temporal framework of recent species diversification.
Affiliation(s)
- Michael Matschiner
- Department of Palaeontology and Museum, University of Zurich, Zurich, Switzerland.
- Natural History Museum, University of Oslo, Oslo, Norway.
33
Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022; 2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]
Abstract
Phylogenomics is the inference of phylogenetic trees based on multiple marker genes sampled in the genomes of interest. An important challenge in phylogenomics is the potential incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms due to the prevalence of horizontal gene transfer. This protocol introduces the procedures for building a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. The protocol highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for the topological conflicts among individual gene trees. The pipeline described in this protocol is scalable to tens of thousands of genomes while retaining high accuracy. We discuss multiple software tools, libraries, and scripts that enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
Affiliation(s)
- Qiyun Zhu
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA.
- School of Life Sciences, Arizona State University, Tempe, AZ, USA.
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
34
How challenging RADseq data turned out to favor coalescent-based species tree inference. A case study in Aichryson (Crassulaceae). Mol Phylogenet Evol 2021; 167:107342. [PMID: 34785384 DOI: 10.1016/j.ympev.2021.107342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/03/2020] [Revised: 07/05/2021] [Accepted: 10/29/2021] [Indexed: 12/24/2022]
Abstract
Analysing multiple genomic regions while incorporating detection and quantification of discordance among regions has become standard for understanding phylogenetic relationships. In plants, which usually have comparatively large genomes, this is feasible by combining reduced-representation library (RRL) methods with high-throughput sequencing, which enables the cost-effective acquisition of genomic data for thousands of loci from hundreds of samples. One popular RRL method is RADseq. A major disadvantage of established RADseq approaches is their rather short fragment and sequencing range, which yields loci with little individual phylogenetic information. This issue hampers the application of coalescent-based species tree inference. The modified RADseq protocol presented here, which targets ca. 5,000 loci of 300–600 nt length sequenced with the latest short-read sequencing (SRS) technology, has the potential to overcome this drawback. To illustrate the advantages of this approach, we use the study group Aichryson Webb & Berthelot (Crassulaceae), a plant genus that diversified on the Canary Islands. The data analysis approach used here aims at careful quality control of the long-locus dataset. It involves an informed selection of thresholds for accurate clustering; a thorough exploration of locus properties, such as locus length, coverage and variability, to identify potentially biased data; and a comparative phylogenetic inference of filtered datasets, accompanied by an evaluation of the resulting bootstrap (BS) support and gene and site concordance factor values, to improve the overall resolution of the resulting phylogenetic trees. The final dataset contains variable loci with an average length of 373 nt and facilitates species tree estimation using a coalescent-based summary approach. Additional improvements brought by the approach are critically discussed.
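To make the site concordance factor mentioned above more concrete, the sketch below computes a deliberately simplified version for a single four-taxon alignment: among columns that split the taxa 2+2 into two nucleotide states, it reports the percentage supporting the split AB|CD. This is an assumption-laden illustration, not the authors' workflow; IQ-TREE's sCF, as far as we know, additionally samples many quartets around each branch, which is not done here, and the sequences are made up.

```python
import math


def site_concordance_four_taxa(
    seqs: dict[str, str],
    split: tuple[tuple[str, str], tuple[str, str]] = (("A", "B"), ("C", "D")),
) -> float:
    """Percentage of 2+2 informative sites supporting split (a, b) | (c, d).

    Simplified sCF for one four-taxon alignment; gapped or ambiguous columns
    and columns without exactly two nucleotide states are skipped.
    """
    (a, b), (c, d) = split
    counts = [0, 0, 0]  # support for ab|cd, ac|bd, ad|bc
    for sa, sb, sc, sd in zip(seqs[a], seqs[b], seqs[c], seqs[d]):
        column = {sa, sb, sc, sd}
        if len(column) != 2 or not column <= set("ACGT"):
            continue
        if sa == sb and sc == sd:
            counts[0] += 1
        elif sa == sc and sb == sd:
            counts[1] += 1
        elif sa == sd and sb == sc:
            counts[2] += 1
        # 3+1 columns (one taxon differs) match none of the branches above.
    informative = sum(counts)
    if informative == 0:
        return math.nan
    return 100.0 * counts[0] / informative


# Made-up alignment; taxa are named A to D to match the default split.
alignment = {
    "A": "ACGTACT",
    "B": "ACGTAAG",
    "C": "GTGTCAT",
    "D": "GCGACAG",
}
print(round(site_concordance_four_taxa(alignment), 1))
```

In this toy alignment two informative columns support AB|CD and one supports AC|BD, so the simplified sCF is about 66.7%.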
35
Simmons MP, Springer MS, Gatesy J. Gene-tree misrooting drives conflicts in phylogenomic coalescent analyses of palaeognath birds. Mol Phylogenet Evol 2021; 167:107344. [PMID: 34748873 DOI: 10.1016/j.ympev.2021.107344] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/13/2021] [Revised: 10/08/2021] [Accepted: 11/02/2021] [Indexed: 10/19/2022]
Abstract
Phylogenomic analyses of ancient rapid radiations can produce conflicting results that are driven by differential sampling of taxa and characters as well as the limitations of alternative analytical methods. We re-examine basal relationships of palaeognath birds (ratites and tinamous) using recently published datasets of nucleotide characters from 20,850 loci as well as 4301 retroelement insertions. The original studies attributed conflicting resolutions of rheas in their inferred coalescent and concatenation trees to concatenation failing in the anomaly zone. By contrast, we find that the coalescent-based resolution of rheas is premised upon extensive gene-tree estimation errors. Furthermore, retroelement insertions contain much more conflict than originally reported and multiple insertion loci support the basal position of rheas found in concatenation trees, while none were reported in the original publication. We demonstrate how even remarkable congruence in phylogenomic studies may be driven by long-branch misplacement of a divergent outgroup, highly incongruent gene trees, differential taxon sampling that can result in gene-tree misrooting errors that bias species-tree inference, and gross homology errors. What was previously interpreted as broad, robustly supported corroboration for a single resolution in coalescent analyses may instead indicate a common bias that taints phylogenomic results across multiple genome-scale datasets. The updated retroelement dataset now supports a species tree with branch lengths that suggest an ancient anomaly zone, and both concatenation and coalescent analyses of the huge nucleotide datasets fail to yield coherent, reliable results in this challenging phylogenetic context.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
- John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
36
Chen L, Jin WT, Liu XQ, Wang XQ. New insights into the phylogeny and evolution of Podocarpaceae inferred from transcriptomic data. Mol Phylogenet Evol 2021; 166:107341. [PMID: 34740782 DOI: 10.1016/j.ympev.2021.107341] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/25/2021] [Revised: 10/28/2021] [Accepted: 10/29/2021] [Indexed: 12/14/2022]
Abstract
Phylogenies of an increasing number of taxa have been resolved with the development of phylogenomics. However, the intergeneric relationships of Podocarpaceae, the second-largest family of conifers, comprising 19 genera and approximately 187 species mainly distributed in the Southern Hemisphere, have not been well disentangled in previous studies, even when genome-scale data sets were used. Here we used 993 nuclear orthologous groups (OGs) and 54 chloroplast OGs (genes), which were generated from 47 transcriptomes of Podocarpaceae and its sister group Araucariaceae, to reconstruct the phylogeny of Podocarpaceae. Our study completely resolved the intergeneric relationships of Podocarpaceae represented by all extant genera and revealed that topological conflicts among phylogenetic trees could be attributed to synonymous substitutions. Moreover, we found that two morphological traits, fleshy seed cones and flattened leaves, might be important for Podocarpaceae to adapt to angiosperm-dominated forests and thus could have promoted its species diversification. In addition, our results indicate that Podocarpaceae originated in Gondwana in the Late Triassic and that both vicariance and dispersal have contributed to its current biogeographic patterns. Our study provides the first robust transcriptome-based phylogeny of Podocarpaceae, an evolutionary framework important for future studies of this family.
Affiliation(s)
- Luo Chen
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Wei-Tao Jin
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Xin-Quan Liu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiao-Quan Wang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing 100049, China.
37
Molloy EK, Gatesy J, Springer MS. Theoretical and practical considerations when using retroelement insertions to estimate species trees in the anomaly zone. Syst Biol 2021; 71:721-740. [PMID: 34677617 DOI: 10.1093/sysbio/syab086] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/02/2020] [Accepted: 10/11/2021] [Indexed: 11/13/2022] [Open access]
Abstract
A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony typically failed to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e., local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise.
This led us to derive the maximum likelihood branch-length estimate for the case in which RIs, rather than binary gene trees, are given as input; this corrected formula produced accurate estimates of branch lengths in our simulation study, provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100 000 parsimony-informative RIs. We found that, when given just 1 000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e., clades separated by branches > 0.3 CUs) with high support and identified rapid radiations (i.e., shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios.
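For orientation only, here is the standard multispecies-coalescent relation that gene-tree-based estimates of internal branch length build on; it is not the RI-specific corrected formula derived in this paper, which is not reproduced here. For a four-taxon internal branch of length t coalescent units, a gene tree matches the species-tree resolution with probability 1 - (2/3)e^(-t), so an observed matching fraction p gives t = -ln(1.5(1 - p)). The function name and counts are hypothetical.

```python
import math


def internal_branch_length_cu(n_match: int, n_alt1: int, n_alt2: int) -> float:
    """Estimate a four-taxon internal branch length in coalescent units.

    n_match counts gene trees (or quartets) matching the species-tree resolution;
    n_alt1 and n_alt2 count the two alternative resolutions. Under the multispecies
    coalescent, P(match) = 1 - (2/3) * exp(-t), so t = -ln(1.5 * (1 - p)).
    """
    n = n_match + n_alt1 + n_alt2
    if n == 0:
        raise ValueError("no quartets observed")
    p = n_match / n
    if p <= 1 / 3:
        return 0.0  # no excess of the matching resolution: clamp to zero length
    if n_match == n:
        return math.inf  # no discordance observed: the estimate is unbounded
    return -math.log(1.5 * (1 - p))


# Hypothetical counts: 80 matching quartets and 10 of each alternative.
print(round(internal_branch_length_cu(80, 10, 10), 2))
```

With 80 of 100 quartets matching, the estimate is about 1.20 coalescent units; a matching fraction at or below 1/3 is clamped to a zero-length branch, the regime of very short internodes that the anomaly-zone studies in this list are concerned with.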
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, College Park, 20742, USA
- John Gatesy
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, 10024, USA
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, Riverside, 92521, USA
38
Nesi N, Tsagkogeorga G, Tsang SM, Nicolas V, Lalis A, Scanlon AT, Riesle-Sbarbaro SA, Wiantoro S, Hitch AT, Juste J, Pinzari CA, Bonaccorso FJ, Todd CM, Lim BK, Simmons NB, McGowen MR, Rossiter SJ. Interrogating Phylogenetic Discordance Resolves Deep Splits in the Rapid Radiation of Old World Fruit Bats (Chiroptera: Pteropodidae). Syst Biol 2021; 70:1077-1089. [PMID: 33693838 PMCID: PMC8513763 DOI: 10.1093/sysbio/syab013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2019] [Revised: 04/27/2021] [Accepted: 03/03/2021] [Indexed: 11/14/2022] [Open access]
Abstract
The family Pteropodidae (Old World fruit bats) comprises >200 species distributed across the Old World tropics and subtropics. Most pteropodids feed on fruit, suggesting an early origin of frugivory, although several lineages have shifted to nectar-based diets. Pteropodids are of exceptional conservation concern, with >50% of species considered threatened, yet the systematics of this group has long been debated, with uncertainty surrounding early splits attributed to an ancient rapid diversification. Resolving the relationships among the main pteropodid lineages is essential if we are to fully understand their evolutionary distinctiveness and the extent to which these bats have transitioned to nectar-feeding. Here we generated orthologous sequences for >1400 nuclear protein-coding genes (2.8 million base pairs) across 114 species from 43 genera of Old World fruit bats (57% and 96% of extant species- and genus-level diversity, respectively), and combined phylogenomic inference with filtering by information content to resolve systematic relationships among the major lineages. Concatenation and coalescent-based methods recovered three distinct backbone topologies that could not be reconciled by filtering on phylogenetic information content. Concordance analysis and gene genealogy interrogation show that one topology is consistently the best supported, and that observed phylogenetic conflicts arise from both gene tree error and deep incomplete lineage sorting. In addition to resolving long-standing inconsistencies in the reported relationships among major lineages, we show that Old World fruit bats have likely undergone at least seven independent dietary transitions from frugivory to nectarivory. Finally, we use this phylogeny to identify and describe one new genus. [Chiroptera; coalescence; concordance; incomplete lineage sorting; nectar feeder; species tree; target enrichment.].
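Concordance analysis of the kind mentioned above is often summarized per branch with a gene concordance factor, the share of gene trees that contain a given species-tree bipartition. The sketch below is a simplified illustration, assuming every gene tree samples all taxa and is supplied as a set of bipartitions; published gCF implementations also handle missing taxa and count only decisive gene trees, which this does not. All function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set

Split = FrozenSet[str]


def normalize_split(side: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Encode an unrooted bipartition by the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    reference = min(taxa)  # any fixed taxon works; use the alphabetically first
    return taxa - side if reference in side else side


def gene_tree_splits(sides: Iterable[Iterable[str]], taxa: FrozenSet[str]) -> Set[Split]:
    """Build a gene tree's split set from one side of each internal branch."""
    return {normalize_split(s, taxa) for s in sides}


def gene_concordance_factor(branch: Iterable[str], gene_trees: List[Set[Split]],
                            taxa: FrozenSet[str]) -> float:
    """Fraction of gene trees that contain the bipartition defined by `branch`."""
    if not gene_trees:
        raise ValueError("no gene trees supplied")
    target = normalize_split(branch, taxa)
    return sum(target in splits for splits in gene_trees) / len(gene_trees)


taxa = frozenset("ABCDE")
genes = [
    gene_tree_splits([{"A", "B"}, {"D", "E"}], taxa),       # ((A,B),C,(D,E))
    gene_tree_splits([{"C", "D", "E"}, {"D", "E"}], taxa),  # same AB|CDE split, other side given
    gene_tree_splits([{"A", "C"}, {"D", "E"}], taxa),       # conflicts with AB|CDE
]
print(gene_concordance_factor({"A", "B"}, genes, taxa))  # 2 of 3 gene trees
```

Normalizing each split to the side without a reference taxon lets either side of a branch be given as input while still matching in set lookups.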
Affiliation(s)
- Nicolas Nesi
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK
- Georgia Tsagkogeorga
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK
- Susan M Tsang
- Department of Mammalogy, Division of Vertebrate Zoology, American Museum of Natural History, New York, USA
- Zoology Section, National Museum of Natural History, Manila, Philippines
- Violaine Nicolas
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Aude Lalis
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Annette T Scanlon
- School of Natural and Built Environments, University of South Australia, Mawson Lakes, SA, Australia
- Silke A Riesle-Sbarbaro
- Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Institute of Zoology, Zoological Society of London, London, UK
- Centre for Biological Threats and Special Pathogens, Robert Koch Institute, Berlin, Germany
- Sigit Wiantoro
- Museum Zoologicum Bogoriense, Research Center for Biology, Indonesian Institute of Sciences, Cibinong, Indonesia
- Alan T Hitch
- Department of Wildlife, Fish, and Conservation Biology, University of California, Davis, CA, USA
- Javier Juste
- Estación Biológica de Doñana (CSIC), Avda. Américo Vespucio, Sevilla, Spain
- Christopher M Todd
- The Hawkesbury Institute for the Environment, Western Sydney University, Australia
- Burton K Lim
- Royal Ontario Museum, Toronto, ON M5S 2C6, Canada
- Nancy B Simmons
- Department of Mammalogy, Division of Vertebrate Zoology, American Museum of Natural History, New York, USA
- Michael R McGowen
- Department of Vertebrate Zoology, Smithsonian National Museum of Natural History, Washington, DC, USA
- Stephen J Rossiter
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK
39
Forthman M, Braun EL, Kimball RT. Gene tree quality affects empirical coalescent branch length estimation. Zool Scr 2021. [DOI: 10.1111/zsc.12512] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
Affiliation(s)
- Michael Forthman
- Department of Entomology & Nematology, University of Florida, Gainesville, FL, USA
- California State Collection of Arthropods, Plant Pest Diagnostics Branch, California Department of Food & Agriculture, Sacramento, CA, USA
- Edward L. Braun
- Department of Biology, University of Florida, Gainesville, FL, USA
40
Chafin TK, Douglas MR, Bangs MR, Martin BT, Mussmann SM, Douglas ME. Taxonomic Uncertainty and the Anomaly Zone: Phylogenomics Disentangle a Rapid Radiation to Resolve Contentious Species (Gila robusta Complex) in the Colorado River. Genome Biol Evol 2021; 13:evab200. [PMID: 34432005 PMCID: PMC8449829 DOI: 10.1093/gbe/evab200] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 08/19/2021] [Indexed: 12/18/2022]
Abstract
Species are indisputable units for biodiversity conservation, yet their delimitation is fraught with both conceptual and methodological difficulties. A classic example is the taxonomic controversy surrounding the Gila robusta complex in the lower Colorado River of southwestern North America. Nominal species designations were originally defined according to weakly diagnostic morphological differences, but these conflicted with subsequent genetic analyses. Given this ambiguity, the complex was redefined as a single polytypic unit, and the proposed "threatened" status of two of its elements under the U.S. Endangered Species Act was withdrawn. Here we re-evaluated the status of the complex using dense spatial and genomic sampling (n = 387 and >22k loci), coupled with SNP-based coalescent and polymorphism-aware phylogenetic models. In doing so, we found that all three species were indeed supported as evolutionarily independent lineages, despite widespread phylogenetic discordance. To juxtapose this discrepancy with previous studies, we first categorized the evolutionary mechanisms driving discordance, then tested (and subsequently rejected) prior hypotheses that attributed phylogenetic discord in the complex to the hybrid origin of Gila nigra. The inconsistent patterns of diversity we found within G. robusta were instead associated with rapid Plio-Pleistocene drainage evolution, with subsequent divergence within the "anomaly zone" of tree space producing ambiguities that served to confound prior studies. Our results not only support the resurrection of the three species as distinct entities but also offer an empirical example of how phylogenetic discordance can be categorized within other recalcitrant taxa, particularly when variation is primarily partitioned at the species level.
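The "anomaly zone" invoked above has a closed-form boundary in its simplest case. The sketch below assumes a four-taxon asymmetric species tree and uses the boundary function a(x) as it is commonly quoted from Degnan and Rosenberg (2006): with x the deeper and y the shallower of two consecutive internal branches (both in coalescent units), the tree is in the anomaly zone when y < a(x). Chafin et al. may have applied a different or more general test; the function names are hypothetical.

```python
import math


def anomaly_boundary(x: float) -> float:
    """Boundary a(x) for a four-taxon asymmetric species tree.

    x is the length, in coalescent units, of the deeper of the two internal
    branches. The species tree lies in the anomaly zone when the shallower,
    descendant internal branch y satisfies y < a(x).
    """
    if x <= 0:
        raise ValueError("x must be positive")
    e2x, e3x = math.exp(2 * x), math.exp(3 * x)
    return math.log(2 / 3 + (3 * e2x - 2) / (18 * (e3x - e2x)))


def in_anomaly_zone(x: float, y: float) -> bool:
    """True if consecutive internal branches (x deeper, y shallower) fall in the anomaly zone."""
    return y < anomaly_boundary(x)


# Two very short consecutive internodes fall inside the zone; a long deeper one does not.
print(in_anomaly_zone(0.05, 0.05), in_anomaly_zone(1.0, 0.1))  # True False
```

Because a(x) grows without bound as x approaches zero, very short deeper internodes can place a tree in the anomaly zone even when the shallower internode is moderately long, which is why rapid radiations are prone to it.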
Affiliation(s)
- Tyler K Chafin
- Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, USA
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA
- Marlis R Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, USA
- Max R Bangs
- Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, USA
- Department of Biological Science, Florida State University, Tallahassee, Florida, USA
- Bradley T Martin
- Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, USA
- Global Campus, University of Arkansas, Fayetteville, Arkansas, USA
- Steven M Mussmann
- Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, USA
- Southwestern Native Aquatic Resources and Recovery Center, U.S. Fish & Wildlife Service, Dexter, New Mexico, USA
- Michael E Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, USA
41
Liu X, Ogilvie HA, Nakhleh L. Variational inference using approximate likelihood under the coalescent with recombination. Genome Res 2021; 31:2107-2119. [PMID: 34426513 PMCID: PMC8559707 DOI: 10.1101/gr.273631.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/30/2020] [Accepted: 08/17/2021] [Indexed: 11/30/2022]
Abstract
Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, is coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa and, through a divide-and-conquer approach, with more taxa. Using a simulated data set resembling a human–chimp–gorilla scenario, we show that our method has accuracy comparable to or better than that of previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and because it derives transition rates by simulation, it is flexible enough to enable future implementations of various population models.
Affiliation(s)
- Xinhao Liu
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Huw A Ogilvie
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
42
Zhang C, Zhao Y, Braun EL, Mirarab S. TAPER: Pinpointing errors in multiple sequence alignments despite varying rates of evolution. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13696] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology Program, University of California San Diego, CA, USA
- Yiming Zhao
- Electrical and Computer Engineering Department, University of California San Diego, CA, USA
- Edward L. Braun
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL, USA
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California San Diego, CA, USA
43
44
Ferrer Obiol J, James HF, Chesser RT, Bretagnolle V, González-Solís J, Rozas J, Riutort M, Welch AJ. Integrating Sequence Capture and Restriction Site-Associated DNA Sequencing to Resolve Recent Radiations of Pelagic Seabirds. Syst Biol 2021; 70:976-996. [PMID: 33512506 PMCID: PMC8357341 DOI: 10.1093/sysbio/syaa101] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/12/2020] [Revised: 11/13/2020] [Accepted: 12/15/2020] [Indexed: 01/01/2023]
Abstract
The diversification of modern birds has been shaped by a number of radiations. Rapid diversification events make reconstructing the evolutionary relationships among taxa challenging due to the convoluted effects of incomplete lineage sorting (ILS) and introgression. Phylogenomic data sets have the potential to detect patterns of phylogenetic incongruence, and to address their causes. However, the footprints of ILS and introgression on sequence data can vary between different phylogenomic markers at different phylogenetic scales depending on factors such as their evolutionary rates or their selection pressures. We show that combining phylogenomic markers that evolve at different rates, such as paired-end double-digest restriction site-associated DNA (PE-ddRAD) and ultraconserved elements (UCEs), allows a comprehensive exploration of the causes of phylogenetic discordance associated with short internodes at different timescales. We used thousands of UCE and PE-ddRAD markers to produce the first well-resolved phylogeny of shearwaters, a group of medium-sized pelagic seabirds that are among the most phylogenetically controversial and endangered bird groups. We found that phylogenomic conflict was mainly derived from high levels of ILS due to rapid speciation events. We also documented a case of introgression, despite the high philopatry of shearwaters to their breeding sites, which typically limits gene flow. We integrated state-of-the-art concatenated and coalescent-based approaches to expand on previous comparisons of UCE and RAD-Seq data sets for phylogenetics, divergence time estimation, and inference of introgression, and we propose a strategy to optimize RAD-Seq data for phylogenetic analyses. Our results highlight the usefulness of combining phylogenomic markers evolving at different rates to understand the causes of phylogenetic discordance at different timescales. 
[Aves; incomplete lineage sorting; introgression; PE-ddRAD-Seq; phylogenomics; radiations; shearwaters; UCEs.].
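Introgression inference of the kind mentioned above often starts from site-pattern counts. As one illustration only (the abstract does not name the test the authors used), the sketch below computes Patterson's D, the ABBA-BABA statistic, from four aligned sequences ordered as (P1, P2, P3, outgroup). It assumes the outgroup base is ancestral, skips any site with gaps or ambiguity codes, and omits significance testing such as a block jackknife; the function names are hypothetical.

```python
from typing import Tuple

VALID = set("ACGT")


def abba_baba_counts(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int]:
    """Count ABBA and BABA site patterns, using the outgroup base as the ancestral state."""
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, anc in zip(p1, p2, p3, outgroup):
        if not {a, b, c, anc} <= VALID:
            continue  # skip gaps, Ns and ambiguity codes
        if a == anc and b == c != anc:
            abba += 1  # P2 and P3 share the derived allele
        elif b == anc and a == c != anc:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def patterson_d(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); positive values indicate excess P2-P3 allele sharing."""
    abba, baba = abba_baba_counts(p1, p2, p3, outgroup)
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


print(patterson_d("AAGA", "GGAA", "GGGA", "AAAA"))  # two ABBA sites, one BABA site
```

Under ILS alone, ABBA and BABA patterns are expected in equal numbers, so a D value far from zero is the signal of interest; the toy alignment here is far too short to be meaningful.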
Affiliation(s)
- Joan Ferrer Obiol
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Barcelona, Catalonia, Spain
- Helen F James
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- R Terry Chesser
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- U.S. Geological Survey, Patuxent Wildlife Research Center, Laurel, MD, USA
- Vincent Bretagnolle
- Centre d’Études Biologiques de Chizé, CNRS & La Rochelle Université, 79360, Villiers en Bois, France
- Jacob González-Solís
- Institut de Recerca de la Biodiversitat (IRBio), Barcelona, Catalonia, Spain
- Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
- Julio Rozas
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Barcelona, Catalonia, Spain
- Marta Riutort
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Barcelona, Catalonia, Spain
45
Adams RH, Castoe TA, DeGiorgio M. PhyloWGA: chromosome-aware phylogenetic interrogation of whole genome alignments. Bioinformatics 2021; 37:1923-1925. [PMID: 33051672 DOI: 10.1093/bioinformatics/btaa884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2019] [Revised: 09/16/2020] [Accepted: 09/29/2020] [Indexed: 11/13/2022]
Abstract
Summary: Here, we present PhyloWGA, an open-source R package for conducting phylogenetic analysis and investigation of whole-genome data. Availability and implementation: Available on GitHub (https://github.com/radamsRHA/PhyloWGA). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Richard H Adams
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
46
Richards A, Kubatko L. Bayesian-Weighted Triplet and Quartet Methods for Species Tree Inference. Bull Math Biol 2021; 83:93. [PMID: 34297209 DOI: 10.1007/s11538-021-00918-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 06/03/2021] [Indexed: 11/26/2022]
Abstract
Inference of the evolutionary histories of species, commonly represented by a species tree, is complicated by the divergent evolutionary history of different parts of the genome. Different loci on the genome can have different histories from the underlying species tree (and from each other) due to processes such as incomplete lineage sorting (ILS), gene duplication and loss, and horizontal gene transfer. The multispecies coalescent is a commonly used model for performing inference on species and gene trees in the presence of ILS. This paper introduces Lily-T and Lily-Q, two new methods for species tree inference under the multispecies coalescent. We then compare them to two frequently used methods, SVDQuartets and ASTRAL, using simulated and empirical data. Both methods generally showed improvement over SVDQuartets, and Lily-Q was superior to Lily-T for most simulation settings. The comparison to ASTRAL was more mixed: Lily-Q tended to be better than ASTRAL when the length of recombination-free loci was short, when the coalescent population parameter [Formula: see text] was small, or when the internal branch lengths were longer.
Affiliation(s)
- Andrew Richards
- Department of Statistics, The Ohio State University, Columbus, USA
- Laura Kubatko
- Department of Statistics, The Ohio State University, Columbus, USA
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, USA
47
When good mitochondria go bad: Cyto-nuclear discordance in landfowl (Aves: Galliformes). Gene 2021; 801:145841. [PMID: 34274481 DOI: 10.1016/j.gene.2021.145841] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/05/2021] [Revised: 06/10/2021] [Accepted: 07/13/2021] [Indexed: 11/22/2022]
Abstract
Mitochondrial sequences were among the first molecular data collected for phylogenetic studies, and they are plentiful in DNA sequence archives. However, the future value of mitogenomic data in phylogenetics is uncertain, because its phylogenetic signal sometimes conflicts with that of the nuclear genome. A thorough understanding of the causes and prevalence of cyto-nuclear discordance would aid in reconciling results that differ owing to sequence data type, and would provide a framework for interpreting megaphylogenies when taxa which lack substantial nuclear data are placed using mitochondrial data. Here, we examine the prevalence and possible causes of cyto-nuclear discordance in the landfowl (Aves: Galliformes), leveraging 47 new mitogenomes assembled from off-target reads recovered as part of a target-capture study. We evaluated two hypotheses: that cyto-nuclear discordance is "genuine" and results from biological processes such as incomplete lineage sorting or introgression, and that cyto-nuclear discordance is an artifact of inaccurate mitochondrial tree estimation (the "inaccurate estimation" hypothesis). We identified seven well-supported topological differences between the mitogenomic tree and trees based on nuclear data. These well-supported topological differences were robust to model selection. An examination of sites suggests these differences were driven by a small number of sites, particularly from third-codon positions, suggesting that they were not confounded by convergent directional selection. Hence, the hypothesis of genuine discordance was supported.
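The site-level examination described above can be approached by comparing per-site log-likelihoods of one alignment under two competing topologies. The sketch below is a generic illustration, not the authors' procedure: it assumes per-site log-likelihoods have already been exported from an ML program, that the alignment is in frame and starts at a first codon position, and that positive differences favour the first (here, mitogenomic) topology. Function names and the toy numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def site_deltas(lnl_a: Sequence[float], lnl_b: Sequence[float]) -> List[float]:
    """Per-site support for topology A over topology B (positive favours A)."""
    if len(lnl_a) != len(lnl_b):
        raise ValueError("site log-likelihood vectors differ in length")
    return [a - b for a, b in zip(lnl_a, lnl_b)]


def support_by_codon_position(deltas: Sequence[float]) -> Dict[int, float]:
    """Sum the support contributed by first, second and third codon positions."""
    totals = {1: 0.0, 2: 0.0, 3: 0.0}
    for i, d in enumerate(deltas):
        totals[i % 3 + 1] += d  # assumes the alignment starts at codon position 1
    return totals


def sites_needed(deltas: Sequence[float], fraction: float = 0.5) -> int:
    """Smallest number of top-ranked sites whose summed support reaches `fraction` of the net total."""
    total = sum(deltas)
    if total <= 0:
        raise ValueError("topology A is not favoured overall")
    running = 0.0
    for k, d in enumerate(sorted(deltas, reverse=True), start=1):
        running += d
        if running >= fraction * total:
            return k
    return len(deltas)


mito = [-1.0, -2.0, -0.5, -1.0, -1.0, -0.2]
nuclear = [-1.1, -2.0, -3.0, -1.0, -1.0, -0.4]
d = site_deltas(mito, nuclear)
print(support_by_codon_position(d), sites_needed(d))
```

In this toy input one third-position site carries most of the net support, which is the kind of pattern the abstract describes; real analyses would also check whether such sites are saturated or compositionally biased.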
48
Esquerré D, Keogh JS, Demangel D, Morando M, Avila LJ, Sites JW, Ferri-Yáñez F, Leaché AD. Rapid radiation and rampant reticulation: Phylogenomics of South American Liolaemus lizards. Syst Biol 2021; 71:286-300. [PMID: 34259868 DOI: 10.1093/sysbio/syab058] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/23/2019] [Revised: 06/25/2021] [Accepted: 06/30/2021] [Indexed: 01/09/2023]
Abstract
Understanding the factors that cause heterogeneity among gene trees can increase the accuracy of species trees. Discordant signals across the genome are commonly produced by incomplete lineage sorting (ILS) and introgression, which in turn can result in reticulate evolution. Species tree inference using the multispecies coalescent is designed to deal with ILS and is robust to low levels of introgression, but extensive introgression violates the fundamental assumption that relationships are strictly bifurcating. In this study, we explore the phylogenomics of the iconic Liolaemus subgenus of South American lizards, a group of over 100 species mostly distributed in and around the Andes mountains. Using mitochondrial DNA (mtDNA) and genome-wide restriction-site associated DNA sequencing (RADseq; nDNA hereafter), we inferred a time-calibrated mtDNA gene tree, nDNA species trees, and phylogenetic networks. We found high levels of discordance between mtDNA and nDNA, which we attribute in part to extensive ILS resulting from rapid diversification. These data also reveal extensive and deep introgression, which, combined with rapid diversification, explains the high level of phylogenetic discordance. We discuss these findings in the context of Andean orogeny and glacial cycles that fragmented, expanded, and contracted species distributions. Finally, we use the new phylogeny to resolve long-standing taxonomic issues in one of the most studied lizard groups in the New World.
Affiliation(s)
- Damien Esquerré
- Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
- J Scott Keogh
- Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Mariana Morando
- Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC-CONICET), Puerto Madryn, Chubut, Argentina
- Luciano J Avila
- Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC-CONICET), Puerto Madryn, Chubut, Argentina
- Jack W Sites
- Department of Biology and M.L. Bean Life Science Museum, Brigham Young University, Provo, Utah, USA
- Francisco Ferri-Yáñez
- Departamento de Biogeografía y Cambio Global, Museo Nacional de Ciencias Naturales, CSIC & Laboratorio Internacional en Cambio Global CSIC-PUC (LINCGlobal), Calle José Gutiérrez Abascal, 2, 28006, Madrid, Spain
- Adam D Leaché
- Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, Washington, USA
49
Walker JF, Smith SA, Hodel RGJ, Moyroud E. Concordance-based approaches for the inference of relationships and molecular rates with phylogenomic datasets. Syst Biol 2021; 71:943-958. [PMID: 34240209 DOI: 10.1093/sysbio/syab052] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/05/2020] [Revised: 06/23/2021] [Accepted: 07/01/2021] [Indexed: 11/12/2022]
Abstract
Gene tree conflict is common, and finding methods to analyze and alleviate the negative effects that conflict has on species tree analysis is a crucial part of phylogenomics. This study aims to expand the discussion of inferring species trees and molecular branch lengths when conflict is present. Conflict is typically examined in two ways: inferring its prevalence, and inferring the influence of the individual genes (how strongly one gene supports any given topology compared to an alternative topology). Here, we examine a procedure for incorporating both conflict and the influence of genes in order to infer evolutionary relationships. All supported relationships in the gene trees are analyzed, and the likelihoods of the genes constrained to each relationship are summed to provide a likelihood for that relationship. The consensus tree is then assembled by ranking relationships by their summed likelihoods and accepting each relationship unless it conflicts with a relationship that has a higher likelihood score. If the most likely relationships cannot all be combined into a single bifurcating tree, multiple trees are produced and a consensus tree with a polytomy is created. This procedure allows more influential genes to have greater influence on an inferred relationship, does not assume conflict has arisen from any one source, and does not force the dataset to produce a single bifurcating tree. Using this approach on three empirical datasets, we examine and discuss the relationship between influence and prevalence of gene tree conflict. We find that in one of the datasets, assembling a bifurcating consensus tree solely composed of the most likely relationships is impossible. To account for conflict in molecular rate analysis, we also introduce a concordance-based approach to the summary and estimation of branch lengths suitable for downstream comparative analyses.
We demonstrate through simulation that even under high levels of stochastic conflict, the mean and median of the concordant rates recapitulate the true molecular rate better than a supermatrix approach does. Using a large phylogenomic dataset, we examine rate heterogeneity across concordant genes with a focus on the branch subtending crown angiosperms. Notably, we find highly variable rates of evolution along the branch subtending crown angiosperms. The approaches outlined here have several limitations, but they also represent alternative methods for harnessing the complexity of phylogenomic datasets and enriching our inferences of both species' relationships and evolutionary processes.
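The greedy consensus assembly described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: relationships are treated as rooted clades (sets of taxa), each candidate clade comes with per-gene log-likelihoods from the constrained analyses, and "summed" is taken to mean summing those log-likelihoods. It accepts clades from best to worst score and rejects any that conflict with an already accepted clade; the multiple-tree and polytomy handling the abstract mentions is not reproduced. Names and numbers are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Clade = FrozenSet[str]


def compatible(a: Clade, b: Clade) -> bool:
    """Two rooted clades can coexist in one tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def greedy_consensus(gene_lnl: Dict[Clade, Sequence[float]]) -> Tuple[List[Clade], List[Clade]]:
    """Accept clades in order of summed log-likelihood, skipping any that conflict."""
    scores = {clade: sum(values) for clade, values in gene_lnl.items()}
    accepted: List[Clade] = []
    rejected: List[Clade] = []
    for clade in sorted(scores, key=scores.get, reverse=True):
        if all(compatible(clade, kept) for kept in accepted):
            accepted.append(clade)
        else:
            rejected.append(clade)
    return accepted, rejected


candidates = {
    frozenset("AB"): [-50.0, -50.0],
    frozenset("BC"): [-52.0, -53.0],
    frozenset("CD"): [-55.0, -55.0],
    frozenset("ABC"): [-60.0, -60.0],
}
kept, dropped = greedy_consensus(candidates)
print(sorted("".join(sorted(c)) for c in kept))  # ['AB', 'CD']
```

Here {B,C} is rejected because it overlaps the higher-scoring {A,B} without being nested in it, and {A,B,C} is rejected because it conflicts with the already accepted {C,D}.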
Affiliation(s)
- Joseph F Walker
- The Sainsbury Laboratory, University of Cambridge, 47 Bateman Street, Cambridge CB2 1LR, UK
- Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, 60607, USA
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Richard G J Hodel
- Department of Botany, National Museum of Natural History, MRC 166, Smithsonian Institution, Washington, DC, 20013-7012, USA
- Edwige Moyroud
- The Sainsbury Laboratory, University of Cambridge, 47 Bateman Street, Cambridge CB2 1LR, UK
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
50
Doyle JJ. Defining coalescent genes: Theory meets practice in organelle phylogenomics. Syst Biol 2021; 71:476-489. [PMID: 34191012 DOI: 10.1093/sysbio/syab053] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 03/01/2021] [Revised: 06/24/2021] [Accepted: 06/28/2021] [Indexed: 11/13/2022]
Abstract
The species tree paradigm that dominates current molecular systematic practice infers species trees from collections of sequences under assumptions of the multispecies coalescent (MSC), i.e., that there is free recombination between the sequences and no (or very low) recombination within them. These coalescent genes (c-genes) are thus defined in a historical rather than molecular sense, and can in theory be as large as an entire genome or as small as a single nucleotide. A debate about how to define c-genes centers on the contention that nuclear gene sequences used in many coalescent analyses undergo too much recombination, such that their introns comprise multiple c-genes, violating a key assumption of the MSC. Recently, a similar argument has been made for the genes of plastid (e.g., chloroplast) and mitochondrial genomes, which for the last 30 or more years have been considered to represent a single c-gene for the purposes of phylogeny reconstruction because they are non-recombining in a historical sense. Consequently, it has been suggested that these genomes should be analyzed using coalescent methods that treat their genes (over 70 protein-coding genes in the case of most plastid genomes, or plastomes) as independent estimates of species phylogeny, in contrast to the usual practice of concatenation, which is appropriate for generating gene trees. However, although recombination certainly occurs in the plastome, as has been recognized since the 1970s, it is unlikely to be phylogenetically relevant. This is because such historically effective recombination can only occur when plastomes with incongruent histories are brought together in the same plastid, and plastids sort rapidly into different cell lineages and rarely fuse. Thus, because of plastid biology, the plastome is a more canonical c-gene than is the average multi-intron mammalian nuclear gene.
The plastome should thus continue to be treated as a single estimate of the underlying species phylogeny, as should the mitochondrial genome. The implications of this long-held insight of molecular systematics for studies in the phylogenomic era are explored.
Affiliation(s)
- Jeff J Doyle
- Plant Biology Section, Plant Breeding & Genetics Section, and L. H. Bailey Hortorium, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA