1
Xu XM, Xu H, Yang Z, Wei Z, Gu JY, Liu DH, Liu QR, Zhu SX. Phylogeny, biogeography, and character evolution of Anaphalis (Gnaphalieae, Asteraceae). Front Plant Sci 2024; 15:1336229. [PMID: 38384761 PMCID: PMC10879626 DOI: 10.3389/fpls.2024.1336229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 01/24/2024] [Indexed: 02/23/2024]
Abstract
The HAP clade, mainly including Helichrysum Mill, Anaphalis DC., and Pseudognaphalium Kirp., is a major component of tribe Gnaphalieae (Asteraceae). In this clade, Anaphalis represents the largest genus of Asian Gnaphalieae. The intergeneric relationships among Anaphalis and its related genera and the infrageneric taxonomy of this genus are complex and remain controversial. However, there are few studies that have focused on these issues. Herein, based on the current most comprehensive sampling of the HAP clade, especially Anaphalis, we conducted phylogenetic analyses using chloroplast (cp) genome and nuclear ribosomal DNA (nrDNA) to evaluate the relationships within HAP clade, test the monophyly of Anaphalis, and examine the infrageneric taxonomy of this genus. Meanwhile, the morphological characters were verified to determine the circumscription and infrageneric taxonomy system of Anaphalis. Additionally, the biogeographical history, diversification processes, and evolution of crucial morphological characters were estimated and inferred. Our phylogenetic analyses suggested that Anaphalis is polyphyletic because it nested with Helichrysum and Pseudognaphalium. Two and four main clades of Anaphalis were identified in cp genome and nrDNA trees, respectively. Compared with nrDNA trees, the cp genome trees were more effective for phylogenetic resolution. After comprehensively analyzing morphological and phylogenetic evidence, it was concluded that the achene surface ornamentation and leaf base showed less homoplasy and supported the two Anaphalis lineages that were inferred from cp genome. Our biogeographical analyses based on cp genome indicated that HAP clade underwent rapid diversification from late Miocene to Pliocene. The two Anaphalis lineages appeared to have originated in Africa, then spread to Western and Southern Asia, and subsequently moved into Southwestern China forming a diversity center. 
The dispersal patterns of the two Anaphalis lineages were different. One dispersed around the world, except in Africa and South America. The other one dispersed to Eastern and Southeastern Asia from the ancestral origin region.
Affiliation(s)
- Xue-Min Xu
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- He Xu
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Zheng Yang
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Zhen Wei
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Jun-Yu Gu
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Resource Research Institute, Henan Provincial Third Institute of Resources and Environment Investigation, Zhengzhou, China
- Dan-Hui Liu
- Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumchi, China
- Quan-Ru Liu
- College of Life Sciences, Beijing Normal University, Beijing, China
- Shi-Xin Zhu
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
2
Mavrodiev EV, Madorsky A. On Pattern-Cladistic Analyses Based on Complete Plastid Genome Sequences. Acta Biotheor 2023; 71:22. [PMID: 37922001 DOI: 10.1007/s10441-023-09475-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 10/26/2023] [Indexed: 11/05/2023]
Abstract
The fundamental Hennigian principle, grouping solely on synapomorphy, is seldom used in modern phylogenetics. In the submitted paper, we apply this principle in reanalyzing five datasets comprising 197 complete plastid genomes (plastomes). We focused on the latter because plastome-based DNA sequence data gained dramatic popularity in molecular systematics during the last decade. We show that pattern-cladistic analyses based on complete plastid genome sequences can successfully resolve affinities between plant taxa, simultaneously simplifying both the genomic and analytical frameworks of phylogenetic studies. We developed "Matrix to Newick" (M2N), a program to represent the standard molecular alignment of plastid genomes in the form of trees or relationships directly. Thus, massive plastome-based DNA sequence data can be successfully represented in a relational form rather than as a standard molecular alignment. Application of methods of median supertree construction (the Average Consensus method has been used as an example in this study) or Maximum Parsimony analysis to relational representations of plastome sequence data may help systematist to avoid the complicated assumption-based frameworks of Maximum Likelihood or Bayesian phylogenetics that are most used today in massive plastid sequence data analyses. We also found that significant amounts of pure genomic information that typically accommodate the majority of current plastid phylogenomic studies can be effectively dropped by systematists if they focus on the pattern-cladistics or relational analyses of plastome-based molecular data. The proposed pattern-cladistic approach is a powerful and straightforward heuristic alternative to modern plastome-based phylogenetics.
Affiliation(s)
- Evgeny V Mavrodiev
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA.
3
Simmons MP, Goloboff PA, Stöver BC, Springer MS, Gatesy J. Quantification of congruence among gene trees with polytomies using overall success of resolution for phylogenomic coalescent analyses. Cladistics 2023; 39:418-436. [PMID: 37096985 DOI: 10.1111/cla.12540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/24/2023] [Indexed: 04/26/2023]
Abstract
Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
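The difference between standard RF distances and a measure that does not penalize polytomies can be illustrated with a small sketch. This is not the authors' TNT `congsort` script: it is a hypothetical Python toy that assumes rooted trees written as nested tuples, compares clades rather than unrooted bipartitions, and uses one plausible reading of "RF contradictions" (clades in either tree that conflict with at least one clade in the other). OSR is not implemented because the abstract does not give its formula.

```python
def clades(tree):
    """Return the non-trivial clades (frozensets of leaf names) of a rooted
    tree given as nested tuples, e.g. (("A", "B"), "C")."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    root = leaves(tree)
    found.discard(root)  # the clade of all leaves carries no information
    return found


def conflicts(a, b):
    """Two clades conflict when they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def rf_distance(t1, t2):
    """Standard RF-style distance: clades present in exactly one tree."""
    return len(clades(t1) ^ clades(t2))


def rf_contradictions(t1, t2):
    """Count only clades that are contradicted by the other tree
    (an assumed reading of the measure, not the published definition)."""
    c1, c2 = clades(t1), clades(t2)
    in_first = sum(any(conflicts(a, b) for b in c2) for a in c1)
    in_second = sum(any(conflicts(b, a) for a in c1) for b in c2)
    return in_first + in_second


# Hypothetical five-taxon examples.
resolved = ((("A", "B"), "C"), ("D", "E"))
polytomy = (("A", "B", "C"), ("D", "E"))       # A, B, C collapsed
conflicting = ((("A", "C"), "B"), ("D", "E"))  # groups A with C instead of B

print(rf_distance(resolved, polytomy), rf_contradictions(resolved, polytomy))
print(rf_distance(resolved, conflicting), rf_contradictions(resolved, conflicting))
```

In this toy, the collapsed tree is penalized by the RF-style distance even though it contradicts nothing in the resolved tree, whereas the contradiction count stays at zero until a clade is actually in conflict.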
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO, 80523, USA
- Pablo A Goloboff
- CONICET, INSUE, Fundación Miguel Lillo, Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
- Ben C Stöver
- Institute for Evolution and Biodiversity, WMU Münster, 48149, Münster, Germany
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, 92521, USA
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
4
Xu XM, Wei Z, Sun JZ, Zhao QF, Lu Y, Wang ZL, Zhu SX. Phylogeny of Leontopodium (Asteraceae) in China-with a reference to plastid genome and nuclear ribosomal DNA. Front Plant Sci 2023; 14:1163065. [PMID: 37583593 PMCID: PMC10425225 DOI: 10.3389/fpls.2023.1163065] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/10/2023] [Accepted: 07/10/2023] [Indexed: 08/17/2023]
Abstract
The infrageneric taxonomy system, species delimitation, and interspecies systematic relationships of Leontopodium remain controversial and complex. However, only a few studies have focused on the molecular phylogeny of this genus. In this study, the characteristics of 43 chloroplast genomes of Leontopodium and its closely related genera were analyzed. Phylogenetic relationships were inferred based on chloroplast genomes and nuclear ribosomal DNA (nrDNA). Finally, together with the morphological characteristics, the relationships within Leontopodium were identified and discussed. The results showed that the chloroplast genomes of Filago, Gamochaeta, and Leontopodium were well-conserved in terms of gene number, gene order, and GC content. The most remarkable differences among the three genera were the length of the complete chloroplast genome, large single-copy region, small single-copy region, and inverted repeat region. In addition, the chloroplast genome structure of Leontopodium exhibited high consistency and was obviously different from that of Filago and Gamochaeta in some regions, such as matk, trnK (UUU)-rps16, petN-psbM, and trnE (UUC)-rpoB. All the phylogenetic trees indicated that Leontopodium was monophyletic. Except for the subgeneric level, our molecular phylogenetic results were inconsistent with the previous taxonomic system, which was based on morphological characteristics. Nevertheless, we found that the characteristics of the leaf base, stem types, and carpopodium base were phylogenetically correlated and may have potential value in the taxonomic study of Leontopodium. In the phylogenetic trees inferred using complete chloroplast genomes, the subgen. Leontopodium was divided into two clades (Clades 1 and 2), with most species in Clade 1 having herbaceous stems, amplexicaul, or sheathed leaves, and constricted carpopodium; most species in Clade 2 had woody stems, not amplexicaul and sheathed leaves, and not constricted carpopodium.
Affiliation(s)
- Shi-Xin Zhu
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
5
Fleming JF, Valero‐Gracia A, Struck TH. Identifying and addressing methodological incongruence in phylogenomics: A review. Evol Appl 2023; 16:1087-1104. [PMID: 37360032 PMCID: PMC10286231 DOI: 10.1111/eva.13565] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/05/2022] [Revised: 04/07/2023] [Accepted: 05/17/2023] [Indexed: 06/28/2023]
Abstract
The availability of phylogenetic data has greatly expanded in recent years. As a result, a new era in phylogenetic analysis is dawning-one in which the methods we use to analyse and assess our data are the bottleneck to producing valuable phylogenetic hypotheses, rather than the need to acquire more data. This makes the ability to accurately appraise and evaluate new methods of phylogenetic analysis and phylogenetic artefact identification more important than ever. Incongruence in phylogenetic reconstructions based on different datasets may be due to two major sources: biological and methodological. Biological sources comprise processes like horizontal gene transfer, hybridization and incomplete lineage sorting, while methodological ones contain falsely assigned data or violations of the assumptions of the underlying model. While the former provides interesting insights into the evolutionary history of the investigated groups, the latter should be avoided or minimized as best as possible. However, errors introduced by methodology must first be excluded or minimized to be able to conclude that biological sources are the cause. Fortunately, a variety of useful tools exist to help detect such misassignments and model violations and to apply ameliorating measurements. Still, the number of methods and their theoretical underpinning can be overwhelming and opaque. Here, we present a practical and comprehensive review of recent developments in techniques to detect artefacts arising from model violations and poorly assigned data. The advantages and disadvantages of the different methods to detect such misleading signals in phylogenetic reconstructions are also discussed. As there is no one-size-fits-all solution, this review can serve as a guide in choosing the most appropriate detection methods depending on both the actual dataset and the computational power available to the researcher. 
Ultimately, this informed selection will have a positive impact on the broader field, allowing us to better understand the evolutionary history of the group of interest.
6
Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022; 39:6750035. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022] [Indexed: 01/07/2023]
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
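The general idea of replacing a hard support threshold with weights can be shown on a single quartet. The sketch below is a deliberately simplified illustration, not the weighted ASTRAL algorithm or its optimization procedure: the function names, the input layout (one `(topology, support)` pair per gene tree) and the support values are all hypothetical, and the abstract does not specify how the actual weights are computed.

```python
from collections import defaultdict


def quartet_votes(gene_quartets, weighted=True):
    """Sum the evidence for each quartet topology.

    gene_quartets: iterable of (topology, support) pairs, one per gene tree,
    where support is a branch-support value in [0, 1] (hypothetical input).
    With weighted=False every gene tree counts as one vote.
    """
    totals = defaultdict(float)
    for topology, support in gene_quartets:
        totals[topology] += support if weighted else 1.0
    return dict(totals)


def best_topology(gene_quartets, weighted=True):
    """Return the quartet topology with the largest total."""
    totals = quartet_votes(gene_quartets, weighted=weighted)
    return max(totals, key=totals.get)


# Hypothetical data: two well-supported gene trees and three poorly
# supported ones that disagree with them.
gene_quartets = [
    ("AB|CD", 0.95),
    ("AB|CD", 0.90),
    ("AC|BD", 0.20),
    ("AC|BD", 0.30),
    ("AC|BD", 0.25),
]

print(best_topology(gene_quartets, weighted=False))  # majority of gene trees
print(best_topology(gene_quartets, weighted=True))   # majority of support
```

Here the unweighted count follows the three noisy gene trees, while the weighted total follows the two well-supported ones, with no cut-off chosen for contracting branches.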
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA
7
Almeida de Jesus D, Batista DM, Monteiro EF, Salzman S, Carvalho LM, Santana K, André T. Structural changes and adaptative evolutionary constraints in FLOWERING LOCUS T and TERMINAL FLOWER1-like genes of flowering plants. Front Genet 2022; 13:954015. [PMID: 36246591 PMCID: PMC9556947 DOI: 10.3389/fgene.2022.954015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 08/29/2022] [Indexed: 11/13/2022]
Abstract
Regulation of flowering is a crucial event in the evolutionary history of angiosperms. The production of flowers is regulated through the integration of different environmental and endogenous stimuli, many of which involve the activation of different genes in a hierarchical and complex signaling network. The FLOWERING LOCUS T/TERMINAL FLOWER 1 (FT/TFL1) gene family is known to regulate important aspects of flowering in plants. To better understand the pivotal events that changed FT and TFL1 functions during the evolution of angiosperms, we reconstructed the ancestral sequences of FT/TFL1-like genes and predicted protein structures through in silico modeling to identify determinant sites that evolved in both proteins and allowed the adaptative diversification in the flowering phenology and developmental processes. In addition, we demonstrate that the occurrence of destabilizing mutations in residues located at the phosphatidylcholine binding sites of FT structure are under positive selection, and some residues of 4th exon are under negative selection, which is compensated by the occurrence of stabilizing mutations in key regions and the P-loop to maintain the overall protein stability. Our results shed light on the evolutionary history of key genes involved in the diversification of angiosperms.
Affiliation(s)
- Deivid Almeida de Jesus
- Institute of Biology Genetics Graduate Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Darlisson Mesquista Batista
- Programa de Pós-Graduação em Biodiversidade, Universidade Federal do Oeste do Pará, Santarém, Pará, Brazil
- Elton Figueira Monteiro
- Programa de Pós-Graduação em Biodiversidade, Universidade Federal do Oeste do Pará, Santarém, Pará, Brazil
- Shayla Salzman
- School of Integrative Plant Sciences, Section of Plant Biology, Cornell University, Ithaca, NY, United States
- Lucas Miguel Carvalho
- Center for Computing in Engineering and Sciences, State University of Campinas, Campinas, São Paulo, Brazil
- Kauê Santana
- Institute of Biodiversity, Federal University of Western Pará, Santarém, Pará, Brazil
- *Correspondence: Kauê Santana; Thiago André
- Thiago André
- Botany Department, University of Brasília, Brasília, Brazil
- *Correspondence: Kauê Santana; Thiago André
8
Smith BT, Merwin J, Provost KL, Thom G, Brumfield RT, Ferreira M, Mauck III WM, Moyle RG, Wright T, Joseph L. Phylogenomic analysis of the parrots of the world distinguishes artifactual from biological sources of gene tree discordance. Syst Biol 2022; 72:228-241. [PMID: 35916751 DOI: 10.1093/sysbio/syac055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2021] [Revised: 02/22/2022] [Accepted: 07/22/2022] [Indexed: 11/14/2022]
Abstract
Gene tree discordance is expected in phylogenomic trees and biological processes are often invoked to explain it. However, heterogeneous levels of phylogenetic signal among individuals within datasets may cause artifactual sources of topological discordance. We examined how the information content in tips and subclades impacts topological discordance in the parrots (Order: Psittaciformes), a diverse and highly threatened clade of nearly 400 species. Using ultraconserved elements from 96% of the clade's species-level diversity, we estimated concatenated and species trees for 382 ingroup taxa. We found that discordance among tree topologies was most common at nodes dating between the late Miocene and Pliocene, and often at the taxonomic level of genus. Accordingly, we used two metrics to characterize information content in tips and assess the degree to which conflict between trees was being driven by lower quality samples. Most instances of topological conflict and non-monophyletic genera in the species tree could be objectively identified using these metrics. For subclades still discordant after tip-based filtering, we used a machine learning approach to determine whether phylogenetic signal or noise was the more important predictor of metrics supporting the alternative topologies. We found that when signal favored one of the topologies, noise was the most important variable in poorly performing models that favored the alternative topology. In sum, we show that artifactual sources of gene tree discordance, which are likely a common phenomenon in many datasets, can be distinguished from biological sources by quantifying the information content in each tip and modeling which factors support each topology.
Affiliation(s)
- Brian Tilston Smith
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
- Jon Merwin
- Department of Ornithology, Academy of Natural Sciences of Drexel University, 1900 Benjamin Franklin Parkway, Philadelphia, PA 19103, USA; Department of Biodiversity, Earth, and Environmental Science, Drexel University, Philadelphia, PA 19103, USA
- Kaiya L Provost
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 318 W. 12th Avenue, Columbus, OH 43210, USA
- Gregory Thom
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Robb T Brumfield
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Mateus Ferreira
- Centro de Estudos da Biodiversidade, Universidade Federal de Roraima, Av. Cap. Ene Garcez, 2413, Boa Vista, RR, Brazil
- William M Mauck III
- Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
- Robert G Moyle
- Department of Ecology and Evolutionary Biology and Biodiversity Institute, University of Kansas, 1345 Jayhawk Blvd., Lawrence, KS 66045, USA
- Timothy Wright
- Department of Biology, New Mexico State University, Las Cruces, NM, 88003, USA
- Leo Joseph
- Australian National Wildlife Collection, National Research Collections Australia, CSIRO, GPO Box 1700, Canberra, ACT, 2601, Australia
9
Gatesy J, Springer MS. Phylogenomic Coalescent Analyses of Avian Retroelements Infer Zero-Length Branches at the Base of Neoaves, Emergent Support for Controversial Clades, and Ancient Introgressive Hybridization in Afroaves. Genes (Basel) 2022; 13:1167. [PMID: 35885951 PMCID: PMC9324441 DOI: 10.3390/genes13071167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Revised: 06/20/2022] [Accepted: 06/21/2022] [Indexed: 01/25/2023]
Abstract
Retroelement insertions (RIs) are low-homoplasy characters that are ideal data for addressing deep evolutionary radiations, where gene tree reconstruction errors can severely hinder phylogenetic inference with DNA and protein sequence data. Phylogenomic studies of Neoaves, a large clade of birds (>9000 species) that first diversified near the Cretaceous−Paleogene boundary, have yielded an array of robustly supported, contradictory relationships among deep lineages. Here, we reanalyzed a large RI matrix for birds using recently proposed quartet-based coalescent methods that enable inference of large species trees including branch lengths in coalescent units, clade-support, statistical tests for gene flow, and combined analysis with DNA-sequence-based gene trees. Genome-scale coalescent analyses revealed extremely short branches at the base of Neoaves, meager branch support, and limited congruence with previous work at the most challenging nodes. Despite widespread topological conflicts with DNA-sequence-based trees, combined analyses of RIs with thousands of gene trees show emergent support for multiple higher-level clades (Columbea, Passerea, Columbimorphae, Otidimorphae, Phaethoquornithes). RIs express asymmetrical support for deep relationships within the subclade Afroaves that hints at ancient gene flow involving the owl lineage (Strigiformes). Because DNA-sequence data are challenged by gene tree-reconstruction error, analysis of RIs represents one approach for improving gene tree-based methods when divergences are deep, internodes are short, terminal branches are long, and introgressive hybridization further confounds species−tree inference.
Affiliation(s)
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S. Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
10
Simmons MP, Springer MS, Gatesy J. Gene-tree misrooting drives conflicts in phylogenomic coalescent analyses of palaeognath birds. Mol Phylogenet Evol 2021; 167:107344. [PMID: 34748873 DOI: 10.1016/j.ympev.2021.107344] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/13/2021] [Revised: 10/08/2021] [Accepted: 11/02/2021] [Indexed: 10/19/2022]
Abstract
Phylogenomic analyses of ancient rapid radiations can produce conflicting results that are driven by differential sampling of taxa and characters as well as the limitations of alternative analytical methods. We re-examine basal relationships of palaeognath birds (ratites and tinamous) using recently published datasets of nucleotide characters from 20,850 loci as well as 4301 retroelement insertions. The original studies attributed conflicting resolutions of rheas in their inferred coalescent and concatenation trees to concatenation failing in the anomaly zone. By contrast, we find that the coalescent-based resolution of rheas is premised upon extensive gene-tree estimation errors. Furthermore, retroelement insertions contain much more conflict than originally reported and multiple insertion loci support the basal position of rheas found in concatenation trees, while none were reported in the original publication. We demonstrate how even remarkable congruence in phylogenomic studies may be driven by long-branch misplacement of a divergent outgroup, highly incongruent gene trees, differential taxon sampling that can result in gene-tree misrooting errors that bias species-tree inference, and gross homology errors. What was previously interpreted as broad, robustly supported corroboration for a single resolution in coalescent analyses may instead indicate a common bias that taints phylogenomic results across multiple genome-scale datasets. The updated retroelement dataset now supports a species tree with branch lengths that suggest an ancient anomaly zone, and both concatenation and coalescent analyses of the huge nucleotide datasets fail to yield coherent, reliable results in this challenging phylogenetic context.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
- John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
11
Molloy EK, Gatesy J, Springer MS. Theoretical and practical considerations when using retroelement insertions to estimate species trees in the anomaly zone. Syst Biol 2021; 71:721-740. [PMID: 34677617 DOI: 10.1093/sysbio/syab086] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/02/2020] [Accepted: 10/11/2021] [Indexed: 11/13/2022]
Abstract
A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony typically fail to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e. local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise. 
This led us to derive the maximum likelihood (branch length) estimate for when RIs are given as input instead of binary gene trees; this corrected formula produced accurate estimates of branch lengths in our simulation study, provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100 000 parsimony-informative RIs. We found that, when given just 1 000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e. clades separated by branches > 0.3 CUs) with high support and identified rapid radiations (i.e. shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios.
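For context, the estimate that applies when binary gene trees are the input can be written down directly. Under the multispecies coalescent, a quartet gene tree matches an internal species-tree branch of length x coalescent units with probability 1 - (2/3)e^(-x), so an observed matching fraction z gives x = -ln(3(1 - z)/2). The sketch below implements only this standard gene-tree formula; the corrected RI formula derived in the paper is not stated in the abstract and is not reproduced here. The function name and the handling of boundary cases are assumptions for illustration.

```python
import math


def coalescent_branch_length(matching_fraction):
    """Branch length in coalescent units from the fraction of quartet gene
    trees that match the species-tree branch (standard gene-tree formula).

    Assumed boundary handling: fractions at or below 1/3 map to length 0,
    and a fraction of 1 maps to infinity.
    """
    if not 0.0 <= matching_fraction <= 1.0:
        raise ValueError("matching_fraction must lie in [0, 1]")
    if matching_fraction <= 1.0 / 3.0:
        return 0.0
    if matching_fraction == 1.0:
        return math.inf
    return -math.log(1.5 * (1.0 - matching_fraction))


# Round trip: start from a known branch length, compute the expected
# matching fraction, and recover the length.
true_length = 0.5
expected_fraction = 1.0 - (2.0 / 3.0) * math.exp(-true_length)
print(coalescent_branch_length(expected_fraction))
```

According to the abstract, applying the gene-tree estimate to RI data was accurate for short branches but upwardly biased otherwise, which is what motivated the separate RI-specific estimate.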
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, College Park, 20742, USA
| | - John Gatesy
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, 10024, USA
| | - Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, Riverside, 92521, USA
|
12
|
Phylogenomics, floral evolution, and biogeography of Lithospermum L. (Boraginaceae). Mol Phylogenet Evol 2021; 166:107317. [PMID: 34547439 DOI: 10.1016/j.ympev.2021.107317] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2020] [Revised: 08/29/2021] [Accepted: 09/15/2021] [Indexed: 11/23/2022]
Abstract
Lithospermum (Boraginaceae), a geographically cosmopolitan medium-sized genus, includes diverse floral morphology, with variation in corolla size and shape and in breeding system. Over the past decade, multiple studies have examined the evolutionary history of Lithospermum, with most utilizing DNA regions from the plastid genome and/or the nuclear ribosomal internal transcribed spacer. These studies have, in general, not resulted in well-resolved and well-supported phylogenies. In the present study, 298 nuclear DNA regions, amplified via target sequence capture, were utilized for phylogenetic reconstruction for Lithospermum and relatives in Boraginaceae, and patterns of floral evolution, species diversification, and biogeography were examined. Based on multiple phylogenetic methods, Lithospermum is resolved as monophyletic, and the New World species of the genus are also monophyletic. While minimal phylogenetic incongruence is resolved within the nuclear genome, incongruence between the nuclear and plastid genomes is recovered. This is likely due to incomplete lineage sorting during early diversification of the genus in the Americas approximately 7.8 million years ago. At least four shifts to longer corollas are identified throughout Lithospermum, and this may be due to selection for hummingbird-pollinated flowers, particularly for species in Mexico and the southwestern United States. In the New World, one clade of species of the genus diversified primarily across the United States and Canada, and another radiated throughout the mountains of Mexico.
|
13
|
Cunha TJ, Reimer JD, Giribet G. Investigating Sources of Conflict in Deep Phylogenomics of Vetigastropod Snails. Syst Biol 2021; 71:1009-1022. [PMID: 34469579 PMCID: PMC9249062 DOI: 10.1093/sysbio/syab071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2020] [Revised: 08/25/2021] [Accepted: 08/27/2021] [Indexed: 11/17/2022] Open
Abstract
Phylogenetic analyses may suffer from multiple sources of error leading to conflict between genes and methods of inference. The evolutionary history of the mollusc clade Vetigastropoda makes them susceptible to these conflicts, their higher level phylogeny remaining largely unresolved. Originating over 350 Ma, vetigastropods were the dominant marine snails in the Paleozoic. Multiple extinction events and new radiations have resulted in both very long and very short branches and a large extant diversity of over 4000 species. This is the perfect setting of a hard phylogenetic question in which sources of conflict can be explored. We present 41 new transcriptomes across the diversity of vetigastropods (62 terminals total), and provide the first genomic-scale phylogeny for the group. We find that deep divergences differ from previous studies in which long branch attraction was likely pervasive. Robust results leading to changes in taxonomy include the paraphyly of the order Lepetellida and the family Tegulidae. Tectinae subfam. nov. is designated for the clade comprising Tectus, Cittarium, and Rochia. For two early divergences, topologies disagreed between concatenated analyses using site heterogeneous models versus concatenated partitioned analyses and summary coalescent methods. We investigated rate and composition heterogeneity among genes, as well as missing data by locus and by taxon, none of which had an impact on the inferred topologies. We also found no evidence for ancient introgression throughout the phylogeny. We further tested whether uninformative genes and over-partitioning were responsible for this discordance by evaluating the phylogenetic signal of individual genes using likelihood mapping, and by analyzing the most informative genes with a full multispecies coalescent (MSC) model. We find that most genes are not informative at the two conflicting nodes, but neither this nor gene-wise partitioning are the cause of discordant results. New method implementations that simultaneously integrate amino acid profile mixture models and the MSC might be necessary to resolve these and other recalcitrant nodes in the Tree of Life. [Fissurellidae; Haliotidae; likelihood mapping; multispecies coalescent; phylogenetic signal; phylogenomic conflict; site heterogeneity; Trochoidea.]
Affiliation(s)
- Tauana Junqueira Cunha
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge MA 02138, USA.,Smithsonian Tropical Research Institute, Panama City, Panama
| | - James Davis Reimer
- Molecular Invertebrate Systematics and Ecology, University of the Ryukyus, 1 Senbaru, Nishihara, Okinawa 903-0213, Japan.,Tropical Biosphere Research Center, University of the Ryukyus, 1 Senbaru, Nishihara, Okinawa 903-0213, Japan
| | - Gonzalo Giribet
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge MA 02138, USA
|
14
|
Harrington RC, Friedman M, Miya M, Near TJ, Campbell MA. Phylogenomic resolution of the monotypic and enigmatic Amarsipus, the Bagless Glassfish (Teleostei, Amarsipidae). ZOOL SCR 2021. [DOI: 10.1111/zsc.12477] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Affiliation(s)
| | - Matt Friedman
- Museum of Paleontology and Department of Earth and Environmental Sciences, University of Michigan, Ann Arbor, MI, USA
| | - Masaki Miya
- Natural History Museum and Institute, Chiba, Chiba, Japan
| | - Thomas J. Near
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Peabody Museum, Yale University, New Haven, CT, USA
|
15
|
Shen XX, Steenwyk JL, Rokas A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst Biol 2021; 70:997-1014. [PMID: 33616672 DOI: 10.1093/sysbio/syab011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2020] [Revised: 02/10/2021] [Accepted: 02/17/2021] [Indexed: 12/12/2022] Open
Abstract
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30% - 36% of genes in each data matrix are inconsistent, that is, each of these genes has higher log likelihood score for T1 versus T2 (i.e., ΔGLS >0) whereas its T1 topology has lower quartet score than its T2 topology (i.e., ΔGQS <0) or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
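The consistency criterion described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: each gene is assumed to come with a precomputed ΔGLS and ΔGQS (both taken as T1 minus T2), and the function name, the input layout, and the extra "uninformative" label for genes with a zero difference are hypothetical choices made here.

```python
from typing import Dict, Tuple


def classify_genes(scores: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Label each gene by agreement between its two signal types.

    scores maps gene name -> (delta_gls, delta_gqs), each computed as the
    value for T1 minus the value for T2 (assumed precomputed elsewhere).
    """
    labels: Dict[str, str] = {}
    for gene, (d_gls, d_gqs) in scores.items():
        if d_gls == 0 or d_gqs == 0:
            # Hypothetical extra category: one signal favors neither tree.
            labels[gene] = "uninformative"
        elif (d_gls > 0) == (d_gqs > 0):
            # Likelihood and quartet signal favor the same topology.
            labels[gene] = "consistent"
        else:
            # The two signals point to different topologies.
            labels[gene] = "inconsistent"
    return labels
```

A gene with ΔGLS > 0 and ΔGQS < 0, for instance, is labeled inconsistent, matching the definition given in the abstract.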
Affiliation(s)
- Xing-Xing Shen
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China.,Institute of Insect Sciences, Zhejiang University, Hangzhou, China
| | - Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
|
16
|
Collapsing dubiously resolved gene-tree branches in phylogenomic coalescent analyses. Mol Phylogenet Evol 2021; 158:107092. [PMID: 33545272 DOI: 10.1016/j.ympev.2021.107092] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2020] [Revised: 12/30/2020] [Accepted: 01/28/2021] [Indexed: 01/15/2023]
Abstract
In two-step coalescent analyses of phylogenomic data, gene-tree topologies are treated as fixed prior to species-tree inference. Although all gene-tree conflict is assumed to be caused by lineage sorting when applying these methods, in empirical datasets much of the conflict can be caused by estimation error. Weakly supported and even arbitrarily resolved clades are important sources of this estimation error for gene trees inferred from few informative characters relative to the number of sampled terminals, and the resulting extraneous conflict among gene trees can negatively impact species-tree inference. In this study, we quantified the relative severity of alternative methods for collapsing gene-tree branches for seven empirical datasets and quantified their effects on species-tree inference. The branch-collapsing methods that we employed were based on the strict consensus of optimal topologies, various bootstrap thresholds, and 0% approximate likelihood ratio test (SH-like aLRT) support. Up to 86% of internal gene-tree branches are dubiously or arbitrarily resolved in reanalyses of these published phylogenomic datasets, and collapsing these branches increased inferred species-tree coalescent branch lengths by up to 455%. For two datasets, the longer inferred branch lengths sometimes impacted inference of anomaly-zone conditions. Although branch-collapsing methods did not consistently affect the species-tree topology, they often increased branch support. The more severe and clearly justified gene-tree branch-collapsing methods, which we recommend be broadly applied for two-step coalescent analyses, are use of the strict consensus in parsimony analyses and the collapse of clades with 0% SH-like aLRT support in likelihood analyses. Collapsing dubiously or arbitrarily resolved branches in gene trees sometimes improved congruence between coalescent-based results and concatenation trees. 
In such cases, we contend that the resolution provided by concatenation should be preferred and that incomplete lineage sorting is a poor explanation for the initial conflict between phylogenetic approaches.
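Support-based branch collapsing of the kind compared in this abstract can be illustrated with a small tree structure. The sketch below is an assumption-laden toy, not the software used in the study: `Node`, `collapse`, and the `cutoff` parameter are hypothetical names, support values are assumed to be stored on the child end of each branch, and a branch is collapsed when its support is at or below the cutoff (so `cutoff=0.0` corresponds to removing clades with 0% support).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # support for the branch above this node
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node, cutoff: float) -> Node:
    """Return a copy of the tree with weakly supported internal branches removed.

    An internal child whose support is <= cutoff is dissolved, and its
    children are attached directly to its parent, creating a polytomy.
    """
    new_children: List[Node] = []
    for child in node.children:
        child = collapse(child, cutoff)  # process the subtree first
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support <= cutoff:
            new_children.extend(child.children)  # splice grandchildren upward
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)
```

Applied to a tree ((A,B)0,(C,D)95) with `cutoff=0.0`, the clade (A,B) is dissolved into a polytomy at the root while (C,D) is kept. The input tree is left unmodified because a new tree is built on the way back up.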
|
17
|
New Approaches for Inferring Phylogenies in the Presence of Paralogs. Trends Genet 2021; 37:174-187. [DOI: 10.1016/j.tig.2020.08.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]
|
18
|
Jiang X, Edwards SV, Liu L. The Multispecies Coalescent Model Outperforms Concatenation Across Diverse Phylogenomic Data Sets. Syst Biol 2021; 69:795-812. [PMID: 32011711 PMCID: PMC7302055 DOI: 10.1093/sysbio/syaa008] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2019] [Revised: 12/24/2019] [Accepted: 01/02/2020] [Indexed: 11/30/2022] Open
Abstract
A statistical framework of model comparison and model validation is essential to resolving the debates over concatenation and coalescent models in phylogenomic data analysis. A set of statistical tests are here applied and developed to evaluate and compare the adequacy of substitution, concatenation, and multispecies coalescent (MSC) models across 47 phylogenomic data sets collected across the tree of life. Tests for substitution models and the concatenation assumption of topologically congruent gene trees suggest that a poor fit of substitution models, rejected by 44% of loci, and concatenation models, rejected by 38% of loci, is widespread. Logistic regression shows that the proportions of GC content and informative sites are both negatively correlated with the fit of substitution models across loci. Moreover, a substantial violation of the concatenation assumption of congruent gene trees is consistently observed across six major groups (birds, mammals, fish, insects, reptiles, and others, including other invertebrates). In contrast, among those loci adequately described by a given substitution model, the proportion of loci rejecting the MSC model is 11%, significantly lower than those rejecting the substitution and concatenation models. Although conducted on reduced data sets due to computational constraints, Bayesian model validation and comparison both strongly favor the MSC over concatenation across all data sets; the concatenation assumption of congruent gene trees rarely holds for phylogenomic data sets with more than 10 loci. Thus, for large phylogenomic data sets, model comparisons are expected to consistently and more strongly favor the coalescent model over the concatenation model. We also found that loci rejecting the MSC have little effect on species tree estimation. 
Our study reveals the value of model validation and comparison in phylogenomic data analysis, as well as the need for further improvements of multilocus models and computational tools for phylogenetic inference. [Bayes factor; Bayesian model validation; coalescent prior; congruent gene trees; independent prior; Metazoa; posterior predictive simulation.]
Affiliation(s)
- Xiaodong Jiang
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602, USA
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard, 26 Oxford Street, Cambridge, MA 02138, USA
| | - Liang Liu
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602, USA.,Institute of Bioinformatics, University of Georgia, 120 Green Street, Athens, GA 30602, USA
|
19
|
Bossert S, Murray EA, Pauly A, Chernyshov K, Brady SG, Danforth BN. Gene Tree Estimation Error with Ultraconserved Elements: An Empirical Study on Pseudapis Bees. Syst Biol 2020; 70:803-821. [PMID: 33367855 DOI: 10.1093/sysbio/syaa097] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2019] [Revised: 11/18/2020] [Accepted: 12/02/2020] [Indexed: 11/12/2022] Open
Abstract
Summarizing individual gene trees to species phylogenies using two-step coalescent methods is now a standard strategy in the field of phylogenomics. However, practical implementations of summary methods suffer from gene tree estimation error, which is caused by various biological and analytical factors. Greatly understudied is the choice of gene tree inference method and downstream effects on species tree estimation for empirical data sets. To better understand the impact of this method choice on gene and species tree accuracy, we compare gene trees estimated through four widely used programs under different model-selection criteria: PhyloBayes, MrBayes, IQ-Tree, and RAxML. We study their performance in the phylogenomic framework of >800 ultraconserved elements from the bee subfamily Nomiinae (Halictidae). Our taxon sampling focuses on the genus Pseudapis, a distinct lineage with diverse morphological features, but contentious morphology-based taxonomic classifications and no molecular phylogenetic guidance. We approximate topological accuracy of gene trees by assessing their ability to recover two uncontroversial, monophyletic groups, and compare branch lengths of individual trees using the stemminess metric (the relative length of internal branches). We further examine different strategies of removing uninformative loci and the collapsing of weakly supported nodes into polytomies. We then summarize gene trees with ASTRAL and compare resulting species phylogenies, including comparisons to concatenation-based estimates. Gene trees obtained with the reversible jump model search in MrBayes were most concordant on average and all Bayesian methods yielded gene trees with better stemminess values. The only gene tree estimation approach whose ASTRAL summary trees consistently produced the most likely correct topology, however, was IQ-Tree with automated model designation (ModelFinder program). 
We discuss these findings and provide practical advice on gene tree estimation for summary methods. Lastly, we establish the first phylogeny-informed classification for Pseudapis s. l. and map the distribution of distinct morphological features of the group. [ASTRAL; Bees; concordance; gene tree estimation error; IQ-Tree; MrBayes, Nomiinae; PhyloBayes; RAxML; phylogenomics; stemminess].
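The stemminess metric mentioned in this abstract is glossed only as "the relative length of internal branches". One simple reading, assumed here, is the sum of internal branch lengths divided by the total tree length; the study may use a different variant. The nested-tuple tree layout and the function name are hypothetical.

```python
from typing import List, Tuple

# A tree is (branch length above the node, list of child trees); leaves have [].
Tree = Tuple[float, list]


def stemminess(root: Tree) -> float:
    """Fraction of total tree length carried by internal branches.

    Assumption: stemminess = sum(internal lengths) / sum(all lengths).
    The root's own length is ignored because it has no branch above it.
    """
    internal = 0.0
    total = 0.0
    stack: List[Tree] = list(root[1])
    while stack:
        length, children = stack.pop()
        total += length
        if children:  # a node with children sits below an internal branch
            internal += length
            stack.extend(children)
    if total == 0:
        raise ValueError("tree has zero total length")
    return internal / total
```

Under this definition, values near 0 indicate star-like trees with long terminal branches, and values near 1 indicate trees dominated by internal branches.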
Affiliation(s)
- Silas Bossert
- Department of Entomology, Cornell University, Comstock Hall, Ithaca, NY 14853, USA.,Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA.,Department of Entomology, Washington State University, Pullman, Washington 99164, USA
| | - Elizabeth A Murray
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA.,Department of Entomology, Washington State University, Pullman, Washington 99164, USA
| | - Alain Pauly
- O.D. Taxonomy and Phylogeny, Royal Belgian Institute of Natural Sciences, Rue Vautier 29, 1000 Brussels, Belgium
| | - Kyrylo Chernyshov
- College of Arts and Sciences, Cornell University, Ithaca, NY 14853, USA
| | - Seán G Brady
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
| | - Bryan N Danforth
- Department of Entomology, Cornell University, Comstock Hall, Ithaca, NY 14853, USA
|
20
|
Zhang X, Sun Y, Landis JB, Lv Z, Shen J, Zhang H, Lin N, Li L, Sun J, Deng T, Sun H, Wang H. Plastome phylogenomic study of Gentianeae (Gentianaceae): widespread gene tree discordance and its association with evolutionary rate heterogeneity of plastid genes. BMC PLANT BIOLOGY 2020; 20:340. [PMID: 32680458 PMCID: PMC7368685 DOI: 10.1186/s12870-020-02518-w] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/03/2020] [Accepted: 06/24/2020] [Indexed: 05/10/2023]
Abstract
BACKGROUND Plastome-scale data have been prevalent in reconstructing the plant Tree of Life. However, phylogenomic studies currently based on plastomes rely primarily on maximum likelihood inference of concatenated alignments of plastid genes, and thus phylogenetic discordance produced by individual plastid genes has generally been ignored. Moreover, structural and functional characteristics of plastomes indicate that plastid genes may not evolve as a single locus and are experiencing different evolutionary forces, yet the genetic characteristics of plastid genes within a lineage remain poorly studied. RESULTS We sequenced and annotated 10 plastome sequences of Gentianeae. Phylogenomic analyses yielded robust relationships among genera within Gentianeae. We detected great variation of gene tree topologies and revealed that more than half of the genes, including one (atpB) of the three widely used plastid markers (rbcL, atpB and matK) in phylogenetic inference of Gentianeae, are likely contributing to phylogenetic ambiguity of Gentianeae. Estimation of nucleotide substitution rates showed extensive rate heterogeneity among different plastid genes and among different functional groups of genes. Comparative analysis suggested that the ribosomal protein (RPL and RPS) genes and the RNA polymerase (RPO) genes have higher substitution rates and genetic variations among plastid genes in Gentianeae. Our study revealed that just one (matK) of the three (matK, ndhB and rbcL) widely used markers shows a high phylogenetic informativeness (PI) value. Due to the high PI and lowest gene-tree discordance, rpoC2 is advocated as a promising plastid DNA barcode for taxonomic studies of Gentianeae. Furthermore, our analyses revealed a positive correlation of evolutionary rates with genetic variation of plastid genes, but a negative correlation with gene-tree discordance under purifying selection. 
CONCLUSIONS Overall, our results demonstrate the heterogeneity of nucleotide substitution rates and genetic characteristics among plastid genes providing new insights into plastome evolution, while highlighting the necessity of considering gene-tree discordance into phylogenomic studies based on plastome-scale data.
Affiliation(s)
- Xu Zhang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
| | - Yanxia Sun
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
| | - Jacob B Landis
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA, 92507, USA
- School of Integrative Plant Science, Section of Plant Biology and the L.H. Bailey Hortorium, Cornell University, Ithaca, NY, 14850, USA
| | - Zhenyu Lv
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
| | - Jun Shen
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Huajie Zhang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
| | - Nan Lin
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Lijuan Li
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Jiao Sun
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Tao Deng
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
| | - Hang Sun
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China.
| | - Hengchang Wang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
|
21
|
Abstract
Background To account for genome-wide discordance among gene trees, several widely-used methods seek to find a species tree with the minimum distance to input gene trees. To efficiently explore the large space of species trees, some of these methods, including ASTRAL, use dynamic programming (DP). The DP paradigm can restrict the search space, and thus, ASTRAL and similar methods use heuristic methods to define a restricted search space. However, arbitrary constraints provided by the user on the output tree cannot be trivially incorporated into such restrictions. The ability to infer trees that honor user-defined constraints is needed for many phylogenetic analyses, but no solution currently exists for constraining the output of ASTRAL. Results We introduce methods that enable the ASTRAL dynamic programming to infer constrained trees in an effective and scalable manner. To do so, we adopt a recently developed tree completion algorithm and extend it to allow multifurcating input and output trees. In simulation studies, we show that the approach for honoring constraints is both effective and fast. On real data, we show that constrained searches can help interrogate branches not recovered in the optimal ASTRAL tree to reveal support for alternative hypotheses. Conclusions The new algorithm has been added to ASTRAL to allow user-provided constraints on the species tree.
Affiliation(s)
- Maryam Rabiee
- Department of Computer Science and Engineering, UC San Diego, 9500 Gilman Dr, La Jolla, 92093, USA
| | - Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Dr, La Jolla, 92093, USA.
|
22
|
Springer MS, Molloy EK, Sloan DB, Simmons MP, Gatesy J. ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets. J Hered 2019; 111:147-168. [DOI: 10.1093/jhered/esz076] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2019] [Accepted: 12/12/2019] [Indexed: 12/20/2022] Open
Abstract
DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the “no intralocus-recombination” assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.
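The logic behind a quartet-asymmetry test can be sketched as follows. Under incomplete lineage sorting alone, the two quartet topologies that disagree with the species tree are expected in equal numbers, so a significant imbalance between them suggests introgression. The abstract does not state which statistic the authors use; the exact two-sided binomial test below is one standard choice assumed for illustration, and the function name is hypothetical.

```python
import math


def quartet_asymmetry_pvalue(count_alt1: int, count_alt2: int) -> float:
    """Two-sided exact binomial p-value for equal minority-quartet counts.

    count_alt1 and count_alt2 are the numbers of insertions supporting each
    of the two quartet topologies that conflict with the species tree.
    Null hypothesis: each is equally likely (p = 0.5), as expected under ILS.
    """
    n = count_alt1 + count_alt2
    if n == 0:
        return 1.0  # no conflicting insertions, so no evidence of asymmetry
    k = min(count_alt1, count_alt2)
    # Lower tail P(X <= k) for X ~ Binomial(n, 0.5); doubled by symmetry.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

When many quartets of species are tested in one data set, as in the study, the resulting p-values would normally need a multiple-testing correction before any single quartet is interpreted. `math.comb` requires Python 3.8 or later.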
Affiliation(s)
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA
| | - Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
| | - Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, CO
| | - Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO
| | - John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
|
23
|
Simmons MP, Kessenich J. Divergence and support among slightly suboptimal likelihood gene trees. Cladistics 2019; 36:322-340. [DOI: 10.1111/cla.12404] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/11/2019] [Indexed: 12/18/2022] Open
Affiliation(s)
- Mark P. Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523‐1878, USA
| | - John Kessenich
- 305 W. Magnolia Street, PMB 134, Fort Collins, CO 80521, USA
|
24
|
Jones KE, Fér T, Schmickl RE, Dikow RB, Funk VA, Herrando‐Moraira S, Johnston PR, Kilian N, Siniscalchi CM, Susanna A, Slovák M, Thapa R, Watson LE, Mandel JR. An empirical assessment of a single family-wide hybrid capture locus set at multiple evolutionary timescales in Asteraceae. APPLICATIONS IN PLANT SCIENCES 2019; 7:e11295. [PMID: 31667023 PMCID: PMC6814182 DOI: 10.1002/aps3.11295] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/27/2019] [Accepted: 09/05/2019] [Indexed: 05/23/2023]
Abstract
PREMISE Hybrid capture with high-throughput sequencing (Hyb-Seq) is a powerful tool for evolutionary studies. The applicability of an Asteraceae family-specific Hyb-Seq probe set and the outcomes of different phylogenetic analyses are investigated here. METHODS Hyb-Seq data from 112 Asteraceae samples were organized into groups at different taxonomic levels (tribe, genus, and species). For each group, data sets of non-paralogous loci were built and proportions of parsimony informative characters estimated. The impacts of analyzing alternative data sets, removing long branches, and type of analysis on tree resolution and inferred topologies were investigated in tribe Cichorieae. RESULTS Alignments of the Asteraceae family-wide Hyb-Seq locus set were parsimony informative at all taxonomic levels. Levels of resolution and topologies inferred at shallower nodes differed depending on the locus data set and the type of analysis, and were affected by the presence of long branches. DISCUSSION The approach used to build a Hyb-Seq locus data set influenced resolution and topologies inferred in phylogenetic analyses. Removal of long branches improved the reliability of topological inferences in maximum likelihood analyses. The Asteraceae Hyb-Seq probe set is applicable at multiple taxonomic depths, which demonstrates that probe sets do not necessarily need to be lineage-specific.
Affiliation(s)
- Katy E. Jones
  - Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Königin‐Luise‐Str. 6–8, 14195 Berlin, Germany
- Tomáš Fér
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, CZ 12800 Prague, Czech Republic
- Roswitha E. Schmickl
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, CZ 12800 Prague, Czech Republic
  - Institute of Botany, The Czech Academy of Sciences, Zámek 1, CZ 25243 Průhonice, Czech Republic
- Rebecca B. Dikow
  - Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, D.C. 20013‐7012, USA
- Vicki A. Funk
  - Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, D.C. 20013‐7012, USA
- Paul R. Johnston
  - Freie Universität Berlin, Evolutionary Biology, Berlin, Germany
  - Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
  - Leibniz‐Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany
- Norbert Kilian
  - Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Königin‐Luise‐Str. 6–8, 14195 Berlin, Germany
- Carolina M. Siniscalchi
  - Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA
  - Center for Biodiversity, University of Memphis, Memphis, Tennessee 38152, USA
- Alfonso Susanna
  - Botanic Institute of Barcelona (IBB‐CSIC‐ICUB), Pg. del Migdia s.n., ES 08038 Barcelona, Spain
- Marek Slovák
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, CZ 12800 Prague, Czech Republic
  - Plant Science and Biodiversity Centre, Slovak Academy of Sciences, SK‐84523 Bratislava, Slovakia
- Ramhari Thapa
  - Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA
  - Center for Biodiversity, University of Memphis, Memphis, Tennessee 38152, USA
- Linda E. Watson
  - Department of Plant Biology, Ecology, and Evolution, Oklahoma State University, Stillwater, Oklahoma 74078, USA
- Jennifer R. Mandel
  - Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA
  - Center for Biodiversity, University of Memphis, Memphis, Tennessee 38152, USA