1
Zhang R, Drummond AJ, Mendes FK. Fast Bayesian Inference of Phylogenies from Multiple Continuous Characters. Syst Biol 2024; 73:102-124. [PMID: 38085256 PMCID: PMC11129596 DOI: 10.1093/sysbio/syad067] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/06/2023] [Revised: 03/23/2023] [Accepted: 11/07/2023] [Indexed: 05/28/2024]
Abstract
Time-scaled phylogenetic trees are an ultimate goal of evolutionary biology and a necessary ingredient in comparative studies. The accumulation of genomic data has resolved the tree of life to a great extent, yet timing evolutionary events remains challenging, if not impossible, without external information such as fossil ages and morphological characters. Methods for incorporating morphology in tree estimation have lagged behind their molecular counterparts, especially in the case of continuous characters. Despite recent advances, such tools are still direly needed as we approach the limits of what molecules can teach us. Here, we implement a suite of state-of-the-art methods for leveraging continuous morphology in phylogenetics, and through extensive simulation studies we thoroughly validate and explore our methods' properties. While retaining model generality and scalability, we make it possible to estimate absolute and relative divergence times from multiple continuous characters while accounting for uncertainty. We compile and analyze one of the most data-type-diverse data sets to date, composed of contemporaneous and ancient molecular sequences, and discrete and continuous morphological characters from living and extinct Carnivora taxa. We conclude by synthesizing lessons about our method's behavior and suggesting future research avenues.
Affiliation(s)
- Rong Zhang
- Programme in Emerging Infectious Diseases, Duke-NUS Medical School 169857, Singapore
- Alexei J Drummond
- Centre for Computational Evolution, The University of Auckland, Auckland 1010, New Zealand
- School of Biological Sciences, The University of Auckland, Auckland 1010, New Zealand
- Fábio K Mendes
- Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
2
Piwczyński M, Granjon L, Trzeciak P, Carlos Brito J, Oana Popa M, Daba Dinka M, Johnston NP, Boratyński Z. Unraveling phylogenetic relationships and species boundaries in the arid adapted Gerbillus rodents (Muridae: Gerbillinae) by RAD-seq data. Mol Phylogenet Evol 2023; 189:107913. [PMID: 37659480 DOI: 10.1016/j.ympev.2023.107913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 08/25/2023] [Accepted: 08/28/2023] [Indexed: 09/04/2023]
Abstract
Gerbillus is one of the most speciose genera among rodents, with ca. 51 recognized species. Previous attempts to reconstruct the evolutionary history of Gerbillus mainly relied on the mitochondrial cyt-b marker as a source of phylogenetic information. In this study, we utilize RAD-seq genomic data from 37 specimens representing 11 species to reconstruct the phylogenetic tree for Gerbillus, applying concatenation and coalescence methods. We identified four highly supported clades corresponding to the traditionally recognized subgenera: Dipodillus, Gerbillus, Hendecapleura and Monodia. Only two uncertain branches were detected in the resulting trees; one of them, leading to the diversification of the main lineages in the genus, was identified by quartet sampling analysis as uncertain due to possible introgression. We also examined species boundaries for four pairs of sister taxa, including potentially new species from Morocco, using SNAPP. The results strongly supported a speciation model in which all taxa are treated as separate species. The dating analyses confirmed the Plio-Pleistocene diversification of the genus, with the uncertain branch coinciding with the beginning of aridification of the Sahara at the Plio-Pleistocene boundary. This study aligns well with the earlier analyses based on the cyt-b marker, reaffirming its suitability as an adequate marker for estimating genetic diversity in Gerbillus.
Affiliation(s)
- Marcin Piwczyński
- Department of Ecology and Biogeography, Nicolaus Copernicus University in Toruń, Lwowska 1, PL-87-100 Toruń, Poland.
- Laurent Granjon
- CBGP, IRD, CIRAD, INRAE, Institut Agro, Université de Montpellier, Montpellier, France
- Paulina Trzeciak
- Department of Ecology and Biogeography, Nicolaus Copernicus University in Toruń, Lwowska 1, PL-87-100 Toruń, Poland
- José Carlos Brito
- CIBIO-InBio, Research Center in Biodiversity and Genetic Resources, University of Porto, Campus de Vairão, Rua Padre Armando Quintas 7, 4485-661 Vairão, Portugal; BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Madalina Oana Popa
- Department of Ecology and Biogeography, Nicolaus Copernicus University in Toruń, Lwowska 1, PL-87-100 Toruń, Poland; "Stejarul" Research Centre for Biological Sciences, National Institute of Research and Development for Biological Sciences, Alexandru cel Bun 6, RO-610004, Piatra Neamţ, Romania
- Mergi Daba Dinka
- Department of Ecology and Biogeography, Nicolaus Copernicus University in Toruń, Lwowska 1, PL-87-100 Toruń, Poland
- Nikolas P Johnston
- School of Life Sciences, University of Technology Sydney, 15 Broadway, Ultimo, NSW 2007, Australia; Centre for Sustainable Ecosystem Solutions, School of Earth, Atmospheric and Life Sciences, University of Wollongong, Northfields Ave, Wollongong, NSW 2500, Australia
- Zbyszek Boratyński
- CIBIO-InBio, Research Center in Biodiversity and Genetic Resources, University of Porto, Campus de Vairão, Rua Padre Armando Quintas 7, 4485-661 Vairão, Portugal; BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão, Portugal
3
Castro LA, Leitner T, Romero-Severson E. Recombination smooths the time signal disrupted by latency in within-host HIV phylogenies. Virus Evol 2023; 9:vead032. [PMID: 37397911 PMCID: PMC10313349 DOI: 10.1093/ve/vead032] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/22/2022] [Revised: 04/07/2023] [Accepted: 05/15/2023] [Indexed: 07/04/2023]
Abstract
Within-host human immunodeficiency virus (HIV) evolution involves several features that may disrupt standard phylogenetic reconstruction. One important feature is reactivation of latently integrated provirus, which has the potential to disrupt the temporal signal, leading to variation in the branch lengths and apparent evolutionary rates in a tree. Yet, real within-host HIV phylogenies tend to show clear, ladder-like trees structured by the time of sampling. Another important feature is recombination, which violates the fundamental assumption that evolutionary history can be represented by a single bifurcating tree. Thus, recombination complicates the within-host HIV dynamic by mixing genomes and creating evolutionary loop structures that cannot be represented in a bifurcating tree. In this paper, we develop a coalescent-based simulator of within-host HIV evolution that includes latency, recombination, and effective population size dynamics, which allows us to study the relationship between the true, complex genealogy of within-host HIV evolution, encoded as an ancestral recombination graph (ARG), and the observed phylogenetic tree. To compare our ARG results to the familiar phylogeny format, we decompose the ARG into all unique site trees, combine them into a single distance matrix, and calculate the corresponding overall bifurcating tree. While latency and recombination each disrupt the phylogenetic signal on their own, remarkably, we find that recombination restores the temporal signal of within-host HIV evolution that latency disrupts, by mixing fragments of old, latent genomes into the contemporary population. In effect, recombination averages over extant heterogeneity, whether it stems from mixed time signals or population bottlenecks. Furthermore, we establish that the signals of latency and recombination can be observed in phylogenetic trees despite these trees being an incorrect representation of the true evolutionary history.
Using an approximate Bayesian computation method, we develop a set of statistical probes to tune our simulation model to nine longitudinally sampled within-host HIV phylogenies. Because ARGs are exceedingly difficult to infer from real HIV data, our simulation system allows us to investigate the effects of latency, recombination, and population size bottlenecks by matching decomposed ARGs to real data as observed in standard phylogenies.
Affiliation(s)
- Thomas Leitner
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
4
Detecting macroevolutionary genotype-phenotype associations using error-corrected rates of protein convergence. Nat Ecol Evol 2023; 7:155-170. [PMID: 36604553 PMCID: PMC9834058 DOI: 10.1038/s41559-022-01932-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 03/31/2022] [Accepted: 10/12/2022] [Indexed: 01/07/2023]
Abstract
On macroevolutionary timescales, extensive mutations and phylogenetic uncertainty mask the signals of genotype-phenotype associations underlying convergent evolution. To overcome this problem, we extended the widely used framework of non-synonymous to synonymous substitution rate ratios and developed the novel metric ωC, which measures the error-corrected convergence rate of protein evolution. ωC distinguishes natural selection from genetic noise and phylogenetic errors in both simulations and real examples, and its accuracy allows an exploratory genome-wide search for adaptive molecular convergence without a phenotypic hypothesis or candidate genes. Using gene expression data, we explored over 20 million branch combinations in vertebrate genes and identified the joint convergence of expression patterns and protein sequences with amino acid substitutions in functionally important sites, providing hypotheses on undiscovered phenotypes. We further extended our method with a heuristic algorithm to detect highly repetitive convergence among computationally non-trivial higher-order phylogenetic combinations. Our approach allows bidirectional searches for genotype-phenotype associations, even in lineages that diverged hundreds of millions of years ago.
5
Stubbs RL, Theodoridis S, Mora‐Carrera E, Keller B, Yousefi N, Potente G, Léveillé‐Bourret É, Celep F, Kochjarová J, Tedoradze G, Eaton DAR, Conti E. Whole-genome analyses disentangle reticulate evolution of primroses in a biodiversity hotspot. New Phytol 2023; 237:656-671. [PMID: 36210520 PMCID: PMC10099377 DOI: 10.1111/nph.18525] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/06/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
Biodiversity hotspots, such as the Caucasus mountains, provide unprecedented opportunities for understanding the evolutionary processes that shape species diversity and richness. Therefore, we investigated the evolution of Primula sect. Primula, a clade with a high degree of endemism in the Caucasus. We performed phylogenetic and network analyses of whole-genome resequencing data from the entire nuclear genome, the entire chloroplast genome, and the entire heterostyly supergene. The different characteristics of the genomic partitions and the resulting phylogenetic incongruences enabled us to disentangle evolutionary histories resulting from tokogenetic vs cladogenetic processes. We provide the first phylogeny inferred from the heterostyly supergene that includes all species of Primula sect. Primula. Our results identified recurrent admixture at deep nodes between lineages in the Caucasus as the cause of non-monophyly in Primula. Biogeographic analyses support the 'out-of-the-Caucasus' hypothesis, emphasizing the importance of this hotspot as a cradle for biodiversity. Our findings provide novel insights into causal processes of phylogenetic discordance, demonstrating that genome-wide analyses from partitions with contrasting genetic characteristics and broad geographic sampling are crucial for disentangling the diversification of species-rich clades in biodiversity hotspots.
Affiliation(s)
- Rebecca L. Stubbs
- Department of Systematic and Evolutionary Botany, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
- Spyros Theodoridis
- Senckenberg Biodiversity and Climate Research Centre (SBiK‐F), Frankfurt am Main 60325, Germany
- Emiliano Mora‐Carrera
- Department of Systematic and Evolutionary Botany, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
- Barbara Keller
- Department of Systematic and Evolutionary Botany, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
- Narjes Yousefi
- Department of Systematic and Evolutionary Botany, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
- Giacomo Potente
- Department of Systematic and Evolutionary Botany, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
- Étienne Léveillé‐Bourret
- Département de Sciences Biologiques, Institut de Recherche en Biologie Végétale (IRBV), Université de Montréal, Québec H1X 2B2, Canada
- Ferhat Celep
- Department of Biology, Faculty of Arts and Sciences, Kırıkkale University, Kırıkkale 71450, Turkey
- Judita Kochjarová
- Department of Phytology, Faculty of Forestry, Technical University in Zvolen, Zvolen 96001, Slovak Republic
- Giorgi Tedoradze
- Department of Plant Systematics and Geography, Institute of Botany, Ilia State University, Tbilisi 0105, Georgia
- Deren A. R. Eaton
- Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY 10027, USA
- Elena Conti
- Department of Systematic and Evolutionary Botany, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
6
Hill M, Roch S. Inconsistency of Triplet-Based and Quartet-Based Species Tree Estimation under Intralocus Recombination. J Comput Biol 2022; 29:1173-1197. [PMID: 36048557 DOI: 10.1089/cmb.2022.0265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
We consider species tree estimation from multiple loci subject to intralocus recombination. We focus on R∗, a summary coalescent-based method using rooted triplets, as well as a related quartet-based inference method. We demonstrate analytically that in both cases, intralocus recombination gives rise to an inconsistency zone, in which correct inference is not assured even in the limit of an infinite amount of data. In addition, we validate and characterize this inconsistency zone through a simulation study, which suggests that differential rates of recombination between closely related taxa can amplify the effect of incomplete lineage sorting and contribute to inconsistency.
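Triplet-based summary methods such as R∗ begin with a simple tally: for every set of three taxa, record which rooted triplet (e.g., AB|C) each gene tree displays, then keep the most frequent one. The Python sketch below illustrates only that tallying step under stated assumptions. Each locus is represented by a single rooted, fully known gene tree written as nested tuples (a hypothetical encoding), and ties are left unresolved (an illustrative choice). It does not assemble the R∗ consensus tree and does not model intralocus recombination, which is exactly the assumption the study shows can break consistency.

```python
from collections import Counter
from itertools import combinations


def clusters(tree):
    """Collect the leaf set below every internal node of a rooted tree.

    A tree is a nested tuple: a leaf is a taxon-name string and an
    internal node is a tuple of child subtrees (hypothetical encoding).
    """
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def rooted_triplet(tree_clusters, x, y, z):
    """Return the pair grouped apart from the third taxon, or None if unresolved."""
    for a, b, c in ((x, y, z), (x, z, y), (y, z, x)):
        # a and b form a triplet cherry if some clade holds both but not c.
        if any(a in s and b in s and c not in s for s in tree_clusters):
            return tuple(sorted((a, b)))
    return None


def majority_triplets(gene_trees, taxa):
    """Map each 3-taxon set to its most frequent rooted triplet across gene trees."""
    per_tree = [clusters(t) for t in gene_trees]
    winners = {}
    for triple in combinations(sorted(taxa), 3):
        counts = Counter()
        for tree_clusters in per_tree:
            pair = rooted_triplet(tree_clusters, *triple)
            if pair is not None:
                counts[pair] += 1
        ranked = counts.most_common()
        # Keep the most frequent triplet only if it is strictly ahead; ties stay unresolved.
        if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
            winners[triple] = ranked[0][0]
    return winners


gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
print(majority_triplets(gene_trees, ["A", "B", "C"]))
```

Here two of the three loci display AB|C, so the script prints `{('A', 'B', 'C'): ('A', 'B')}`.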
Affiliation(s)
- Max Hill
- Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Sebastien Roch
- Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA
7
Wang N, Braun EL, Liang B, Cracraft J, Smith SA. Categorical edge-based analyses of phylogenomic data reveal conflicting signals for difficult relationships in the avian tree. Mol Phylogenet Evol 2022; 174:107550. [PMID: 35691570 DOI: 10.1016/j.ympev.2022.107550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2020] [Revised: 05/13/2022] [Accepted: 06/02/2022] [Indexed: 11/28/2022]
Abstract
Phylogenetic analyses fail to yield a satisfactory resolution of some relationships in the tree of life even with genome-scale datasets, so the failure is unlikely to reflect limitations in the amount of data. Gene tree conflicts are particularly notable in studies focused on these contentious nodes, and taxon sampling, different analytical methods, and/or data type effects can further confound analyses. Although many efforts have been made to incorporate biological conflicts, few studies have curated individual genes for their efficiency in phylogenomic studies. Here, we conduct an edge-based analysis of Neoavian evolution, examining the phylogenetic efficacy of two recent phylogenomic bird datasets and three data types (ultraconserved elements [UCEs], introns, and coding regions). We assess the potential causes for biases in signal-resolution for three difficult nodes: the earliest divergence of Neoaves, the position of the enigmatic Hoatzin (Opisthocomus hoazin), and the position of owls (Strigiformes). We observed extensive conflict among genes for all data types and datasets even after meticulous curation. Edge-based analyses (EBA) increased congruence and provided information about the impact of data type, GC content variation (GCCV), and outlier genes on each of the nodes we examined. First, outlier gene signals appeared to drive different patterns of support for the relationships among the earliest diverging Neoaves. Second, the placement of Hoatzin was highly variable, although our EBA did reveal a previously unappreciated data type effect with an impact on its position. It also revealed that the resolution with the most support here was Hoatzin + shorebirds. Finally, GCCV, rather than data type (i.e., coding vs non-coding) per se, was correlated with a signal that supports monophyly of owls + Accipitriformes (hawks, eagles, and vultures). Eliminating high GCCV loci increased the signal for owls + mousebirds.
Categorical EBA was able to reveal the nature of each edge and provide a way to highlight especially problematic branches that warrant a further examination. The current study increases our understanding about the contentious parts of the avian tree, which show even greater conflicts than appreciated previously.
Affiliation(s)
- Ning Wang
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China; Department of Ecology & Evolutionary Biology, University of Michigan, 1105 N University Ave, Ann Arbor, MI 48109-1048, USA; Department of Ornithology, American Museum of Natural History, New York, NY 10024, USA.
- Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL 32607, USA
- Bin Liang
- College of Life Sciences, Inner Mongolia University, Hohhot 010070, China; Department of Ecology & Evolutionary Biology, University of Michigan, 1105 N University Ave, Ann Arbor, MI 48109-1048, USA
- Joel Cracraft
- Department of Ornithology, American Museum of Natural History, New York, NY 10024, USA
- Stephen A Smith
- Department of Ecology & Evolutionary Biology, University of Michigan, 1105 N University Ave, Ann Arbor, MI 48109-1048, USA
8
Smith ML, Vanderpool D, Hahn MW. Using all gene families vastly expands data available for phylogenomic inference. Mol Biol Evol 2022; 39:6596367. [PMID: 35642314 PMCID: PMC9178227 DOI: 10.1093/molbev/msac112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Abstract
Traditionally, single-copy orthologs have been the gold standard in phylogenomics. Most phylogenomic studies identify putative single-copy orthologs using clustering approaches and retain families with a single sequence per species. This limits the amount of data available by excluding larger families. Recent advances have suggested several ways to include data from larger families. For instance, tree-based decomposition methods facilitate the extraction of orthologs from large families. Additionally, several methods for species tree inference are robust to the inclusion of paralogs and could use all of the data from larger families. Here, we explore the effects of using all families for phylogenetic inference by examining relationships among 26 primate species in detail and by analyzing five additional data sets. We compare single-copy families, orthologs extracted using tree-based decomposition approaches, and all families with all data. We explore several species tree inference methods, finding that identical trees are returned across nearly all subsets of the data and methods for primates. The relationships among Platyrrhini remain contentious; however, the species tree inference method matters more than the subset of data used. Using data from larger gene families drastically increases the number of genes available and leads to consistent estimates of branch lengths, nodal certainty and concordance, and inferences of introgression in primates. For the other data sets, topological inferences are consistent whether single-copy families or orthologs extracted using decomposition approaches are analyzed. Using larger gene families is a promising approach to include more data in phylogenomics without sacrificing accuracy, at least when high-quality genomes are available.
Affiliation(s)
- Megan L Smith
- Department of Biology and Department of Computer Science, Indiana University, Bloomington, Indiana, USA
- Dan Vanderpool
- Department of Biology and Department of Computer Science, Indiana University, Bloomington, Indiana, USA
- Matthew W Hahn
- Department of Biology and Department of Computer Science, Indiana University, Bloomington, Indiana, USA
9
Schull JK, Turakhia Y, Hemker JA, Dally WJ, Bejerano G. Champagne: Automated Whole-Genome Phylogenomic Character Matrix Method Using Large Genomic Indels for Homoplasy-Free Inference. Genome Biol Evol 2022; 14:evac013. [PMID: 35171243 PMCID: PMC8920512 DOI: 10.1093/gbe/evac013] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 01/10/2022] [Indexed: 11/14/2022]
Abstract
We present Champagne, a whole-genome method for generating character matrices for phylogenomic analysis using large genomic indel events. By rigorously picking orthologous genes and locating large insertion and deletion events, Champagne delivers a character matrix that considerably reduces homoplasy compared with morphological and nucleotide-based matrices, on both established phylogenies and difficult-to-resolve nodes in the mammalian tree. Champagne provides ample evidence in the form of genomic structural variation to support incomplete lineage sorting and possible introgression in Paenungulata and human-chimp-gorilla which were previously inferred primarily through matrices composed of aligned single-nucleotide characters. Champagne also offers further evidence for Myomorpha as sister to Sciuridae and Hystricomorpha in the rodent tree. Champagne harbors distinct theoretical advantages as an automated method that produces nearly homoplasy-free character matrices on the whole-genome scale.
Affiliation(s)
- James K Schull
- Department of Computer Science, Stanford University, USA
- Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California San Diego, USA
- James A Hemker
- Department of Computer Science, Stanford University, USA
- William J Dally
- Department of Computer Science, Stanford University, USA
- NVIDIA, Santa Clara, California, USA
- Department of Electrical Engineering, Stanford University, USA
- Gill Bejerano
- Department of Computer Science, Stanford University, USA
- Department of Developmental Biology, Stanford University, USA
- Department of Biomedical Data Science, Stanford University, USA
- Department of Pediatrics, Stanford University, USA
10
Probing the genomic limits of de-extinction in the Christmas Island rat. Curr Biol 2022; 32:1650-1656.e3. [PMID: 35271794 PMCID: PMC9044923 DOI: 10.1016/j.cub.2022.02.027] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/01/2021] [Revised: 01/24/2022] [Accepted: 02/07/2022] [Indexed: 12/17/2022]
Abstract
Three principal methods are under discussion as possible pathways to “true” de-extinction, i.e., back-breeding, cloning, and genetic engineering [1,2]. Of these, the latter approach is most likely to apply to the largest number of extinct species, but its potential is constrained by the degree to which the extinct species' genome can be reconstructed. We explore this question using as a model the extinct Christmas Island rat (Rattus macleari), an endemic rat species that was driven extinct between 1898 and 1908 [3-5]. We first re-sequenced its genome to an average coverage of >60×, then mapped it to the reference genomes of different Rattus species. We then explored how evolutionary divergence from the extant reference genome affected the fraction of the Christmas Island rat genome that could be recovered. Our analyses show that even when the extremely high-quality Norway brown rat (R. norvegicus) genome is used as a reference, nearly 5% of the genome sequence is unrecoverable, with 1,661 genes recovered at lower than 90% completeness and 26 completely absent. Furthermore, the affected regions are not randomly distributed: for example, if 90% completeness is used as the cutoff, genes related to immune response and olfaction are disproportionately affected. Ultimately, our approach demonstrates the importance of applying similar analyses to candidates for de-extinction through genome editing, in order to provide critical baseline information about how representative the edited form would be of the extinct species.
Highlights
- Evolutionary divergence limits the completeness of extinct species genomes
- The extinct Christmas Island rat was re-sequenced to ca. 60× coverage
- Nevertheless, 4.85% of the Norway brown rat genome remains absent after mapping
- Absences are not random; immune response and olfaction are excessively affected
11
Vankan M, Ho SYW, Duchêne DA. Evolutionary Rate Variation Among Lineages in Gene Trees has a Negative Impact on Species-Tree Inference. Syst Biol 2021; 71:490-500. [PMID: 34255084 PMCID: PMC8830059 DOI: 10.1093/sysbio/syab051] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/03/2021] [Revised: 06/18/2021] [Indexed: 11/12/2022]
Abstract
Phylogenetic analyses of genomic data provide a powerful means of reconstructing the evolutionary relationships among organisms, yet such analyses are often hindered by conflicting phylogenetic signals among loci. Identifying the signals that are most influential to species-tree estimation can help to inform the choice of data for phylogenomic analysis. We investigated this in an analysis of 30 phylogenomic data sets. For each data set, we examined the association between several branch-length characteristics of gene trees and the distance between these gene trees and the corresponding species trees. We found that the distance of each gene tree to the species tree inferred from the full data set was positively associated with variation in root-to-tip distances and negatively associated with mean branch support. However, no such associations were found for gene-tree length, a measure of the overall substitution rate at each locus. We further explored the usefulness of the best-performing branch-based characteristics for selecting loci for phylogenomic analyses. We found that loci that yield gene trees with high variation in root-to-tip distances have a disproportionately distant signal of tree topology compared with the complete data sets. These results suggest that rate variation across lineages should be taken into consideration when exploring and even selecting loci for phylogenomic analysis.
Key words: branch support; data filtering; nucleotide substitution model; phylogenomics; substitution rate; summary coalescent methods.
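To make "variation in root-to-tip distances" concrete, the Python sketch below computes one plausible summary of it, the coefficient of variation (standard deviation divided by mean) of root-to-tip path lengths, and ranks loci by that value. The statistic, the dictionary-based tree encoding, and the function names are assumptions for illustration, not the authors' procedure. Gene trees are assumed to be rooted beforehand.

```python
from statistics import mean, pstdev


def root_to_tip_distances(tree, root):
    """Return {tip: path length from root}.

    `tree` maps each internal node to a list of (child, branch_length)
    pairs (hypothetical encoding); nodes with no entry are tips.
    """
    distances = {}
    stack = [(root, 0.0)]
    while stack:
        node, depth = stack.pop()
        children = tree.get(node, [])
        if not children:
            distances[node] = depth
        for child, length in children:
            stack.append((child, depth + length))
    return distances


def root_to_tip_cv(tree, root):
    """Coefficient of variation of root-to-tip distances (0 for a perfectly clock-like tree)."""
    d = list(root_to_tip_distances(tree, root).values())
    m = mean(d)
    return pstdev(d) / m if m > 0 else 0.0


def rank_loci(gene_trees):
    """Sort locus names from lowest to highest root-to-tip variation; input is {locus: (tree, root)}."""
    return sorted(gene_trees, key=lambda locus: root_to_tip_cv(*gene_trees[locus]))


clock_like = {"r": [("A", 1.0), ("x", 0.5)], "x": [("B", 0.5), ("C", 0.5)]}
uneven = {"r": [("A", 1.0), ("x", 0.5)], "x": [("B", 0.2), ("C", 1.5)]}
loci = {"locus2": (uneven, "r"), "locus1": (clock_like, "r")}
print(rank_loci(loci))
```

This prints `['locus1', 'locus2']`: all tips of `clock_like` sit 1.0 from the root (variation 0), whereas `uneven` has tips at 1.0, 0.7, and 2.0. Following the abstract's finding, loci at the end of such a ranking would be the ones whose topological signal tends to depart most from the full data set; any cutoff for excluding them remains the analyst's choice.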
Affiliation(s)
- Mezzalina Vankan
- School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia; Research School of Biology, Australian National University, ACT 2601, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia
- David A Duchêne
- Research School of Biology, Australian National University, ACT 2601, Australia; Centre for Evolutionary Hologenomics, University of Copenhagen, Copenhagen 1352, Denmark
12
Doyle JJ. Defining coalescent genes: Theory meets practice in organelle phylogenomics. Syst Biol 2021; 71:476-489. [PMID: 34191012 DOI: 10.1093/sysbio/syab053] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 03/01/2021] [Revised: 06/24/2021] [Accepted: 06/28/2021] [Indexed: 11/13/2022]
Abstract
The species tree paradigm that dominates current molecular systematic practice infers species trees from collections of sequences under assumptions of the multispecies coalescent (MSC), i.e., that there is free recombination between the sequences and no (or very low) recombination within them. These coalescent genes (c-genes) are thus defined in a historical rather than molecular sense, and can in theory be as large as an entire genome or as small as a single nucleotide. A debate about how to define c-genes centers on the contention that nuclear gene sequences used in many coalescent analyses undergo too much recombination, such that their introns comprise multiple c-genes, violating a key assumption of the MSC. Recently, a similar argument has been made for the genes of plastid (e.g., chloroplast) and mitochondrial genomes, which for the last 30 or more years have been considered to represent a single c-gene for the purposes of phylogeny reconstruction because they are non-recombining in a historical sense. Consequently, it has been suggested that these genomes should be analyzed using coalescent methods that treat their genes (over 70 protein-coding genes in the case of most plastid genomes, or plastomes) as independent estimates of species phylogeny, in contrast to the usual practice of concatenation, which is appropriate for generating gene trees. However, although recombination certainly occurs in the plastome, as has been recognized since the 1970s, it is unlikely to be phylogenetically relevant. This is because such historically effective recombination can only occur when plastomes with incongruent histories are brought together in the same plastid, yet plastids sort rapidly into different cell lineages and rarely fuse. Thus, because of plastid biology, the plastome is a more canonical c-gene than is the average multi-intron mammalian nuclear gene.
The plastome should thus continue to be treated as a single estimate of the underlying species phylogeny, as should the mitochondrial genome. The implications of this long-held insight of molecular systematics for studies in the phylogenomic era are explored.
Affiliation(s)
- Jeff J Doyle
- Plant Biology Section, Plant Breeding & Genetics Section, and L. H. Bailey Hortorium, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
13
Onn Chan K, Hutter CR, Wood PL, Su YC, Brown RM. Gene Flow Increases Phylogenetic Structure and Inflates Cryptic Species Estimations: A Case Study on Widespread Philippine Puddle Frogs (Occidozyga laevis). Syst Biol 2021; 71:40-57. [PMID: 33964168 DOI: 10.1093/sysbio/syab034] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 11/12/2020] [Revised: 04/29/2021] [Accepted: 05/06/2021] [Indexed: 11/14/2022]
Abstract
In cryptic amphibian complexes, there is a growing trend to equate high levels of genetic structure with hidden cryptic species diversity. Typically, phylogenetic structure and distance-based approaches are used to demonstrate the distinctness of clades and justify the recognition of new cryptic species. However, this approach does not account for gene flow, spatial, and environmental processes that can obfuscate phylogenetic inference and bias species delimitation. As a case study, we sequenced genome-wide exons and introns to evince the processes that underlie the diversification of Philippine Puddle Frogs, a group that is widespread, phenotypically conserved, and exhibits high levels of geographically-based genetic structure. We showed that widely adopted tree- and distance-based approaches inferred up to 20 species, compared to genomic analyses that inferred an optimal number of five distinct genetic groups. Using a suite of clustering, admixture, and phylogenetic network analyses, we demonstrate extensive admixture among the five groups and elucidate two specific ways in which gene flow can cause overestimations of species diversity: (1) admixed populations can be inferred as distinct lineages characterized by long branches in phylograms; and (2) admixed lineages can appear to be genetically divergent, even from their parental populations when simple measures of genetic distance are used. We demonstrate that the relationship between mitochondrial and genome-wide nuclear p-distances is decoupled in admixed clades, leading to erroneous estimates of genetic distances and, consequently, species diversity. Genetic distance was also biased by spatial and environmental processes. Overall, we showed that high levels of genetic diversity in Philippine Puddle Frogs predominantly comprise metapopulation lineages that arose through complex patterns of admixture, isolation-by-distance, and isolation-by-environment as opposed to species divergence. 
Our findings suggest that speciation may not be the major process underlying the high levels of hidden diversity observed in many taxonomic groups and that widely-adopted tree- and distance-based methods overestimate species diversity in the presence of gene flow.
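The abstract contrasts mitochondrial and nuclear p-distances. For readers unfamiliar with the measure, the uncorrected p-distance is the proportion of aligned sites at which two sequences differ. The sketch below assumes that gaps and missing data are dropped pairwise; the authors' pipeline may handle them differently, and the function name and example sequences are hypothetical.

```python
def p_distance(seq1: str, seq2: str, missing: str = "-?N") -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap or missing character are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in missing or b in missing:
            continue  # not comparable at this column
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned fragments: the last column is a gap and is skipped
print(p_distance("ACGTACGTA", "ACGAACCT-"))
```

A raw p-distance carries no correction for multiple substitutions at the same site, so it saturates for divergent sequences. That limitation is one reason the abstract calls such measures "simple".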
Affiliation(s)
- Kin Onn Chan
- Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, 2 Conservatory Drive, 117377 Singapore
- Carl R Hutter
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
- Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Perry L Wood
- Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, Alabama 36849, USA
- Yong-Chao Su
- Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Rafe M Brown
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
14
Porto DS, Almeida EAB, Pennell MW. Investigating Morphological Complexes Using Informational Dissonance and Bayes Factors: A Case Study in Corbiculate Bees. Syst Biol 2021; 70:295-306. [PMID: 32722788 PMCID: PMC7882150 DOI: 10.1093/sysbio/syaa059] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/22/2019] [Revised: 07/16/2020] [Accepted: 07/17/2020] [Indexed: 11/22/2022]
Abstract
It is widely recognized that different regions of a genome often have different evolutionary histories and that ignoring this variation when estimating phylogenies can be misleading. However, the extent to which this is also true for morphological data is still largely unknown. Discordance among morphological traits might plausibly arise due to either variable convergent selection pressures or else phenomena such as hemiplasy. Here, we investigate patterns of discordance among 282 morphological characters, which we scored for 50 bee species particularly targeting corbiculate bees, a group that includes the well-known eusocial honeybees and bumblebees. As a starting point for selecting the most meaningful partitions in the data, we grouped characters as morphological modules, highly integrated trait complexes that as a result of developmental constraints or coordinated selection we expect to share an evolutionary history and trajectory. In order to assess conflict and coherence across and within these morphological modules, we used recently developed approaches for computing Bayesian phylogenetic information allied with model comparisons using Bayes factors. We found that despite considerable conflict among morphological complexes, accounting for among-character and among-partition rate variation with individual gamma distributions, rate multipliers, and linked branch lengths can lead to coherent phylogenetic inference using morphological data. We suggest that evaluating information content and dissonance among partitions is a useful step in estimating phylogenies from morphological data, just as it is with molecular data. Furthermore, we argue that adopting emerging approaches for investigating dissonance in genomic datasets may provide new insights into the integration and evolution of anatomical complexes. [Apidae; entropy; morphological modules; phenotypic integration; phylogenetic information.].
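Porto et al. compare partitioning and rate-variation models using Bayes factors. As a hedged illustration, the sketch below assumes the natural-log marginal likelihoods of two models have already been estimated (for example by stepping-stone sampling). It computes the 2 ln BF statistic and maps it onto the evidence categories of Kass and Raftery (1995). The model labels and numbers are hypothetical and are not values from the study.

```python
def two_log_bayes_factor(log_ml_1: float, log_ml_0: float) -> float:
    """2 ln BF in favor of model 1 over model 0, from natural-log marginal likelihoods."""
    return 2.0 * (log_ml_1 - log_ml_0)


def kass_raftery_category(two_ln_bf: float) -> str:
    """Evidence for model 1 on the 2 ln BF scale of Kass and Raftery (1995)."""
    if two_ln_bf < 0:
        return "favors model 0"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods, e.g. linked vs. unlinked branch
# lengths across morphological partitions (illustrative numbers only)
linked = -5210.4
unlinked = -5216.9
stat = two_log_bayes_factor(linked, unlinked)
print(round(stat, 1), kass_raftery_category(stat))
```

Bayes factors are only as reliable as the marginal likelihood estimates behind them. In practice, it is worth checking that repeated estimation runs agree before interpreting a category.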
Affiliation(s)
- Diego S Porto
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto (FFCLRP), Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver BC V6T 1Z4, Canada
- Department of Biological Sciences, Virginia Polytechnic Institute and State University, 926 West Campus Drive, Blacksburg, VA 24061 USA
- Eduardo A B Almeida
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto (FFCLRP), Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil
- Matthew W Pennell
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver BC V6T 1Z4, Canada
15
Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression. PLoS Biol 2020; 18:e3000954. [PMID: 33270638 PMCID: PMC7738166 DOI: 10.1371/journal.pbio.3000954] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 04/15/2020] [Revised: 12/15/2020] [Accepted: 11/02/2020] [Indexed: 12/17/2022]
Abstract
Our understanding of the evolutionary history of primates is undergoing continual revision due to ongoing genome sequencing efforts. Bolstered by growing fossil evidence, these data have led to increased acceptance of once controversial hypotheses regarding phylogenetic relationships, hybridization and introgression, and the biogeographical history of primate groups. Among these findings is a pattern of recent introgression between species within all major primate groups examined to date, though little is known about introgression deeper in time. To address this and other phylogenetic questions, here, we present new reference genome assemblies for 3 Old World monkey (OWM) species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). We combine these data with 23 additional primate genomes to estimate both the species tree and individual gene trees using thousands of loci. While our species tree is largely consistent with previous phylogenetic hypotheses, the gene trees reveal high levels of genealogical discordance associated with multiple primate radiations. We use strongly asymmetric patterns of gene tree discordance around specific branches to identify multiple instances of introgression between ancestral primate lineages. In addition, we exploit recent fossil evidence to perform fossil-calibrated molecular dating analyses across the tree. Taken together, our genome-wide data help to resolve multiple contentious sets of relationships among primates, while also providing insight into the biological processes and technical artifacts that led to the disagreements in the first place. Combining three newly sequenced primate genomes with other published genomes, this study adapts a little-known method for detecting ancient introgression to genome-scale data, revealing multiple previously unknown examples of hybridization between primate species.
16
Smith SA, Walker-Hale N, Walker JF. Intragenic Conflict in Phylogenomic Data Sets. Mol Biol Evol 2020; 37:3380-3388. [DOI: 10.1093/molbev/msaa170] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
Most phylogenetic analyses assume that a single evolutionary history underlies one gene. However, both biological processes and errors can cause intragenic conflict. The extent to which this conflict is present in empirical data sets is not well documented, but if common, could have far-reaching implications for phylogenetic analyses. We examined several large phylogenomic data sets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable between data sets, from 1% to >92% of genes investigated. We analyzed four exemplar genes in detail and analyzed simulated data under several scenarios. Our results suggest that alignment error may be one major source of conflict, but other conflicts remain unexplained and may represent biological signal or other errors. Whether as part of data analysis pipelines or to explore biological processes, analyses of within-gene phylogenetic signal should become common.
Affiliation(s)
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Joseph F Walker
- The Sainsbury Laboratory (SLCU), University of Cambridge, Cambridge, United Kingdom
17
Springer MS, Molloy EK, Sloan DB, Simmons MP, Gatesy J. ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets. J Hered 2019; 111:147-168. [DOI: 10.1093/jhered/esz076] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/13/2019] [Accepted: 12/12/2019] [Indexed: 12/20/2022]
Abstract
DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the “no intralocus-recombination” assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.
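The abstract does not give the Quartet-Asymmetry test's formula, so the following is only a minimal sketch under an assumption. It assumes the test reduces to asking whether, for one four-taxon set, the two minor (discordant) quartet topologies are supported by equal numbers of insertions, as expected under ILS alone. That question can be posed as a 1-df chi-square goodness-of-fit test. The published test may differ in detail, and the function name and counts below are hypothetical.

```python
import math


def quartet_asymmetry_chi2(minor_a: int, minor_b: int) -> tuple[float, float]:
    """Chi-square (1 df) test that two minor-quartet counts are equal.

    Under ILS alone the two minor quartets are expected to be equally
    frequent; a significant excess of one is consistent with introgression.
    """
    if minor_a < 0 or minor_b < 0:
        raise ValueError("counts must be non-negative")
    n = minor_a + minor_b
    if n == 0:
        raise ValueError("no minor-quartet support to test")
    expected = n / 2
    chi2 = ((minor_a - expected) ** 2 + (minor_b - expected) ** 2) / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical insertion counts supporting the two minor quartets
chi2, p = quartet_asymmetry_chi2(30, 10)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With many quartets tested, as in the Balaenopteroidea analysis, some correction for multiple comparisons would normally be applied to the resulting p-values.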
Affiliation(s)
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
- Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, CO
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO
- John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
18
Lamichhaney S, Card DC, Grayson P, Tonini JFR, Bravo GA, Näpflin K, Termignoni-Garcia F, Torres C, Burbrink F, Clarke JA, Sackton TB, Edwards SV. Integrating natural history collections and comparative genomics to study the genetic architecture of convergent evolution. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180248. [PMID: 31154982 PMCID: PMC6560268 DOI: 10.1098/rstb.2018.0248] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Accepted: 02/25/2019] [Indexed: 12/20/2022]
Abstract
Evolutionary convergence has been long considered primary evidence of adaptation driven by natural selection and provides opportunities to explore evolutionary repeatability and predictability. In recent years, there has been increased interest in exploring the genetic mechanisms underlying convergent evolution, in part, owing to the advent of genomic techniques. However, the current 'genomics gold rush' in studies of convergence has overshadowed the reality that most trait classifications are quite broadly defined, resulting in incomplete or potentially biased interpretations of results. Genomic studies of convergence would be greatly improved by integrating deep 'vertical', natural history knowledge with 'horizontal' knowledge focusing on the breadth of taxonomic diversity. Natural history collections have been, and continue to be, best positioned for increasing our comprehensive understanding of phenotypic diversity, with modern practices of digitization and databasing of morphological traits providing exciting improvements in our ability to evaluate the degree of morphological convergence. Combining more detailed phenotypic data with the well-established field of genomics will enable scientists to make progress on an important goal in biology: to understand the degree to which genetic or molecular convergence is associated with phenotypic convergence. Although the fields of comparative biology or comparative genomics alone can separately reveal important insights into convergent evolution, here we suggest that the synergistic and complementary roles of natural history collection-derived phenomic data and comparative genomics methods can be particularly powerful in together elucidating the genomic basis of convergent evolution among higher taxa. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
Affiliation(s)
- Sangeet Lamichhaney
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- Daren C. Card
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- Department of Biology, University of Texas Arlington, Arlington, TX 76019, USA
- Phil Grayson
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- João F. R. Tonini
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- Gustavo A. Bravo
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- Kathrin Näpflin
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- Flavia Termignoni-Garcia
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- Christopher Torres
- Department of Biology, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Geological Sciences, The University of Texas at Austin, Austin, TX 78712, USA
- Frank Burbrink
- Department of Herpetology, The American Museum of Natural History, New York, NY 10024, USA
- Julia A. Clarke
- Department of Biology, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Geological Sciences, The University of Texas at Austin, Austin, TX 78712, USA
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
19
Sackton TB, Clark N. Convergent evolution in the genomics era: new insights and directions. Philos Trans R Soc Lond B Biol Sci 2019; 374:20190102. [PMID: 31154976 DOI: 10.1098/rstb.2019.0102] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Indexed: 12/13/2022]
Affiliation(s)
- Nathan Clark
- Computational and Systems Biology, University of Pittsburgh, PA, USA
20
The Timing and Direction of Introgression Under the Multispecies Network Coalescent. Genetics 2019; 211:1059-1073. [PMID: 30670542 DOI: 10.1534/genetics.118.301831] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 11/29/2018] [Accepted: 01/21/2019] [Indexed: 12/26/2022]
Abstract
Introgression is a pervasive biological process, and many statistical methods have been developed to infer its presence from genomic data. However, many of the consequences and genomic signatures of introgression remain unexplored from a methodological standpoint. Here, we develop a model for the timing and direction of introgression based on the multispecies network coalescent, and from it suggest new approaches for testing introgression hypotheses. We suggest two new statistics, D1 and D2, which can be used in conjunction with other information to test hypotheses relating to the timing and direction of introgression, respectively. D1 may find use in evaluating cases of homoploid hybrid speciation (HHS), while D2 provides a four-taxon test for polarizing introgression. Although analytical expectations for our statistics require a number of assumptions to be met, we show how simulations can be used to test hypotheses about introgression when these assumptions are violated. We apply the D1 statistic to genomic data from the wild yeast Saccharomyces paradoxus (a proposed example of HHS), demonstrating its use as a test of this model. These methods provide new and powerful ways to address questions relating to the timing and direction of introgression.
21
Abstract
Convergent evolution provides key evidence for the action of natural selection. The process of convergence is often inferred because the same trait appears in multiple species that are not closely related. However, different parts of the genome can reveal different relationships among species, with some genes or regions uniting lineages that appear unrelated in the species tree. If changes in traits occur in these discordant regions, a false pattern of convergence can be produced (known as “hemiplasy”). Here, we provide a way to quantify the probability that hemiplasy occurs and contrast it with the probability of convergence. We find that hemiplasy is likely to explain many apparent cases of convergent evolution, even when the fraction of discordant regions is low. Convergent evolution, the appearance of the same character state in apparently unrelated organisms, is often inferred when a trait is incongruent with the species tree. However, trait incongruence can also arise from changes that occur on discordant gene trees, a process referred to as hemiplasy. Hemiplasy is rarely taken into account in studies of convergent evolution, despite the fact that phylogenomic studies have revealed rampant discordance. Here, we study the relative probabilities of homoplasy (including convergence and reversal) and hemiplasy for an incongruent trait. We derive expressions for the probabilities of the two events, showing that they depend on many of the same parameters. We find that hemiplasy is as likely as, or more likely than, homoplasy for a wide range of conditions, even when levels of discordance are low. We also present a method to calculate the ratio of these two probabilities (the “hemiplasy risk factor”) along the branches of a phylogeny of arbitrary length. Such calculations can be applied to any tree to identify when and where incongruent traits may be due to hemiplasy.
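The hemiplasy risk factor involves more terms than can be sketched reliably from the abstract alone, so it is not implemented here. The sketch gives background on the claim that hemiplasy matters "even when levels of discordance are low". It evaluates the standard multispecies-coalescent expectation for a rooted three-taxon species tree ((A,B),C) without gene flow. There, each discordant gene-tree topology has probability exp(-t)/3, where t is the internal branch length in coalescent units. The function name is hypothetical, and this is not the authors' method.

```python
import math


def three_taxon_gene_tree_probs(t: float) -> dict[str, float]:
    """Gene-tree topology probabilities for a rooted three-taxon species tree
    ((A,B),C) under the multispecies coalescent with no gene flow.

    t: internal branch length in coalescent units.
    """
    if t < 0:
        raise ValueError("internal branch length must be non-negative")
    # A and B fail to coalesce on the internal branch with probability exp(-t);
    # the three topologies are then equally likely in the ancestral population.
    each_discordant = math.exp(-t) / 3.0
    return {
        "concordant ((A,B),C)": 1.0 - 2.0 * each_discordant,
        "discordant ((A,C),B)": each_discordant,
        "discordant ((B,C),A)": each_discordant,
    }


for t in (0.1, 1.0, 3.0):
    probs = three_taxon_gene_tree_probs(t)
    total_discordant = 1.0 - probs["concordant ((A,B),C)"]
    print(f"t = {t}: total discordance = {total_discordant:.3f}")
```

Even at t = 3, where only a few percent of loci are discordant, a trait change on one of those discordant genealogies can still produce an incongruent pattern. The abstract argues that such an outcome can rival homoplasy in probability.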