1
Sanderson MJ, Búrquez A, Copetti D, McMahon MM, Zeng Y, Wojciechowski MF. Origin and diversification of the saguaro cactus (Carnegiea gigantea): a within-species phylogenomic analysis. Syst Biol 2022; 71:1178-1194. [PMID: 35244183 DOI: 10.1093/sysbio/syac017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2020] [Revised: 02/18/2022] [Accepted: 02/25/2022] [Indexed: 11/14/2022] Open
Abstract
Reconstructing accurate historical relationships within a species poses numerous challenges, not least in many plant groups in which gene flow is high enough to extend well beyond species boundaries. Nonetheless, the extent of tree-like history within a species is an empirical question on which it is now possible to bring large amounts of genome sequence to bear. We assess phylogenetic structure across the geographic range of the saguaro cactus, an emblematic member of Cactaceae, a clade known for extensive hybridization and porous species boundaries. Using 200 Gb of whole genome resequencing data from 20 individuals sampled from 10 localities, we assembled two data sets comprising 150,000 biallelic single nucleotide polymorphisms (SNPs) from protein coding sequences. From these we inferred within-species trees and evaluated their significance and robustness using five qualitatively different inference methods. Despite the low sequence diversity, large census population sizes, and presence of wide-ranging pollen and seed dispersal agents, phylogenetic trees were well resolved and highly consistent across both data sets and all methods. We inferred that the most likely root, based on marginal likelihood comparisons, is to the east and south of the region of highest genetic diversity, which lies along the coast of the Gulf of California in Sonora, Mexico. Together with striking decreases in marginal likelihood found to the north, this supports hypotheses that saguaro's current range reflects post-glacial expansion from the refugia in the south of its range. We conclude with observations about practical and theoretical issues raised by phylogenomic data sets within species, in which SNP-based methods must be used rather than gene tree methods that are widely used when sequence divergence is higher. These include computational scalability, inference of gene flow, and proper assessment of statistical support in the presence of linkage effects.
Affiliation(s)
- Michael J Sanderson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Alberto Búrquez
  - Instituto de Ecología, Unidad Hermosillo, Universidad Nacional Autónoma de México, Hermosillo, Sonora, Mexico
- Dario Copetti
  - Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Yichao Zeng
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
2
Naser-Khdour S, Minh BQ, Lanfear R. Assessing Confidence in Root Placement on Phylogenies: An Empirical Study Using Non-Reversible Models for Mammals. Syst Biol 2021; 71:959-972. [PMID: 34387349 PMCID: PMC9260635 DOI: 10.1093/sysbio/syab067] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/30/2020] [Revised: 08/03/2021] [Accepted: 08/11/2021] [Indexed: 11/14/2022] Open
Abstract
Using time-reversible Markov models is a very common practice in phylogenetic analysis, because although we expect many of their assumptions to be violated by empirical data, they provide high computational efficiency. However, these models lack the ability to infer the root placement of the estimated phylogeny. In order to compensate for the inability of these models to root the tree, many researchers use external information such as using outgroup taxa or additional assumptions such as molecular clocks. In this study, we investigate the utility of nonreversible models to root empirical phylogenies and introduce a new bootstrap measure, the rootstrap, which provides information on the statistical support for any given root position. [Bootstrap; nonreversible models; phylogenetic inference; root estimation.]
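As a rough illustration of what a bootstrap-based support value for a root position could look like, the sketch below tallies how often each root placement occurs among a set of rooted bootstrap trees. It assumes every tree has already been reduced to the set of taxa on one side of its root; the function names and the input format are hypothetical, and this is a simplified sketch rather than the authors' implementation.

```python
from collections import Counter


def canonical_root_split(side, taxa):
    """Return one canonical side of the bipartition induced by the root.

    The same root can be described by either side of the split, so we
    always report the side containing the alphabetically first taxon.
    """
    side = frozenset(side)
    other = frozenset(taxa) - side
    anchor = min(taxa)
    return side if anchor in side else other


def root_support(bootstrap_sides, taxa):
    """Proportion of bootstrap trees supporting each root position.

    bootstrap_sides: one iterable of taxon names per bootstrap tree,
    giving the taxa on either side of that tree's root (assumed input).
    """
    counts = Counter(canonical_root_split(s, taxa) for s in bootstrap_sides)
    total = len(bootstrap_sides)
    return {split: n / total for split, n in counts.items()}


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    # Five hypothetical bootstrap trees, each summarised by one root side.
    sides = [{"A", "B"}, {"C", "D"}, {"A"}, {"B", "C", "D"}, {"A", "B"}]
    for split, support in root_support(sides, taxa).items():
        print(sorted(split), support)
```

Describing a root by the bipartition it induces makes two trees comparable even when they were summarised from opposite sides of the root.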
Affiliation(s)
- Suha Naser-Khdour
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Bui Quang Minh
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
  - Research School of Computer Science, Australian National University, Canberra, Australian Capital Territory, Australia
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
3
Hannaford NE, Heaps SE, Nye TMW, Williams TA, Embley TM. Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
4
Heaps SE, Nye TMW, Boys RJ, Williams TA, Cherlin S, Embley TM. Generalizing rate heterogeneity across sites in statistical phylogenetics. STAT MODEL 2020. [DOI: 10.1177/1471082x19829937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees relating species. Along branches, sequence evolution is modelled using a continuous-time Markov process characterized by an instantaneous rate matrix. Early models assumed the same rate matrix governed substitutions at all sites of the alignment, ignoring variation in evolutionary pressures. Substantial improvements in phylogenetic inference and model fit were achieved by augmenting these models with multiplicative random effects that describe the result of variation in selective constraints and allow sites to evolve at different rates which linearly scale a baseline rate matrix. Motivated by this pioneering work, we consider an extension using a quadratic, rather than linear, transformation. The resulting models allow for variation in the selective coefficients of different types of point mutation at a site in addition to variation in selective constraints. We derive properties of the extended models. For certain non-stationary processes, the extension gives a model that allows variation in sequence composition, both across sites and taxa. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. Our quadratic models are applied to alignments spanning the tree of life and compared with site-homogeneous and linear models.
Affiliation(s)
- Sarah E Heaps
  - School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK
- Tom MW Nye
  - School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK
- Richard J Boys
  - School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Svetlana Cherlin
  - Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK
- T Martin Embley
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
5
Stadler PF, Geiß M, Schaller D, López Sánchez A, González Laffitte M, Valdivia DI, Hellmuth M, Hernández Rosales M. From pairs of most similar sequences to phylogenetic best matches. Algorithms Mol Biol 2020; 15:5. [PMID: 32308731 PMCID: PMC7147060 DOI: 10.1186/s13015-020-00165-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/16/2019] [Accepted: 03/26/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionary most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods. RESULTS If additive distances between genes are known, then evolutionary most closely related pairs can be identified by considering certain quartets of genes provided that in each quartet the outgroup relative to the remaining three genes is known. A priori knowledge of underlying species phylogeny greatly facilitates the identification of the required outgroup. Although the workflow remains a heuristic since the correct outgroup cannot be determined reliably in all cases, simulations with lineage specific biases and rate asymmetries show that nearly perfect results can be achieved. In a realistic setting, where distances data have to be estimated from sequence data and hence are noisy, it is still possible to obtain highly accurate sets of best matches. CONCLUSION Improvements of tree-free orthology assessment methods can be expected from a combination of the accurate inference of best matches reported here and recent mathematical advances in the understanding of (reciprocal) best match graphs and orthology relations. AVAILABILITY Accompanying software is available at https://github.com/david-schaller/AsymmeTree.
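The notion of reciprocal best hits mentioned in this abstract can be sketched as follows: a gene pair counts as a reciprocal best hit when each gene is among the least dissimilar partners of the other across two gene sets. The function names, the nested-dictionary distance format, and the toy values are assumptions made for illustration; this is not the AsymmeTree API and it does not implement the quartet-based best-match inference the paper describes.

```python
def make_symmetric(pairs):
    """Build a nested dict dist[x][y] from {(x, y): d}, filling both directions."""
    dist = {}
    for (x, y), d in pairs.items():
        dist.setdefault(x, {})[y] = d
        dist.setdefault(y, {})[x] = d
    return dist


def best_hits(dist, queries, targets):
    """For each query gene, the set of target genes at minimal dissimilarity."""
    hits = {}
    for q in queries:
        d_min = min(dist[q][t] for t in targets)
        hits[q] = {t for t in targets if dist[q][t] == d_min}
    return hits


def reciprocal_best_hits(dist, genes_a, genes_b):
    """Pairs (a, b) where b is a best hit of a and a is a best hit of b."""
    a_to_b = best_hits(dist, genes_a, genes_b)
    b_to_a = best_hits(dist, genes_b, genes_a)
    return {(a, b) for a in genes_a for b in a_to_b[a] if a in b_to_a[b]}


if __name__ == "__main__":
    # Toy dissimilarities between two genes in each of two species.
    dist = make_symmetric({
        ("a1", "b1"): 1.0, ("a1", "b2"): 3.0,
        ("a2", "b1"): 2.0, ("a2", "b2"): 4.0,
    })
    print(reciprocal_best_hits(dist, ["a1", "a2"], ["b1", "b2"]))
```

In this toy input both a1 and a2 have b1 as their best hit, but b1's best hit is a1 alone, so only (a1, b1) is reciprocal.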
Affiliation(s)
- Peter F. Stadler
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, 04107 Leipzig, Germany
  - Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv), and Leipzig Research Center for Civilization Diseases, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
  - Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, 1090 Vienna, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, 111321 Bogotá, D.C. Colombia
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Manuela Geiß
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, 04107 Leipzig, Germany
  - Software Competence Center Hagenberg GmbH, Softwarepark 21, 4232 Hagenberg, Austria
- David Schaller
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, 04107 Leipzig, Germany
- Alitzel López Sánchez
  - CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO México
- Marcos González Laffitte
  - CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO México
- Dulce I. Valdivia
  - Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV), Km. 9.6 Libramiento Norte Carretera Irapuato-León, 36821 Irapuato, GTO México
- Marc Hellmuth
  - School of Computing, University of Leeds, E C Stoner Building, Leeds, LS2 9JT UK
- Maribel Hernández Rosales
  - CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO México
6
Williams TA, Cox CJ, Foster PG, Szöllősi GJ, Embley TM. Phylogenomics provides robust support for a two-domains tree of life. Nat Ecol Evol 2020; 4:138-147. [PMID: 31819234 PMCID: PMC6942926 DOI: 10.1038/s41559-019-1040-x] [Citation(s) in RCA: 125] [Impact Index Per Article: 31.3] [Received: 07/23/2019] [Accepted: 10/15/2019] [Indexed: 11/09/2022]
Abstract
Hypotheses about the origin of eukaryotic cells are classically framed within the context of a universal 'tree of life' based on conserved core genes. Vigorous ongoing debate about eukaryote origins is based on assertions that the topology of the tree of life depends on the taxa included and the choice and quality of genomic data analysed. Here we have reanalysed the evidence underpinning those claims and apply more data to the question by using supertree and coalescent methods to interrogate >3,000 gene families in archaea and eukaryotes. We find that eukaryotes consistently originate from within the archaea in a two-domains tree when due consideration is given to the fit between model and data. Our analyses support a close relationship between eukaryotes and Asgard archaea and identify the Heimdallarchaeota as the current best candidate for the closest archaeal relatives of the eukaryotic nuclear lineage.
Affiliation(s)
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Cymon J Cox
  - Centro de Ciências do Mar, Universidade do Algarve, Faro, Portugal
- Peter G Foster
  - Department of Life Sciences, Natural History Museum, London, UK
- Gergely J Szöllősi
  - MTA-ELTE "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary
  - Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
  - Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany, Hungary
- T Martin Embley
  - Institute for Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne, UK
7
Lamarca AP, Schrago CG. Fast speciations and slow genes: uncovering the root of living canids. Biol J Linn Soc Lond 2019. [DOI: 10.1093/biolinnean/blz181] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
Despite ongoing efforts relying on computationally intensive tree-building methods and large datasets, the deeper phylogenetic relationships between living canid genera remain controversial. We demonstrate that this issue arises fundamentally from the uncertainty of root placement as a consequence of the short length of the branch connecting the major canid clades, which probably resulted from a fast radiation during the early diversification of extant Canidae. Using both nuclear and mitochondrial genes, we investigate the position of the canid root and its consistency by using three rooting methods. We find that mitochondrial genomes consistently retrieve a root node separating the tribe Canini from the remaining canids, whereas nuclear data mostly recover a root that places the Urocyon foxes as the sister lineage of living canids. We demonstrate that, to resolve the canid root, the nuclear segments sequenced so far are significantly less informative than mitochondrial genomes. We also propose that short intervals between speciations obscure the place of the true root, because methods are susceptible to stochastic error in the presence of short internal branches near the root.
Affiliation(s)
- Alessandra P Lamarca
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Carlos G Schrago
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
8
Heaps SE, Nye TMW, Boys RJ, Williams TA, Cherlin S, Embley TM. Generalizing rate heterogeneity across sites in statistical phylogenetics. STAT MODEL 2019. [DOI: 10.1177/1471082x18829937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Affiliation(s)
- Sarah E Heaps
  - School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK
- Tom MW Nye
  - School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK
- Richard J Boys
  - School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Svetlana Cherlin
  - Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK
- T Martin Embley
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
9
Dombrowski N, Lee JH, Williams TA, Offre P, Spang A. Genomic diversity, lifestyles and evolutionary origins of DPANN archaea. FEMS Microbiol Lett 2019; 366:5281434. [PMID: 30629179 PMCID: PMC6349945 DOI: 10.1093/femsle/fnz008] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Received: 09/19/2018] [Accepted: 01/07/2019] [Indexed: 12/16/2022] Open
Abstract
Archaea, a primary domain of life besides Bacteria, have for a long time been regarded as peculiar organisms that play marginal roles in biogeochemical cycles. However, this picture changed with the discovery of a large diversity of archaea in non-extreme environments enabled by the use of cultivation-independent methods. These approaches have allowed the reconstruction of genomes of uncultivated microorganisms and revealed that archaea are diverse and broadly distributed in the biosphere and seemingly include a large diversity of putative symbiotic organisms, most of which belong to the tentative archaeal superphylum referred to as DPANN. This archaeal group encompasses at least 10 different lineages and includes organisms with extremely small cell and genome sizes and limited metabolic capabilities. Therefore, many members of DPANN may be obligately dependent on symbiotic interactions with other organisms and may even include novel parasites. In this contribution, we review the current knowledge of the gene repertoires and lifestyles of members of this group and discuss their placement in the tree of life, which is the basis for our understanding of the deep microbial roots and the role of symbiosis in the evolution of life on Earth.
Affiliation(s)
- Nina Dombrowski
  - NIOZ, Royal Netherlands Institute for Sea Research, Department of Marine Microbiology and Biogeochemistry, and Utrecht University, P.O. Box 59, NL-1790 AB Den Burg, The Netherlands
  - Department of Marine Science, University of Texas at Austin, Marine Science Institute, 750 Channel View Drive, Port Aransas, TX 78373, USA
- Jun-Hoe Lee
  - Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, P.O. Box 596, Husargatan 3, SE-75123 Uppsala, Sweden
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TQ, UK
- Pierre Offre
  - NIOZ, Royal Netherlands Institute for Sea Research, Department of Marine Microbiology and Biogeochemistry, and Utrecht University, P.O. Box 59, NL-1790 AB Den Burg, The Netherlands
- Anja Spang
  - NIOZ, Royal Netherlands Institute for Sea Research, Department of Marine Microbiology and Biogeochemistry, and Utrecht University, P.O. Box 59, NL-1790 AB Den Burg, The Netherlands
  - Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, P.O. Box 596, Husargatan 3, SE-75123 Uppsala, Sweden