1
Mello B, Schrago CG. Modeling Substitution Rate Evolution across Lineages and Relaxing the Molecular Clock. Genome Biol Evol 2024; 16:evae199. [PMID: 39332907 PMCID: PMC11430275 DOI: 10.1093/gbe/evae199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/08/2024] [Indexed: 09/29/2024] Open
Abstract
Relaxing the molecular clock using models of how substitution rates change across lineages has become essential for addressing evolutionary problems. The diversity of rate evolution models and their implementations is substantial, and studies have demonstrated that their impact on divergence time estimates can be as significant as that of calibration information. In this review, we trace the development of rate evolution models from the proposal of the molecular clock concept to the development of sophisticated Bayesian and non-Bayesian methods that handle rate variation in phylogenies. We discuss the various approaches to modeling rate evolution, provide a comprehensive list of available software, and examine the challenges and advancements of the prevalent Bayesian framework, contrasting them with faster non-Bayesian methods. Lastly, we offer insights into potential advancements in the field in the era of big data.
Affiliation(s)
- Beatriz Mello
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21941-617, Brazil
- Carlos G Schrago
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21941-617, Brazil
2
Paradis E, Claramunt S, Brown J, Schliep K. Confidence intervals in molecular dating by maximum likelihood. Mol Phylogenet Evol 2023; 178:107652. [PMID: 36306994 DOI: 10.1016/j.ympev.2022.107652] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/01/2022] [Revised: 10/11/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022]
Abstract
Molecular dating has been widely used to infer the times of past evolutionary events using molecular sequences. This paper describes three bootstrap methods to infer confidence intervals under a penalized likelihood framework. The basic idea is to use data pseudoreplicates to infer uncertainty in the branch lengths of a phylogeny reconstructed with molecular sequences. The three specific bootstrap methods are nonparametric (direct tree bootstrapping), semiparametric (rate smoothing), and parametric (Poisson simulation). Our extensive simulation study showed that the three methods perform generally well under a simple strict clock model of molecular evolution; however, the results were less positive with data simulated using an uncorrelated or a correlated relaxed clock model. Several factors impacted, possibly in interaction, the performance of the confidence intervals. Increasing the number of calibration points had a positive effect, as well as increasing the sequence length or the number of sequences although both latter effects depended on the model of evolution. A case study is presented with a molecular phylogeny of the Felidae (Mammalia: Carnivora). A comparison was made with a Bayesian analysis: the results were very close in terms of confidence intervals and there was no marked tendency for an approach to produce younger or older bounds compared to the other.
Affiliation(s)
- Santiago Claramunt
- Department of Natural History, Royal Ontario Museum, Toronto, ON 5S2C6, Canada
- Joseph Brown
- Department of Natural History, Royal Ontario Museum, Toronto, ON 5S2C6, Canada
- Klaus Schliep
- Institute of Computational Biotechnology, Technology University Graz, Austria
3
Duchêne DA, Tong KJ, Foster CSP, Duchêne S, Lanfear R, Ho SYW. Linking Branch Lengths across Sets of Loci Provides the Highest Statistical Support for Phylogenetic Inference. Mol Biol Evol 2019; 37:1202-1210. [DOI: 10.1093/molbev/msz291] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 12/30/2022] Open
Abstract
Evolution leaves heterogeneous patterns of nucleotide variation across the genome, with different loci subject to varying degrees of mutation, selection, and drift. In phylogenetics, the potential impacts of partitioning sequence data for the assignment of substitution models are well appreciated. In contrast, the treatment of branch lengths has received far less attention. In this study, we examined the effects of linking and unlinking branch-length parameters across loci or subsets of loci. By analyzing a range of empirical data sets, we find consistent support for a model in which branch lengths are proportionate between subsets of loci: gene trees share the same pattern of branch lengths, but form subsets that vary in their overall tree lengths. These models had substantially better statistical support than models that assume identical branch lengths across gene trees, or those in which genes form subsets with distinct branch-length patterns. We show using simulations and empirical data that the complexity of the branch-length model with the highest support depends on the length of the sequence alignment and on the numbers of taxa and loci in the data set. Our findings suggest that models in which branch lengths are proportionate between subsets have the highest statistical support under the conditions that are most commonly seen in practice. The results of our study have implications for model selection, computational efficiency, and experimental design in phylogenomics.
Affiliation(s)
- David A Duchêne
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- K Jun Tong
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Charles S P Foster
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Sebastián Duchêne
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC, Australia
- Robert Lanfear
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
4
Wei R, Zhang XC. Phylogeny of Diplazium (Athyriaceae) revisited: Resolving the backbone relationships based on plastid genomes and phylogenetic tree space analysis. Mol Phylogenet Evol 2019; 143:106699. [PMID: 31809851 DOI: 10.1016/j.ympev.2019.106699] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/16/2019] [Revised: 12/01/2019] [Accepted: 12/01/2019] [Indexed: 11/17/2022]
Abstract
Despite progress in resolving the phylogeny of twinsorus ferns (Diplazium) based on multilocus phylogenetic studies, uncertainty remains, especially for deep, or backbone, relationships among closely related clades, suggesting a classic case of rapid evolutionary radiation. Here, we investigated the deep phylogenetic relationships within Diplazium by sampling all major clades and using 51 plastid genomes (plastomes), of which 38 were newly sequenced with high-throughput sequencing technology, resulting in more than 127,000 informative sites. Using parsimony, maximum likelihood and Bayesian analyses of plastome sequences, we largely resolved the backbone of the phylogeny of Diplazium with strong support. However, we also detected phylogenetic incongruence among different datasets and moderately to poorly supported relationships, particularly at several extremely short internal branches. By using phylogenetic tree space and topology-clustering analyses, we provide evidence that conflicting phylogenetic signals can be found across the trees estimated from individual chloroplast protein-coding genes, which may underlie the difficulty of systematics of Diplazium. Furthermore, our phylogenetic estimate offers more resolution than previous multilocus analyses, providing a framework for future taxonomic revisions of the sectional classification of Diplazium. Our study demonstrates the advantage of a character-rich plastome dataset, combined with the comparison of different phylogenetic methods, for resolving recalcitrant lineages that have undergone rapid radiation and dramatic changes in evolutionary rates.
Affiliation(s)
- Ran Wei
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, The Chinese Academy of Sciences, Beijing 100093, China
- Xian-Chun Zhang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, The Chinese Academy of Sciences, Beijing 100093, China
5
Mello B, Schrago CG. The Estimated Pacemaker for Great Apes Supports the Hominoid Slowdown Hypothesis. Evol Bioinform Online 2019; 15:1176934319855988. [PMID: 31223232 PMCID: PMC6566470 DOI: 10.1177/1176934319855988] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/26/2019] [Accepted: 05/17/2019] [Indexed: 11/16/2022] Open
Abstract
The recent surge of genomic data has prompted the investigation of substitution rate variation across the genome, as well as among lineages. Evolutionary trees inferred from distinct genomic regions may display branch lengths that differ between loci by simple proportionality constants, indicating that rate variation follows a pacemaker model, which may be attributed to lineage effects. Analyses of genes from diverse biological clades produced contrasting results, supporting either this model or alternative scenarios where multiple pacemakers exist. So far, an evaluation of the pacemaker hypothesis for all great apes has never been carried out. In this work, we tested whether the evolutionary rates of hominids conform to pacemakers, which were inferred accounting for gene tree/species tree discordance. For higher precision, substitution rates in branches were estimated with a calibration-free approach, the relative rate framework. A predominant evolutionary trend in great apes was evidenced by the recovery of a large pacemaker, encompassing most hominid genomic regions. In addition, the majority of genes followed a pace of evolution that was closely related to the strict molecular clock. However, slight rate decreases were recovered in the internal branches leading to humans, corroborating the hominoid slowdown hypothesis. Our findings suggest that in great apes, life history traits were the major drivers of substitution rate variation across the genome.
Affiliation(s)
- Beatriz Mello
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Carlos G Schrago
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
6
Abstract
Several studies have pointed out that the tight correlation between genes' evolutionary rates is better explained by a model denoted the Universal PaceMaker (UPM) than by the simple rate constancy posited by the classical molecular clock (MC) hypothesis. Under UPM, each gene is associated with a single pacemaker (PM) and varies its evolutionary rate according to the ticks of this PM. Hence, the relative rates of all genes associated with the same PM remain nearly constant, whereas the absolute rates can change arbitrarily according to the PM ticks. A consequent question is how to find the gene-PM association from the gene sequence data alone. This, however, turns out to be a nontrivial task and is affected by the number of variables, their random noise, and the amount of available information. To this end, a clustering heuristic was devised by exploiting the correlation between corresponding edge lengths across thousands of gene trees. Nevertheless, no theoretical study of the relationship between these parameters has been carried out. Here we study this question by providing theoretical bounds, expressed in terms of the system parameters, on the probabilities of positive and negative results. We corroborate these results with a simulation study that reveals the critical role of the variances.
Affiliation(s)
- Sagi Snir
- The Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
7
Foster CSP, Henwood MJ, Ho SYW. Plastome sequences and exploration of tree-space help to resolve the phylogeny of riceflowers (Thymelaeaceae: Pimelea). Mol Phylogenet Evol 2018; 127:156-167. [PMID: 29803950 DOI: 10.1016/j.ympev.2018.05.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 10/19/2017] [Revised: 04/17/2018] [Accepted: 05/17/2018] [Indexed: 10/16/2022]
Abstract
Data sets comprising small numbers of genetic markers are not always able to resolve phylogenetic relationships. This has frequently been the case in molecular systematic studies of plants, with many analyses being based on sequence data from only two or three chloroplast genes. An example of this comes from the riceflowers Pimelea Banks & Sol. ex Gaertn. (Thymelaeaceae), a large genus of flowering plants predominantly distributed in Australia. Despite the considerable morphological variation in the genus, low sequence divergence in chloroplast markers has led to the phylogeny of Pimelea remaining largely uncertain. In this study, we resolve the backbone of the phylogeny of Pimelea in comprehensive Bayesian and maximum-likelihood analyses of plastome sequences from 41 taxa. However, some relationships received only moderate to poor support, and the Pimelea clade contained extremely short internal branches. By using topology-clustering analyses, we demonstrate that conflicting phylogenetic signals can be found across the trees estimated from individual chloroplast protein-coding genes. A relaxed-clock dating analysis reveals that Pimelea arose in the mid-Miocene, with most divergences within the genus occurring during a subsequent rapid diversification. Our new phylogenetic estimate offers better resolution and is more strongly supported than previous estimates, providing a platform for future taxonomic revisions of both Pimelea and the broader subfamily. Our study has demonstrated the substantial improvements in phylogenetic resolution that can be achieved using plastome-scale data sets in plant molecular systematics.
Affiliation(s)
- Charles S P Foster
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Murray J Henwood
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
8
Lee MSY. Multiple morphological clocks and total-evidence tip-dating in mammals. Biol Lett 2017; 12:rsbl.2016.0033. [PMID: 27381882 DOI: 10.1098/rsbl.2016.0033] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 01/15/2016] [Accepted: 06/14/2016] [Indexed: 11/12/2022] Open
Abstract
Morphological integration predicts that correlated characters will coevolve; thus, each distinct suite of correlated characters might be expected to evolve according to a separate clock or 'pacemaker'. Characters in a large morphological dataset for mammals were found to be evolving according to seven separate clocks, each distinct from the molecular clock. Total-evidence tip-dating using these multiple clocks inflated divergence time estimates, but potentially improved topological inference. In particular, single-clock analyses placed several meridiungulates and condylarths in a heterodox position as stem placentals, but multi-clock analyses retrieved a more plausible and orthodox position within crown placentals. Several shortcomings (including uneven character sampling) currently impact upon the accuracy of total-evidence dating, but this study suggests that when sufficiently large and appropriately constructed phenotypic datasets become more commonplace, multi-clock approaches are feasible and can affect both divergence dates and phylogenetic relationships.
Affiliation(s)
- Michael S Y Lee
- Earth Sciences Section, South Australian Museum, North Terrace, Adelaide, South Australia 5000, Australia
- School of Biological Sciences, Flinders University, GPO Box 2100, Adelaide, South Australia 5001, Australia
9
Tong KJ, Duchêne S, Lo N, Ho SYW. The impacts of drift and selection on genomic evolution in insects. PeerJ 2017; 5:e3241. [PMID: 28462044 PMCID: PMC5410144 DOI: 10.7717/peerj.3241] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/05/2016] [Accepted: 03/28/2017] [Indexed: 11/20/2022] Open
Abstract
Genomes evolve through a combination of mutation, drift, and selection, all of which act heterogeneously across genes and lineages. This leads to differences in branch-length patterns among gene trees. Genes that yield trees with the same branch-length patterns can be grouped together into clusters. Here, we propose a novel phylogenetic approach to explain the factors that influence the number and distribution of these gene-tree clusters. We apply our method to a genomic dataset from insects, an ancient and diverse group of organisms. We find some evidence that when drift is the dominant evolutionary process, each cluster tends to contain a large number of fast-evolving genes. In contrast, strong negative selection leads to many distinct clusters, each of which contains only a few slow-evolving genes. Our work, although preliminary in nature, illustrates the use of phylogenetic methods to shed light on the factors driving rate variation in genomic evolution.
Affiliation(s)
- K Jun Tong
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
- Sebastián Duchêne
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
- Centre for Systems Genomics, University of Melbourne, Melbourne, Victoria, Australia
- Nathan Lo
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
10
Duchêne S, Foster CSP, Ho SYW. Estimating the number and assignment of clock models in analyses of multigene datasets. Bioinformatics 2016; 32:1281-5. [DOI: 10.1093/bioinformatics/btw005] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/16/2015] [Accepted: 01/04/2016] [Indexed: 11/14/2022] Open