1
Francisco Barbosa F, Mermudes JRM, Russo CAM. Performance of tree-building methods using a morphological dataset and a well-supported Hexapoda phylogeny. PeerJ 2024; 12:e16706. [PMID: 38213769 PMCID: PMC10782957 DOI: 10.7717/peerj.16706] [Received: 07/19/2023] [Accepted: 11/30/2023]
Abstract
Recently, many studies have addressed the performance of phylogenetic tree-building methods (maximum parsimony, maximum likelihood, and Bayesian inference), focusing primarily on simulated data. However, for discrete morphological data, there is still no consensus on which methods recover the phylogeny most reliably. To address this, we investigate the performance of different methods using an empirical dataset for hexapods as a model. As an empirical test of performance, we applied normalized indices to measure accuracy (the normalized Robinson-Foulds metric, nRF) and precision, measured as resolution (one minus Colless' consensus fork index, 1-CFI). To further explore phylogenetic accuracy and support measures, we also calculated the true positive rate (statistical power) and the false positive rate (type I error), and constructed receiver operating characteristic (ROC) plots to visualize the relationship between these statistics. We applied the normalized indices to trees reconstructed from reanalyses of an empirical discrete morphological dataset of extant Hexapoda, using a well-supported phylogenomic tree as the reference. Maximum likelihood and Bayesian inference under the k-state Markov (Mk) model (with or without a discrete gamma distribution) performed better, showing higher precision (resolution). Our results also suggest that most available tree topology tests are reliable estimators of the performance measures applied in this study. We therefore suggest that likelihood-based methods and tree topology tests be used more often in phylogenetic studies based on discrete morphological characters. Our study provides a fair indication that morphological datasets carry robust phylogenetic signal.
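The accuracy measure named above can be illustrated with a short sketch. This is not the authors' code: it assumes each tree is given as a list of clades (sets of taxon names), compares unrooted splits, and normalizes the Robinson-Foulds distance by 2(n-3), the maximum for two fully resolved unrooted trees on n taxa. The paper's exact normalization and its false positive rate definition may differ, so only the true positive rate (fraction of reference splits recovered) is computed here. All function names are hypothetical.

```python
def split_set(clades, taxa):
    """Convert clades (iterables of taxon names) into canonical nontrivial splits.

    Each split is stored as the side that does NOT contain a fixed anchor taxon,
    so the same bipartition always gets the same representation.
    """
    taxa = frozenset(taxa)
    anchor = min(taxa)
    splits = set()
    for clade in clades:
        side = frozenset(clade)
        if anchor in side:
            side = taxa - side
        # Keep only nontrivial splits (at least two taxa on each side).
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
    return splits


def normalized_rf(estimated, reference, n_taxa):
    """Robinson-Foulds distance divided by its maximum for binary unrooted trees."""
    return len(estimated ^ reference) / (2 * (n_taxa - 3))


def true_positive_rate(estimated, reference):
    """Fraction of reference splits recovered by the estimated tree."""
    return len(estimated & reference) / len(reference)


taxa = "ABCDEF"
# Reference (((A,B),C),(D,(E,F))) versus estimate (((A,C),B),(D,(E,F))).
ref = split_set([{"A", "B"}, {"A", "B", "C"}, {"D", "E", "F"}, {"E", "F"}], taxa)
est = split_set([{"A", "C"}, {"A", "B", "C"}, {"D", "E", "F"}, {"E", "F"}], taxa)
print(normalized_rf(est, ref, len(taxa)))  # 0.333... (= 1/3)
print(true_positive_rate(est, ref))        # 0.666... (= 2/3)
```

For partially resolved trees, some studies normalize by the total number of splits in both trees instead of 2(n-3); which choice is appropriate depends on how resolution is being scored separately.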
Affiliation(s)
- Claudia A. M. Russo
- Genetics, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
2
The Roles of Protein Structure, Taxon Sampling, and Model Complexity in Phylogenomics: A Case Study Focused on Early Animal Divergences. BIOPHYSICA 2021. [DOI: 10.3390/biophysica1020008]
Abstract
Despite the long history of using protein sequences to infer the tree of life, the extent to which different parts of protein structures retain historical signal remains unclear. We propose that analyses of phylogenomic datasets might be improved by incorporating information about protein structure. We test this idea using the position of the root of Metazoa (animals) as a model system. We examined the distribution of “strongly decisive” sites (alignment positions that support a specific tree topology) in a dataset comprising >1500 proteins and almost 100 taxa. When a limited number of taxa was used, the proportion of each class of strongly decisive sites in different structural environments was very sensitive to the model used to analyze the data, but it was stable when taxa were added. As long as enough taxa were analyzed, sites in all structural environments supported the same topology, regardless of whether standard tree searches or decisive sites were used to select the optimal tree. However, the use of decisive sites revealed a difference in the support for minority topologies among sites in different structural environments: buried sites and sites in sheet and coil environments exhibited equal support for the minority topologies, whereas solvent-exposed and helix sites had unequal numbers of sites supporting the minority topologies. This suggests that the relatively slowly evolving buried, sheet, and coil sites give an accurate picture of the true species tree and of the amount of conflict among gene trees. Taken as a whole, this study indicates that phylogenetic analyses using sites in different structural environments can yield different topologies for the deepest branches in the animal tree of life, and that analyzing larger numbers of taxa eliminates this conflict. More broadly, our results highlight the desirability of incorporating information about protein structure into phylogenomic analyses.
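The abstract defines strongly decisive sites only as alignment positions that support a specific topology. The sketch below assumes one simple way to operationalize that, a per-site log-likelihood margin over the runner-up topology with a hypothetical `threshold`, and tallies such sites by structural environment. The study's actual criterion may differ, and the site log-likelihoods are invented for illustration.

```python
from collections import Counter


def tally_decisive_sites(site_lnls, environments, threshold):
    """Count sites that strongly favor one topology, grouped by structural environment.

    site_lnls: one dict per alignment site mapping topology name -> site log-likelihood.
    environments: one label per site (e.g. "buried", "exposed", "helix").
    threshold: hypothetical cutoff; a site counts as strongly decisive for its best
    topology when its log-likelihood beats the runner-up by at least this amount.
    """
    counts = Counter()
    for lnls, env in zip(site_lnls, environments):
        ranked = sorted(lnls.items(), key=lambda item: item[1], reverse=True)
        (best_topology, best), (_, runner_up) = ranked[0], ranked[1]
        if best - runner_up >= threshold:
            counts[(env, best_topology)] += 1
    return counts


# Made-up site log-likelihoods for three candidate root positions.
sites = [
    {"T1": -10.0, "T2": -13.0, "T3": -14.0},
    {"T1": -8.0, "T2": -7.5, "T3": -9.0},
    {"T1": -20.0, "T2": -17.0, "T3": -22.0},
]
envs = ["buried", "exposed", "helix"]
print(tally_decisive_sites(sites, envs, threshold=2.0))
```

Comparing the per-environment counts for the two minority topologies is then a simple way to see the kind of equal versus unequal support the abstract describes.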
3
Meyer X. Adaptive Tree Proposals for Bayesian Phylogenetic Inference. Syst Biol 2021; 70:1015-1032. [PMID: 33515248 PMCID: PMC8357345 DOI: 10.1093/sysbio/syab004] [Received: 10/02/2019] [Revised: 01/07/2021] [Accepted: 01/17/2021]
Abstract
Bayesian inference of phylogeny with MCMC plays a key role in the study of evolution. Yet this method still suffers from a practical challenge identified more than two decades ago: designing tree topology proposals that efficiently sample tree space. In this article, I introduce the concept of adaptive tree proposals for unrooted topologies, that is, tree proposals that adapt to the posterior distribution as it is estimated. I use this concept to develop two adaptive variants of existing proposals and an adaptive proposal based on a novel design philosophy, in which the structure of the proposal is informed by the posterior distribution of trees. I investigate the performance of these proposals by first presenting a metric that captures the performance of each proposal within a mixture of proposals. Using this metric, I compare the performance of the adaptive proposals with that of standard and parsimony-guided proposals on 11 empirical datasets. Adaptive proposals led to consistent performance gains, with up to 18-fold increases in mixing efficiency and 6-fold increases in convergence rate, without increasing the computational cost of these analyses.
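As a rough illustration of what a proposal "adapting to the posterior distribution as it is estimated" means, the toy sketch below runs Metropolis-Hastings over a small finite set of states standing in for topologies, with an independence proposal whose weights are the chain's own visit counts plus a pseudo-count. This is a generic adaptive-MCMC example, not any of the proposals introduced in the article; the state names, target masses, and the `pseudo` parameter are hypothetical. Each new visit changes the proposal by an amount that shrinks as the counts grow, so the adaptation diminishes over time, one of the standard conditions for such schemes to remain valid.

```python
import random


def adaptive_independence_mcmc(target, n_iter, pseudo=1.0, seed=0):
    """Toy adaptive Metropolis-Hastings over a finite state space.

    target: dict state -> unnormalized posterior mass (stand-ins for tree topologies).
    The independence proposal q(s) is proportional to (visits to s so far + pseudo),
    so it adapts toward the posterior as the chain estimates it.
    Returns the empirical frequency of each state.
    """
    rng = random.Random(seed)
    states = list(target)
    visits = {s: 0 for s in states}
    current = states[0]
    for _ in range(n_iter):
        weights = [visits[s] + pseudo for s in states]
        total = sum(weights)
        q = {s: w / total for s, w in zip(states, weights)}
        proposal = rng.choices(states, weights=weights)[0]
        # Hastings ratio for an independence proposal: pi(x') q(x) / (pi(x) q(x')).
        ratio = (target[proposal] * q[current]) / (target[current] * q[proposal])
        if rng.random() < min(1.0, ratio):
            current = proposal
        visits[current] += 1
    return {s: visits[s] / n_iter for s in states}


freqs = adaptive_independence_mcmc({"T1": 6.0, "T2": 3.0, "T3": 1.0}, n_iter=20000)
print(freqs)  # expected to be close to 0.6, 0.3, 0.1
```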
Affiliation(s)
- X Meyer
- Department of Integrative Biology, University of California, Berkeley, California 94720, USA
4
Zhang C, Huelsenbeck JP, Ronquist F. Using Parsimony-Guided Tree Proposals to Accelerate Convergence in Bayesian Phylogenetic Inference. Syst Biol 2020; 69:1016-1032. [PMID: 31985810 PMCID: PMC7440752 DOI: 10.1093/sysbio/syaa002] [Received: 02/22/2018] [Revised: 01/15/2020] [Accepted: 01/17/2020]
Abstract
Sampling across tree space is one of the major challenges in Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) algorithms. Standard MCMC tree moves consider small random perturbations of the topology and select from candidate trees at random or based on the distance between the old and new topologies. MCMC algorithms using such moves tend to get trapped in tree space, making them slow in finding the globally most probable trees (known as "convergence") and in estimating the correct proportions of the different trees (known as "mixing"). Here, we introduce a new class of moves, which propose trees based on their parsimony scores. The proposal distribution derived from the parsimony scores is a quickly computable, albeit rough, approximation of the conditional posterior distribution over candidate trees. We demonstrate with simulations that parsimony-guided moves correctly sample the uniform distribution of topologies from the prior. We then evaluate their performance against standard moves using six challenging empirical data sets, for which we were able to obtain accurate reference estimates of the posterior using long MCMC runs, a mix of topology proposals, and Metropolis coupling. On these data sets, ranging in size from 357 to 934 taxa and from 1740 to 5681 sites, we find that single chains using parsimony-guided moves usually converge an order of magnitude faster than chains using standard moves. They also exhibit better mixing; that is, they cover the most probable trees more quickly. Our results show that tree moves based on quick and dirty estimates of the posterior probability can significantly outperform standard moves. Future research will have to show to what extent the performance of such moves can be improved further by finding better ways of approximating the posterior probability, taking the trade-off between accuracy and speed into account. [Bayesian phylogenetic inference; MCMC; parsimony; tree proposal.]
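The key ingredient of these moves is a cheap parsimony score for each candidate tree. The sketch below computes Fitch parsimony lengths on rooted binary trees written as nested tuples and turns them into proposal probabilities with the weight exp(-beta * (length - best length)). That weighting and the `beta` parameter are assumptions for illustration, not the exact proposal distribution of the article. In a real MCMC move, the probabilities of the forward and reverse proposals would also enter the Hastings ratio.

```python
import math


def fitch_length(tree, states):
    """Fitch parsimony length of one character on a rooted binary tree.

    tree: nested 2-tuples whose leaves are taxon names; states: taxon -> state.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # Empty intersection: take the union and count one extra change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_length(tree, matrix):
    """Total Fitch length over all characters; matrix: taxon -> string of states."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


def proposal_probabilities(candidates, matrix, beta=1.0):
    """Hypothetical parsimony-guided weights: shorter trees are proposed more often."""
    lengths = [parsimony_length(t, matrix) for t in candidates]
    best = min(lengths)
    weights = [math.exp(-beta * (length - best)) for length in lengths]
    total = sum(weights)
    return lengths, [w / total for w in weights]


matrix = {"A": "00", "B": "00", "C": "11", "D": "11"}
candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
lengths, probs = proposal_probabilities(candidates, matrix)
print(lengths)  # [2, 4, 4]
print(probs)
```

Raising `beta` concentrates proposals on the shortest trees (faster but greedier); lowering it moves the proposal back toward uniform over candidates.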
Affiliation(s)
- Chi Zhang
- Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, 142 XizhimenWai Street, Beijing 100044, China
- Center for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, 142 XizhimenWai Street, Beijing 100044, China
- John P Huelsenbeck
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
- Fredrik Ronquist
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, SE-10405 Stockholm, Sweden
5
Grundler M, Rabosky DL. Complex Ecological Phenotypes on Phylogenetic Trees: A Markov Process Model for Comparative Analysis of Multivariate Count Data. Syst Biol 2020; 69:1200-1211. [DOI: 10.1093/sysbio/syaa031] [Received: 05/16/2019] [Revised: 04/02/2020] [Accepted: 04/07/2020]
Abstract
The evolutionary dynamics of complex ecological traits—including multistate representations of diet, habitat, and behavior—remain poorly understood. Reconstructing the tempo, mode, and historical sequence of transitions involving such traits poses many challenges for comparative biologists, owing to their multidimensional nature. Continuous-time Markov chains are commonly used to model ecological niche evolution on phylogenetic trees but are limited by the assumption that taxa are monomorphic and that states are univariate categorical variables. A necessary first step in the analysis of many complex traits is therefore to categorize species into a predetermined number of univariate ecological states, but this procedure can lead to distortion and loss of information. This approach also confounds interpretation of state assignments with effects of sampling variation because it does not directly incorporate empirical observations for individual species into the statistical inference model. In this study, we develop a Dirichlet-multinomial framework to model resource use evolution on phylogenetic trees. Our approach is expressly designed to model ecological traits that are multidimensional and to account for uncertainty in state assignments of terminal taxa arising from effects of sampling variation. The method uses multivariate count data across a set of discrete resource categories sampled for individual species to simultaneously infer the number of ecological states, the proportional utilization of different resources by different states, and the phylogenetic distribution of ecological states among living species and their ancestors. The method is general and may be applied to any data expressible as a set of observational counts from different categories. [Comparative methods; Dirichlet multinomial; ecological niche evolution; macroevolution; Markov model.]
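The observation model at the core of this framework can be sketched as a Dirichlet-multinomial probability for one species' resource-use counts. The code below shows only that standard likelihood; how concentration parameters are tied to ecological states, and how states evolve on the tree, are not modeled here, and the example counts and `alpha` values are made up.

```python
from math import exp, lgamma


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of category counts under a Dirichlet-multinomial(alpha).

    counts: observed counts per resource category (non-negative integers).
    alpha: positive concentration parameters, one per category.
    """
    n = sum(counts)
    a0 = sum(alpha)
    log_p = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for x, a in zip(counts, alpha):
        log_p += lgamma(x + a) - lgamma(x + 1) - lgamma(a)
    return log_p


# Hypothetical diet counts for one species across three prey categories.
print(exp(dirichlet_multinomial_logpmf([5, 1, 0], [2.0, 1.0, 0.5])))
```

Working on the log scale with `lgamma` avoids overflow in the factorial and gamma terms when counts are large.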
Affiliation(s)
- Michael Grundler
- Museum of Zoology and Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Daniel L Rabosky
- Museum of Zoology and Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
6
Tidwell H, Nakhleh L. Integrated likelihood for phylogenomics under a no-common-mechanism model. BMC Genomics 2020; 21:219. [PMID: 32299348 PMCID: PMC7161099 DOI: 10.1186/s12864-020-6608-y]
Abstract
Background: Multi-locus species phylogeny inference is based on models of sequence evolution on gene trees as well as models of gene tree evolution within the branches of species phylogenies. Almost all statistical methods for this inference task assume a common mechanism across all loci, as captured by a single value for each branch length of the species phylogeny. Results: In this paper, we pursue a “no common mechanism” (NCM) model, in which every gene tree evolves according to its own parameters of the species phylogeny. Based on this model, we derive an analytically integrated likelihood of both species trees and networks given the gene trees of multiple loci. We demonstrate the performance of inference under this integrated likelihood on both simulated and biological data. Conclusions: The model presented here will afford opportunities for exploring connections among various criteria for estimating species phylogenies from multiple, independent loci. Moreover, further development of this model could lead to more efficient methods for searching the space of species phylogenies by focusing solely on the topology of the phylogeny.
7
Abstract
Background: Locating the root node of the "tree of life" (ToL) is one of the hardest problems in phylogenetics, given its time depth. The root node, or universal common ancestor (UCA), groups descendants into organismal clades/domains. Two notable variants of the two-domains ToL (2D-ToL) have gained support recently. One 2D-ToL posits that eukaryotes (organisms with nuclei) and akaryotes (organisms without nuclei) are sister clades that diverged from the UCA, and that Asgard archaea are sister to other archaea. The other 2D-ToL proposes that eukaryotes emerged from within archaea and places Asgard archaea as sister to eukaryotes. Williams et al. (Nature Ecol. Evol. 4: 138-147; 2020) re-evaluated the data and methods that support the competing two-domains proposals and concluded that eukaryotes are the closest relatives of Asgard archaea. Critique: The poor resolution of the archaea in their analysis, despite the use of amino acid alignments from thousands of proteins and the best-fitting substitution models, contradicts their conclusions. We argue that they overlooked important aspects of estimating evolutionary relatedness and assessing phylogenetic signal in empirical data. Which 2D-ToL is better supported depends on which kind of molecular feature is better for resolving common ancestors at the roots of clades: protein domains or their component amino acids. We focus on the phylogenetic character reconstructions necessary to describe the UCA or its closest descendants in the absence of reliable fossils. Clarifications: It is well known that different character types present different perspectives on evolutionary history that relate to different phylogenetic depths. We show that protein structural domains support more reliable phylogenetic reconstructions of deep-diverging clades in the ToL. Accordingly, eukaryotes and akaryotes are better supported as clades in a 2D-ToL.
Affiliation(s)
- David Morrison
- Department of Organismal Biology, Systematic Biology, Uppsala University, Uppsala, 752 36, Sweden
8
Goloboff PA, Pittman M, Pol D, Xu X. Morphological Data Sets Fit a Common Mechanism Much More Poorly than DNA Sequences and Call Into Question the Mkv Model. Syst Biol 2019; 68:494-504. [PMID: 30445627 DOI: 10.1093/sysbio/syy077] [Received: 06/29/2018] [Revised: 11/11/2018] [Accepted: 11/13/2018]
Abstract
The Mkv evolutionary model, based on minor modifications to models of molecular evolution, is increasingly used to infer phylogenies from discrete morphological data, often producing results different from those of parsimony. The critical difference between Mkv and parsimony is the assumption of a "common mechanism" in the Mkv model: branch lengths determine that the probability of change for all characters increases or decreases on the same branches by the same exponential factor. We evaluate whether the assumption of a common mechanism applies to morphology by testing its implicit prediction that branch lengths calculated from different subsets of characters will be significantly correlated. Our analysis shows that DNA (38 data sets tested) is often compatible with a common mechanism, but morphology (86 data sets tested) generally is not, showing very disparate branch lengths for different character partitions. The low levels of branch length correlation demonstrated for morphology (fitting models without a common mechanism) suggest that the Mkv model is too unrealistic and inadequate for the analysis of most morphological data sets. [Bayesian analysis; Mkv model; morphological data; phylogenetics.]
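Under the Mk model, a character with k states changes along a branch of length t (expected changes per character) with probabilities that depend only on k and t, which is what makes branch lengths a mechanism shared by all characters. The sketch below gives those closed-form probabilities and a plain Pearson correlation of the kind one could use to compare branch lengths estimated from two character partitions. Mkv additionally conditions the likelihood on characters being variable, which is not shown; the branch-length values are invented, and the correlation test actually used in the study may differ.

```python
import math


def mk_transition(k, t):
    """Mk-model transition probabilities for a k-state character.

    t is the branch length in expected changes per character.
    Returns (probability of ending in the same state, probability of each other state).
    """
    decay = math.exp(-k * t / (k - 1))
    p_same = 1.0 / k + (k - 1) / k * decay
    p_other = 1.0 / k - decay / k
    return p_same, p_other


def pearson(x, y):
    """Pearson correlation between two equal-length lists of branch lengths."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(mk_transition(k=3, t=0.5))
# Invented branch lengths for the same five branches, estimated from two partitions.
partition_1 = [0.10, 0.40, 0.05, 0.30, 0.20]
partition_2 = [0.12, 0.35, 0.02, 0.33, 0.18]
print(pearson(partition_1, partition_2))
```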
Affiliation(s)
- Pablo A Goloboff
- Unidad Ejecutora Lillo (UEL), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), S.M. Tucumán, Argentina
- Michael Pittman
- Vertebrate Palaeontology Laboratory, Department of Earth Sciences, University of Hong Kong, Pokfulam, Hong Kong
- Diego Pol
- Museo Egidio Feruglio, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Trelew, Argentina
- Xing Xu
- Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
9
Goloboff PA, Arias JS. Likelihood approximations of implied weights parsimony can be selected over the Mk model by the Akaike information criterion. Cladistics 2019; 35:695-716. [DOI: 10.1111/cla.12380] [Accepted: 02/26/2019]
Affiliation(s)
- Pablo A. Goloboff
- Unidad Ejecutora Lillo, Consejo Nacional de Investigaciones Científicas y Técnicas, Fundación Miguel Lillo, Miguel Lillo 251, 4000 S.M. de Tucumán, Argentina
- J. Salvador Arias
- Unidad Ejecutora Lillo, Consejo Nacional de Investigaciones Científicas y Técnicas, Fundación Miguel Lillo, Miguel Lillo 251, 4000 S.M. de Tucumán, Argentina
10
Goloboff PA, Torres A, Arias JS. Weighted parsimony outperforms other methods of phylogenetic inference under models appropriate for morphology. Cladistics 2017; 34:407-437. [DOI: 10.1111/cla.12205] [Accepted: 03/22/2017]
Affiliation(s)
- Pablo A. Goloboff
- Unidad Ejecutora Lillo; Fundación Miguel Lillo; CONICET; Miguel Lillo 251, 4000 San Miguel de Tucumán, Argentina
- Ambrosio Torres
- Unidad Ejecutora Lillo; Fundación Miguel Lillo; CONICET; Miguel Lillo 251, 4000 San Miguel de Tucumán, Argentina
- J. Salvador Arias
- Unidad Ejecutora Lillo; Fundación Miguel Lillo; CONICET; Miguel Lillo 251, 4000 San Miguel de Tucumán, Argentina
11
Scotland RW, Steel M. Circumstances in which parsimony but not compatibility will be provably misleading. Syst Biol 2015; 64:492-504. [PMID: 25634097 PMCID: PMC4395848 DOI: 10.1093/sysbio/syv008] [Received: 11/29/2014] [Accepted: 01/23/2015]
Abstract
Phylogenetic methods typically rely on an appropriate model of how data evolved in order to infer an accurate phylogenetic tree. For molecular data, standard statistical methods have provided an effective strategy for extracting phylogenetic information from aligned sequence data when each site (character) is subject to a common process. However, for other types of data (e.g., morphological data), characters can be too ambiguous, homoplastic, or saturated for models to be developed that effectively capture the underlying process of change. To address this, we examine the properties of a classic but neglected method for inferring splits in an underlying tree, namely, maximum compatibility. By adopting a simple and extreme model in which each character either fits perfectly on some tree or is entirely random (but it is not known to which class any character belongs), we are able to derive exact and explicit formulae for the performance of maximum compatibility. We show that this method is able to identify a set of non-trivial homoplasy-free characters when the number of taxa is large, even when the number of random characters is large. In contrast, we show that a method that makes more uniform use of all the data, maximum parsimony, can provably estimate trees in which none of the original homoplasy-free characters support splits.
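Maximum compatibility can be illustrated for binary characters with complete data. Two such characters are compatible exactly when they do not show all four combinations of states across taxa (the four-gamete test), and any set of pairwise compatible binary characters can be displayed together on one tree, so the method reduces to finding a largest pairwise compatible set. The brute-force search below is exponential and only meant for toy matrices; the character matrix is invented.

```python
from itertools import combinations


def compatible(c1, c2):
    """Four-gamete test for two binary characters (sequences of 0/1 across taxa)."""
    return len(set(zip(c1, c2))) < 4


def maximum_compatible_set(chars):
    """Brute-force largest set of pairwise compatible characters (toy sizes only)."""
    indices = range(len(chars))
    for size in range(len(chars), 0, -1):
        for subset in combinations(indices, size):
            if all(compatible(chars[i], chars[j]) for i, j in combinations(subset, 2)):
                return list(subset)
    return []


# Taxa A-E; each tuple gives one binary character's states in taxon order.
chars = [
    (1, 1, 0, 0, 0),  # groups A+B
    (1, 1, 1, 0, 0),  # groups A+B+C
    (0, 1, 1, 0, 0),  # groups B+C (conflicts with A+B)
    (0, 0, 0, 1, 1),  # groups D+E
]
print(maximum_compatible_set(chars))  # [0, 1, 3]
```

Parsimony, by contrast, would score every character on every candidate tree, including the conflicting one, rather than discarding it.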
Affiliation(s)
- Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
12
Whidden C, Matsen FA. Quantifying MCMC exploration of phylogenetic tree space. Syst Biol 2015; 64:472-91. [PMID: 25631175 PMCID: PMC4395846 DOI: 10.1093/sysbio/syv006] [Received: 05/09/2014] [Accepted: 01/20/2015]
Abstract
In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks.
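The conditional clade distribution (CCD) mentioned above approximates the posterior probability of a rooted tree as a product, over its internal nodes, of the sampled frequency with which a clade splits into its two child clades, given that the clade occurs. The sketch below estimates those frequencies from a handful of rooted binary trees written as nested tuples; the samples are invented, and the code makes no attempt to reproduce the analyses in the article.

```python
from collections import Counter


def node_splits(tree):
    """List (clade, {child clade, child clade}) for every internal node of a rooted binary tree."""
    found = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        left, right = visit(node[0]), visit(node[1])
        parent = left | right
        found.append((parent, frozenset([left, right])))
        return parent

    visit(tree)
    return found


def fit_ccd(samples):
    """Count clades and clade splits across sampled trees."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in samples:
        for parent, children in node_splits(tree):
            clade_counts[parent] += 1
            split_counts[(parent, children)] += 1
    return clade_counts, split_counts


def ccd_probability(tree, clade_counts, split_counts):
    """Product of conditional split frequencies; zero if any clade was never sampled."""
    prob = 1.0
    for parent, children in node_splits(tree):
        if clade_counts[parent] == 0:
            return 0.0
        prob *= split_counts[(parent, children)] / clade_counts[parent]
    return prob


samples = [(("A", "B"), ("C", "D")), (("B", "A"), ("D", "C")), ((("A", "B"), "C"), "D")]
counts = fit_ccd(samples)
print(ccd_probability((("A", "B"), ("C", "D")), *counts))  # 0.666... (= 2/3)
```

Because the estimate multiplies local frequencies, it can assign probability to combinations of clades drawn from different peaks, one way the multi-peak problems described in the abstract can arise.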
Affiliation(s)
- Chris Whidden
- Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Frederick A Matsen
- Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
13
Guindon S. From trajectories to averages: an improved description of the heterogeneity of substitution rates along lineages. Syst Biol 2012; 62:22-34. [PMID: 22798331 DOI: 10.1093/sysbio/sys063]
Abstract
The accuracy and precision of species divergence date estimation from molecular data strongly depend on the models describing the variation of substitution rates along a phylogeny. These models generally assume that rates randomly fluctuate along branches from one node to the next. However, for mathematical convenience, the stochasticity of such a process is ignored when translating these rate trajectories into branch lengths. This study addresses this shortcoming. A new approach is described that explicitly considers the average substitution rates along branches as random quantities, resulting in a more realistic description of the variations of evolutionary rates along lineages. The proposed method provides more precise estimates of the rate autocorrelation parameter as well as divergence times. Also, simulation results indicate that ignoring the stochastic variation of rates along edges can lead to significant overestimation of specific node ages. Altogether, the new approach introduced in this study is a step forward to designing biologically relevant models of rate evolution that are well suited to data sets with dense taxon sampling which are likely to present rate autocorrelation. The computer programme PhyTime, part of the PhyML package and implementing the new approach, is available from http://code.google.com/p/phyml (last accessed 1 August 2012).
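The distinction between a rate trajectory and the average rate along a branch can be illustrated by simulation. The sketch below assumes, purely for illustration, that the log substitution rate follows Brownian motion along a branch; it integrates the simulated trajectory with the trapezoid rule and returns the branch's average rate together with the rate at the end of the branch. Repeated runs from the same starting rate give different averages, which is the stochasticity the abstract says is usually ignored when trajectories are translated into branch lengths. This is not the PhyTime implementation, and `sigma` and `n_steps` are hypothetical parameters.

```python
import math
import random


def simulate_branch(rate_start, duration, sigma, n_steps=1000, rng=None):
    """Simulate a log-Brownian rate trajectory along one branch.

    Returns (average rate along the branch, rate at the end of the branch).
    The branch length implied by the trajectory is average rate * duration.
    """
    rng = rng or random.Random()
    dt = duration / n_steps
    log_rate = math.log(rate_start)
    area = 0.0  # integral of the rate over time (trapezoid rule)
    for _ in range(n_steps):
        previous = math.exp(log_rate)
        log_rate += rng.gauss(0.0, sigma * math.sqrt(dt))
        area += 0.5 * (previous + math.exp(log_rate)) * dt
    return area / duration, math.exp(log_rate)


rng = random.Random(1)
averages = [simulate_branch(1.0, 10.0, sigma=0.3, rng=rng)[0] for _ in range(5)]
print(averages)  # five different average rates from the same starting rate
```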
Affiliation(s)
- Stéphane Guindon
- Department of Statistics, University of Auckland, Auckland, 1010, New Zealand.
14
Höhna S, Drummond AJ. Guided Tree Topology Proposals for Bayesian Phylogenetic Inference. Syst Biol 2011; 61:1-11. [DOI: 10.1093/sysbio/syr074]
Affiliation(s)
- Sebastian Höhna
- Department of Mathematics, Stockholm University, SE-106 91 Stockholm, Sweden
- Alexei J. Drummond
- Department of Computer Science, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
15
Huelsenbeck JP, Alfaro ME, Suchard MA. Biologically inspired phylogenetic models strongly outperform the no common mechanism model. Syst Biol 2011; 60:225-32. [PMID: 21252385 PMCID: PMC3038349 DOI: 10.1093/sysbio/syq089] [Received: 04/15/2009] [Revised: 06/29/2009] [Accepted: 09/22/2010]
Abstract
"But Tuffley and Steel (1997) introduced a model called No Common Mechanism (NCM), in which characters may, but are not required to, vary their relative rates independently, both within and between branches. Because the independent variation is taken only as a possibility, not as a requirement, NCM would apply to almost any situation, and so may be accepted as realistic. This is useful because Tuffley and Steel also showed that maximum likelihood under NCM selects the same trees as does parsimony. With the realistic NCM in the background, then, most parsimonious trees have greatest power to explain available observations." (Farris 2008)
Affiliation(s)
- John P Huelsenbeck
- Department of Integrative Biology, University of California, Berkeley, CA 94720-3140, USA.
16
Ané C. Detecting phylogenetic breakpoints and discordance from genome-wide alignments for species tree reconstruction. Genome Biol Evol 2011; 3:246-58. [PMID: 21362638 PMCID: PMC3070431 DOI: 10.1093/gbe/evr013]
Abstract
With the easy acquisition of sequence data, it is now possible to obtain and align whole genomes across multiple related species or populations. In this work, I assess the performance of a statistical method to reconstruct the whole distribution of phylogenetic trees along the genome, estimate the proportion of the genome for which a given clade is true, and infer a concordance tree that summarizes the dominant vertical inheritance pattern. There are two main issues when dealing with whole-genome alignments, as opposed to multiple genes: the size of the data and the detection of recombination breakpoints. These breakpoints partition the genomic alignment into phylogenetically homogeneous loci, where sites within a given locus all share the same phylogenetic tree topology. To delimit these loci, I describe here a method based on the minimum description length (MDL) principle, implemented with dynamic programming for computational efficiency. Simulations show that combining MDL partitioning with Bayesian concordance analysis provides an efficient and robust way to estimate both the vertical inheritance signal and the horizontal phylogenetic signal. The method performed well both in the presence of incomplete lineage sorting and in the presence of horizontal gene transfer. A high level of systematic bias was found here, highlighting the need for good individual tree-building methods, which form the basis for more elaborate gene tree/species tree reconciliation methods.
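The dynamic-programming idea behind breakpoint detection can be sketched as optimal partitioning: choose segment boundaries that minimize the summed cost of each segment plus a fixed penalty per breakpoint. In an MDL setting both terms would roughly correspond to description lengths. Here, purely as an assumption for illustration, each site is labeled with its preferred topology, a segment's cost is the number of sites that disagree with the segment's majority label, and `penalty` is a hypothetical constant; this is not the article's MDL criterion.

```python
from collections import Counter


def optimal_partition(labels, penalty):
    """Segment a sequence of per-site labels by dynamic programming.

    best[j] is the minimal cost of segmenting labels[:j]; each extra segment adds `penalty`.
    Returns (list of (start, end) segments, total cost).
    """
    n = len(labels)

    def segment_cost(i, j):
        # Toy cost: sites in labels[i:j] that disagree with the segment's majority label.
        return (j - i) - Counter(labels[i:j]).most_common(1)[0][1]

    best = [0.0] + [float("inf")] * n
    start_of_last = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(i, j) + (penalty if i > 0 else 0.0)
            if cost < best[j]:
                best[j], start_of_last[j] = cost, i
    # Walk back through the stored segment starts to recover the breakpoints.
    segments, j = [], n
    while j > 0:
        segments.append((start_of_last[j], j))
        j = start_of_last[j]
    return segments[::-1], best[n]


sites = ["T1"] * 4 + ["T2"] * 3
print(optimal_partition(sites, penalty=1.5))  # ([(0, 4), (4, 7)], 1.5)
```

This quadratic-time recursion is the basic form; a larger penalty merges segments, so the penalty plays the role of the model-complexity term that MDL makes explicit.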
Affiliation(s)
- Cécile Ané
- Departments of Statistics and Botany, University of Wisconsin-Madison, USA.
17
Affiliation(s)
- Mike Steel
- Allan Wilson Centre for Molecular Ecology and Evolution, Biomathematics Research Centre, University of Canterbury, Private Bag 4800, Christchurch, New Zealand.
18
Holder MT, Lewis PO, Swofford DL. The Akaike information criterion will not choose the no common mechanism model. Syst Biol 2010; 59:477-85. [PMID: 20547783 DOI: 10.1093/sysbio/syq028]
Affiliation(s)
- Mark T Holder
- Department of Ecology and Evolutionary Biology, University of Kansas, 1200 Sunnyside Avenue, Lawrence, KS 66045, USA.
19
Abstract
Heterotachy is a general term to describe positions in a sequence that evolve at different rates in different lineages. Kolaczkowski and Thornton (2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431:980-984.) recently described an intriguing heterotachy model that leads to topological bias for likelihood-based methods and parsimony methods. In this article, we show that heterotachy can generally be viewed as multivariate rates-across-sites variation, which can be described as randomly drawing rates (or branch lengths) from a multivariate distribution for each branch at each site. Motivated by this idea, we propose a pairwise alpha heterotachy adjustment model, which gives us much improved topological estimation in the settings by Kolaczkowski and Thornton (2004).
Affiliation(s)
- Jihua Wu
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
20
Kim J, Sanderson MJ. Penalized likelihood phylogenetic inference: bridging the parsimony-likelihood gap. Syst Biol 2008; 57:665-74. [PMID: 18853355 DOI: 10.1080/10635150802422274]
Abstract
The increasing diversity and heterogeneity of molecular data for phylogeny estimation has led to development of complex models and model-based estimators. Here, we propose a penalized likelihood (PL) framework in which the levels of complexity in the underlying model can be smoothly controlled. We demonstrate the PL framework for a four-taxon tree case and investigate its properties. The PL framework yields an estimator in which the majority of currently employed estimators such as the maximum-parsimony estimator, homogeneous likelihood estimator, gamma mixture likelihood estimator, etc., become special cases of a single family of PL estimators. Furthermore, using the appropriate penalty function, the complexity of the underlying models can be partitioned into separately controlled classes allowing flexible control of model complexity.
Affiliation(s)
- Junhyong Kim
- Department of Biology and Penn Genome Frontiers Institute, University of Pennsylvania, Philadelphia, Pennsylvania, USA.