1
Bouckaert RR. OBAMA: OBAMA for Bayesian amino-acid model averaging. PeerJ 2020; 8:e9460. [PMID: 32832259 PMCID: PMC7413081 DOI: 10.7717/peerj.9460] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/18/2019] [Accepted: 06/10/2020] [Indexed: 11/20/2022] Open
Abstract
Background
Bayesian analyses offer many benefits for phylogenetics and have been popular for the analysis of amino acid alignments. Such analyses require a substitution model and a site model to be specified, and these models, which are typically of no interest to the analysis overall, are often chosen by an ad hoc or likelihood-based method.
Methods
We present a method called OBAMA that averages over substitution models and site models, thus letting the data inform model choices and taking model uncertainty into account. It uses trans-dimensional Markov chain Monte Carlo (MCMC) proposals to switch between various empirical substitution models for amino acids, such as Dayhoff, WAG, and JTT. Furthermore, it switches between the base frequencies of these substitution models and base frequencies estimated from the alignment. Finally, it switches between using gamma rate heterogeneity or not, and between using a proportion of invariable sites or not.
Results
We show that the model performs well in a simulation study. We demonstrate that, with appropriate priors, both the proportion of invariable sites and the shape parameter for gamma rate heterogeneity can be estimated. The OBAMA method takes model uncertainty into account, thus reducing bias in phylogenetic estimates. The method is implemented in the OBAMA package in BEAST 2, which is open source, licensed under the LGPL, and allows joint tree inference under a wide range of models.
Affiliation(s)
- Remco R. Bouckaert
- School of Computer Science, University of Auckland, Auckland, New Zealand
- Max Planck Institute for the Science of Human History, Jena, Germany
2
Grant T. Outgroup sampling in phylogenetics: Severity of test and successive outgroup expansion. J Zool Syst Evol Res 2019. [DOI: 10.1111/jzs.12317] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Affiliation(s)
- Taran Grant
- Department of Zoology, Institute of Biosciences, University of São Paulo, São Paulo, Brazil
3
Su Z, Townsend JP. Utility of characters evolving at diverse rates of evolution to resolve quartet trees with unequal branch lengths: analytical predictions of long-branch effects. BMC Evol Biol 2015; 15:86. [PMID: 25968460 PMCID: PMC4429678 DOI: 10.1186/s12862-015-0364-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/15/2015] [Accepted: 04/29/2015] [Indexed: 11/30/2022] Open
Abstract
Background
The detection and avoidance of "long-branch effects" in phylogenetic inference represents a longstanding challenge for molecular phylogenetic investigations. A consequence of parallelism and convergence, long-branch effects arise in phylogenetic inference when there is unequal molecular divergence among lineages, and they can positively mislead inference based on parsimony especially, but also inference based on maximum likelihood and Bayesian approaches. Long-branch effects have been exhaustively examined by simulation studies that have compared the performance of different inference methods in specific model trees and branch length spaces.
Results
In this paper, by generalizing the phylogenetic signal and noise analysis to quartets with uneven subtending branches, we quantify the utility of molecular characters for resolution of quartet phylogenies via parsimony. Our quantification incorporates contributions toward the correct tree from either signal or homoplasy (i.e. "the right result for either the right reason or the wrong reason"). We also characterize a highly conservative lower bound of utility that incorporates contributions to the correct tree only when they correspond to true, unobscured parsimony-informative sites (i.e. "the right result for the right reason"). We apply the generalized signal and noise analysis to classic quartet phylogenies in which long-branch effects can arise due to unequal rates of evolution or an asymmetrical topology. Application of the analysis leads to identification of branch length conditions in which inference will be inconsistent and reveals insights regarding how to improve sampling of molecular loci and taxa in order to correctly resolve phylogenies in which long-branch effects are hypothesized to exist.
Conclusions
The generalized signal and noise analysis provides analytical prediction of utility of characters evolving at diverse rates of evolution to resolve quartet phylogenies with unequal branch lengths. The analysis can be applied to identifying characters evolving at appropriate rates to resolve phylogenies in which long-branch effects are hypothesized to occur.
Affiliation(s)
- Zhuo Su
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, 06520, USA.
- Jeffrey P Townsend
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, 06520, USA.
- Department of Biostatistics, Yale University, New Haven, CT, 06520, USA.
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
- Department of Biostatistics, Yale School of Public Health, 135 College St #222., New Haven, CT, 06511, United States of America.
4
Struck TH. Data congruence, paedomorphosis and salamanders. Front Zool 2007; 4:22. [PMID: 17974010 PMCID: PMC2234405 DOI: 10.1186/1742-9994-4-22] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/22/2006] [Accepted: 10/31/2007] [Indexed: 12/20/2022] Open
Abstract
Background
The retention of ancestral juvenile characters by adult stages of descendants is called paedomorphosis. However, this process can mislead phylogenetic analyses based on morphological data, even in combination with molecular data, because it is difficult to assess whether a character is primarily absent or secondarily lost. Thus, the detection of incongruence between morphological and molecular data is necessary to investigate the reliability of simultaneous analyses. Different methods have been proposed to detect data congruence or incongruence. Five of them (PABA, PBS, NDI, LILD, DRI) are used herein to assess incongruence between morphological and molecular data in a case study addressing salamander phylogeny, which comprises several supposedly paedomorphic taxa. For this purpose, previously published data sets were compiled. Furthermore, two strategies for ameliorating the effects of paedomorphosis on phylogenetic studies were tested with statistical rigor. Additionally, the efficiency of the different methods to assess incongruence was analyzed using this empirical data set. Finally, a test statistic is presented for all these methods except DRI.
Results
The addition of morphological data to molecular data results in both different positions of three of the four paedomorphic taxa and strong incongruence, but treating the morphological data with different strategies that ameliorate the negative impact of paedomorphosis reverses these changes and minimizes the conflict. Of these strategies, simply excluding paedomorphic character traits seems to be the most beneficial. Of the three molecular partitions analyzed herein, the RAG1 partition seems to be the most suitable to resolve deep salamander phylogeny. The rRNA and mtDNA partitions are too conserved and too variable, respectively. Of the different methods to detect incongruence, the NDI and PABA approaches are more conservative in the indication of incongruence than LILD and PBS.
Conclusion
Paedomorphosis induces strong conflicts and can mislead phylogenetic analyses, even combined analyses. However, different strategies efficiently minimize these problems. Though exploring different methods to detect incongruence is preferable, NDI and PABA are more conservative than the others, and NDI is computationally less demanding than PABA.
Affiliation(s)
- Torsten H Struck
- Department of Biology/Chemistry, University of Osnabrück, Barbarastr. 11, Osnabrück, D-49076, Germany.
5
Wägele JW, Mayer C. Visualizing differences in phylogenetic information content of alignments and distinction of three classes of long-branch effects. BMC Evol Biol 2007; 7:147. [PMID: 17725833 PMCID: PMC2040160 DOI: 10.1186/1471-2148-7-147] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.4] [Received: 03/14/2007] [Accepted: 08/28/2007] [Indexed: 12/15/2022] Open
Abstract
Background
Published molecular phylogenies are usually based on data whose quality has not been explored prior to tree inference. This leads to errors because trees obtained with conventional methods suppress conflicting evidence, and because support values may be high even if there is no distinct phylogenetic signal. Tools that allow an a priori examination of data quality are rarely applied.
Results
Using data from published molecular analyses on the phylogeny of crustaceans it is shown that tree topologies and popular support values do not show existing differences in data quality. To visualize variations in signal distinctness, we use network analyses based on split decomposition and split support spectra. Both methods show the same differences in data quality and the same clade-supporting patterns. Both methods are useful to discover long-branch effects. We discern three classes of long branch effects. Class I effects consist of attraction of terminal taxa caused by symplesiomorphies, which results in a false monophyly of paraphyletic groups. Addition of carefully selected taxa can fix this effect. Class II effects are caused by drastic signal erosion. Long branches affected by this phenomenon usually slip down the tree to form false clades that in reality are polyphyletic. To recover the correct phylogeny, more conservative genes must be used. Class III effects consist of attraction due to accumulated chance similarities or convergent character states. This sort of noise can be reduced by selecting less variable portions of the data set, avoiding biases, and adding slower genes.
Conclusion
To increase confidence in molecular phylogenies an exploratory analysis of the signal to noise ratio can be conducted with split decomposition methods. If long-branch effects are detected, it is necessary to discern between three classes of effects to find the best approach for an improvement of the raw data.
Affiliation(s)
- Christoph Mayer
- Lehrstuhl Spezielle Zoologie, Faculty of Biology, University Bochum, 44780 Bochum, Germany
6
Dittmar K, de Souza SM, Araújo A. Challenges of phylogenetic analyses of aDNA sequences. Mem Inst Oswaldo Cruz 2006; 101 Suppl 2:9-13. [PMID: 17308803 DOI: 10.1590/s0074-02762006001000003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 07/20/2006] [Accepted: 10/16/2006] [Indexed: 11/21/2022] Open
Abstract
One of the crucial steps in the authentication of aDNA sequences is establishing phylogenetic consistency. Amplified sequences should fit into the phylogenetic framework of their supposed origin. An inherent property of aDNA sequences, however, is their short sequence length. Additionally, genes for aDNA studies are often chosen for their preservation potential rather than for their phylogenetically informative content. This poses potential challenges regarding their analysis, and might result in an inaccurate reflection of the supposed phylogenetic history of the sequence or organism under study. In this paper some fundamental problems of phylogenetic analysis and interpretation of aDNA datasets are discussed. Suggestions for character sampling and treatment of missing data are made. The publication is the result of a talk from the 1st PAMINSA Meeting in Rio de Janeiro, July 2005.
Affiliation(s)
- Katharina Dittmar
- Department of Integrative Biology, Brigham Young University, Provo, Utah, USA.
7
deWaard JR, Sacherova V, Cristescu MEA, Remigio EA, Crease TJ, Hebert PDN. Probing the relationships of the branchiopod crustaceans. Mol Phylogenet Evol 2006; 39:491-502. [PMID: 16406819 DOI: 10.1016/j.ympev.2005.11.003] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Received: 08/26/2005] [Accepted: 11/01/2005] [Indexed: 11/29/2022]
Abstract
The Branchiopoda display extraordinary variation in body form, even within the morphologically diverse crustaceans. To fully understand the origin and evolution of these morphological reconfigurations, a robust phylogeny of the group is essential. To infer the affinities among branchiopods, we employed two approaches to taxon and gene sampling, presented new sequence data from three genes, incorporated previously published sequence data from three additional genes, and utilized comprehensive techniques of phylogeny reconstruction. The results provided support for a number of longstanding hypotheses concerning the relationships among the orders. For example, we obtained support for the Cladoceromorpha and Gymnomera, and favoured a unique arrangement of the cladoceran orders. A few affinities remain to be resolved, particularly at the base of the Phyllopoda and within the Anomopoda. However, the results suggest that increased gene sampling is recommended for future investigations of branchiopod systematics.
Affiliation(s)
- Jeremy R deWaard
- Department of Integrative Biology, University of Guelph, Guelph, Ont., Canada N1G 2W1.
8
Grant T, Kluge AG. Data exploration in phylogenetic inference: scientific, heuristic, or neither. Cladistics 2005; 19:379-418. [DOI: 10.1111/j.1096-0031.2003.tb00311.x] [Citation(s) in RCA: 121] [Impact Index Per Article: 6.4] [Indexed: 11/30/2022] Open
9
10
Abstract
AFLP markers provide a potential source of phylogenetic information for molecular systematic studies. However, there are properties of restriction fragment data that limit phylogenetic interpretation of AFLPs. These are (a) possible nonindependence of fragments, (b) problems of homology assignment of fragments, (c) asymmetry in the probability of losing and gaining fragments, and (d) problems in distinguishing heterozygote from homozygote bands. In the present study, AFLP data sets of Lactuca s.l. were examined for the presence of phylogenetic signal. An indication of this signal was provided by carrying out tree length distribution skewness (g1) tests, permutation tail probability (PTP) tests, and relative apparent synapomorphy analysis (RASA). A measure of the support for internal branches in the optimal parsimony tree (MPT) was made using bootstrap, jackknife, and decay analysis. Finally, the extent of congruence in MPTs for AFLP and internal transcribed spacer (ITS)-1 data sets for the same taxa was made using the partition homogeneity test (PHT) and the Templeton test. These analytical studies suggested the presence of phylogenetic signal in the AFLP data sets, although some incongruence was found between AFLP and ITS MPTs. An extensive literature survey undertaken indicated that authors report a general congruence of AFLP and ITS tree topologies across a wide range of taxonomic groups, suggesting that the present results and conclusions have a general bearing. In these earlier studies and those for Lactuca s.l., AFLP markers have been found to be informative at somewhat lower taxonomic levels than ITS sequences. Tentative estimates are suggested for the levels of ITS sequence divergence over which AFLP profiles are likely to be phylogenetically informative.
Affiliation(s)
- Wim J M Koopman
- Biosystematics Group, Nationaal Herbarium Nederland, Wageningen University branch, Wageningen University, Generaal Foulkesweg 37, 6703BL, Wageningen, The Netherlands.
11
Hilu KW, Borsch T, Müller K, Soltis DE, Soltis PS, Savolainen V, Chase MW, Powell MP, Alice LA, Evans R, Sauquet H, Neinhuis C, Slotta TAB, Rohwer JG, Campbell CS, Chatrou LW. Angiosperm phylogeny based on matK sequence information. Am J Bot 2003; 90:1758-76. [PMID: 21653353 DOI: 10.3732/ajb.90.12.1758] [Citation(s) in RCA: 221] [Impact Index Per Article: 10.5] [Indexed: 05/08/2023]
Abstract
Plastid matK gene sequences for 374 genera representing all angiosperm orders and 12 genera of gymnosperms were analyzed using parsimony (MP) and Bayesian inference (BI) approaches. Traditionally, slowly evolving genomic regions have been preferred for deep-level phylogenetic inference in angiosperms. The matK gene evolves approximately three times faster than the widely used plastid genes rbcL and atpB. The MP and BI trees are highly congruent. The robustness of the strict consensus tree supersedes all individual gene analyses and is comparable only to multigene-based phylogenies. Of the 385 nodes resolved, 79% are supported by high jackknife values, averaging 88%. Amborella is sister to the remaining angiosperms, followed by a grade of Nymphaeaceae and Austrobaileyales. Bayesian inference resolves Amborella + Nymphaeaceae as sister to the rest, but with weak (0.42) posterior probability. The MP analysis shows a trichotomy sister to the Austrobaileyales representing eumagnoliids, monocots + Chloranthales, and Ceratophyllum + eudicots. The matK gene produces the highest internal support yet for basal eudicots and, within core eudicots, resolves a crown group comprising Berberidopsidaceae/Aextoxicaceae, Santalales, and Caryophyllales + asterids. Moreover, matK sequences provide good resolution within many angiosperm orders. Combined analyses of matK and other rapidly evolving DNA regions with available multigene data sets have strong potential to enhance resolution and internal support in deep-level angiosperm phylogenetics and provide additional insights into angiosperm evolution.
Affiliation(s)
- Khidir W Hilu
- Department of Biology, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061 USA
12
Abstract
The scope and impact of horizontal gene transfer (HGT) in Bacteria and Archaea has grown from a topic largely ignored by the microbiological community to a hot-button issue gaining staunch supporters (on particular points of view) at a seemingly ever-increasing rate. Opinions range from HGT being a phenomenon with minor impact on overall microbial evolution and diversification to HGT being so rampant as to obfuscate any opportunities for elucidating microbial evolution - especially organismal phylogeny - from sequence comparisons. This contentious issue has been fuelled by the influx of complete genome sequences, which has allowed for a more detailed examination of this question than previously afforded. We propose that the lack of common ground upon which to formulate consensus viewpoints probably stems from the absence of answers to four critical questions. If addressed, they could clarify concepts, reject tenuous speculation and solidify a robust foundation for the integration of HGT into a framework for long-term microbial evolution, regardless of the intellectual camp in which you reside. Here, we examine these issues, why their answers shape the outcome of this debate and the progress being made to address them.
Affiliation(s)
- Jeffrey G Lawrence
- Pittsburgh Bacteriophage Institute and Department of Biological Sciences, 352 Crawford Hall, University of Pittsburgh, Pittsburgh, PA 15260, USA.
13
Wilke T. Salenthydrobia gen. nov. (Rissooidea: Hydrobiidae): a potential relict of the Messinian salinity crisis. Zool J Linn Soc 2003. [DOI: 10.1046/j.1096-3642.2003.00049.x] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.5] [Indexed: 11/20/2022]
14
Graham SW, Olmstead RG, Barrett SCH. Rooting Phylogenetic Trees with Distant Outgroups: A Case Study from the Commelinoid Monocots. Mol Biol Evol 2002; 19:1769-81. [PMID: 12270903 DOI: 10.1093/oxfordjournals.molbev.a003999] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.0] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic rooting experiments demonstrate that two chloroplast genes from commelinoid monocot taxa that represent the closest living relatives of the pickerelweed family, Pontederiaceae, retain measurable signals regarding the position of that family's root. The rooting preferences of the chloroplast sequences were compared with those for artificial sequences that correspond to outgroups so divergent that their signal has been lost completely. These random sequences prefer the three longest branches in the unrooted ingroup topology and do not preferentially root on the branches favored by real outgroup sequences. However, the rooting behavior of the artificial sequences is not a simple function of branch length. The random outgroups preferentially root on long terminal ingroup branches, but many ingroup branches comparable in length to those favored by random sequences attract no or few hits. Nonterminal ingroup branches are generally avoided, regardless of their length. Comparisons of the ease of forcing sequences onto suboptimal roots indicate that real outgroups require a substantially greater rooting penalty than random outgroups for around half of the least-parsimonious candidate roots. Although this supports the existence of nonrandomized signal in the real outgroups, it also indicates that there is little power to choose among the optimal and nearly optimal rooting possibilities. A likelihood-based test rejects the hypothesis that all rootings of the subtree using real outgroup sequences are equally good explanations of the data and also eliminates around half of the least optimal candidate roots. Adding genes or outgroups can improve the ability to discriminate among different root locations. Rooting discriminatory power is shown to be stronger, in general, for more closely related outgroups and is highly correlated among different real outgroups, genes, and optimality criteria.
Affiliation(s)
- Sean W Graham
- Department of Biological Sciences, University of Alberta, Edmonton, Canada.
15
16