1
|
Propagating uncertainty about molecular evolution models and prior distributions to phylogenetic trees. Mol Phylogenet Evol 2023; 180:107689. [PMID: 36587884 DOI: 10.1016/j.ympev.2022.107689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Revised: 10/21/2022] [Accepted: 12/22/2022] [Indexed: 12/31/2022]
Abstract
Phylogenetic trees constructed from molecular sequence data rely on largely arbitrary assumptions about the substitution model, the distribution of substitution rates across sites, the version of the molecular clock, and, in the case of Bayesian inference, the prior distribution. Those assumptions affect results reported in the form of clade probabilities and error bars on divergence times and substitution rates. Overlooking the uncertainty in the assumptions leads to overly confident conclusions in the form of inflated clade probabilities and short confidence intervals or credible intervals. This paper demonstrates how to propagate that uncertainty by combining the models considered along with all of their assumptions, including their prior distributions. The combined models incorporate much more of the uncertainty than Bayesian model averages since the latter tend to settle on a single model due to the higher-level assumption that one of the models is true. Nucleotide sequence data illustrate the proposed model combination method.
|
2
|
Bickel DR. Propagating clade and model uncertainty to confidence intervals of divergence times and branch lengths. Mol Phylogenet Evol 2021; 167:107357. [PMID: 34785383 DOI: 10.1016/j.ympev.2021.107357] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/09/2021] [Revised: 11/01/2021] [Accepted: 11/08/2021] [Indexed: 12/01/2022]
Abstract
Confidence intervals of divergence times and branch lengths do not reflect uncertainty about their clades or about the prior distributions and other model assumptions on which they are based. Uncertainty about the clade may be propagated to a confidence interval by multiplying its confidence level by the bootstrap proportion of its clade or by another probability that the clade is correct. (If the confidence level is 95% and the bootstrap proportion is 90%, then the uncertainty-adjusted confidence level is (0.95)(0.90) ≈ 86%.) Uncertainty about the model can be propagated to the confidence interval by reporting the union of the confidence intervals from all the plausible models. Unless there is no overlap between the confidence intervals, that results in an uncertainty-adjusted interval that has as its lower and upper limits the most extreme limits of the models. The proposed methods of uncertainty quantification may be used together.
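The two adjustments described in this abstract can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function names and the example intervals are hypothetical, and the union is returned as a list so that non-overlapping intervals stay separate.

```python
def adjusted_confidence_level(confidence_level, clade_probability):
    """Multiply the nominal confidence level by the probability
    (e.g. a bootstrap proportion) that the clade is correct."""
    return confidence_level * clade_probability


def union_of_intervals(intervals):
    """Return the union of (lower, upper) intervals as a sorted list of
    disjoint intervals. Overlapping intervals are merged, so a fully
    overlapping set collapses to its most extreme limits."""
    merged = []
    for lower, upper in sorted(intervals):
        if merged and lower <= merged[-1][1]:
            # Overlaps the previous interval: extend its upper limit.
            merged[-1] = (merged[-1][0], max(merged[-1][1], upper))
        else:
            merged.append((lower, upper))
    return merged


# Hypothetical divergence-time intervals (in Myr) from three models.
model_intervals = [(10.0, 20.0), (15.0, 30.0), (12.0, 18.0)]
print(round(adjusted_confidence_level(0.95, 0.90), 3))
print(union_of_intervals(model_intervals))
```

When the two methods are used together, the merged interval would be reported with the adjusted level instead of the nominal one.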
Affiliation(s)
- David R Bickel
- Informatics and Analytics, University of North Carolina at Greensboro, The Graduate School, 241 Mossman Building, CAMPUS Greensboro, NC 27402-6170, USA.
|
3
|
Bouckaert RR. OBAMA: OBAMA for Bayesian amino-acid model averaging. PeerJ 2020; 8:e9460. [PMID: 32832259 PMCID: PMC7413081 DOI: 10.7717/peerj.9460] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/18/2019] [Accepted: 06/10/2020] [Indexed: 11/20/2022] Open
Abstract
Background
Bayesian analyses offer many benefits for phylogenetics and have been popular for the analysis of amino acid alignments. Such analyses require a substitution model and a site model to be specified, and these models, which are typically of no interest to the analysis overall, are often chosen by an ad hoc or likelihood-based method.
Methods
We present a method called OBAMA that averages over substitution models and site models, thus letting the data inform model choices and taking model uncertainty into account. It uses trans-dimensional Markov chain Monte Carlo (MCMC) proposals to switch between various empirical substitution models for amino acids such as Dayhoff, WAG, and JTT. Furthermore, it switches between using the base frequencies from these substitution models and using base frequencies estimated from the alignment. Finally, it switches between using gamma rate heterogeneity or not, and between using a proportion of invariable sites or not.
Results
We show that the model performs well in a simulation study. By using appropriate priors, we demonstrate that both the proportion of invariable sites and the shape parameter for gamma rate heterogeneity can be estimated. The OBAMA method allows model uncertainty to be taken into account, thus reducing bias in phylogenetic estimates. The method is implemented in the OBAMA package in BEAST 2, which is open source, licensed under the LGPL, and allows joint tree inference under a wide range of models.
Affiliation(s)
- Remco R. Bouckaert
- School of Computer Science, University of Auckland, Auckland, New Zealand
- Max Planck Institute for the Science of Human History, Jena, Germany
|
4
|
San JE, Baichoo S, Kanzi A, Moosa Y, Lessells R, Fonseca V, Mogaka J, Power R, de Oliveira T. Current Affairs of Microbial Genome-Wide Association Studies: Approaches, Bottlenecks and Analytical Pitfalls. Front Microbiol 2020; 10:3119. [PMID: 32082269 PMCID: PMC7002396 DOI: 10.3389/fmicb.2019.03119] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 10/03/2019] [Accepted: 12/24/2019] [Indexed: 12/12/2022] Open
Abstract
Microbial genome-wide association studies (mGWAS) are a new and exciting research field that is adapting human GWAS methods to understand how variations in microbial genomes affect host or pathogen phenotypes, such as drug resistance, virulence, host specificity and prognosis. Several computational tools and methods have been developed or adapted from human GWAS to facilitate the discovery of novel mutations and structural variations that are associated with the phenotypes of interest. However, no comprehensive, end-to-end, user-friendly tool is currently available. The development of a broadly applicable pipeline presents a real opportunity among computational biologists. Here, (i) we review the prominent and promising tools, (ii) discuss analytical pitfalls and bottlenecks in mGWAS, (iii) provide insights into the selection of appropriate tools, (iv) highlight the gaps that still need to be filled and how users and developers can work together to overcome these bottlenecks. Use of mGWAS research can inform drug repositioning decisions as well as accelerate the discovery and development of more effective vaccines and antimicrobials for pressing infectious diseases of global health significance, such as HIV, TB, influenza, and malaria.
Affiliation(s)
- James Emmanuel San
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Shakuntala Baichoo
- Department of Digital Technologies, FoICDT, University of Mauritius, Réduit, Mauritius
| | - Aquillah Kanzi
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Yumna Moosa
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Richard Lessells
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Vagner Fonseca
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Laboratório de Genética Celular e Molecular, ICB, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - John Mogaka
- Discipline of Public Health, University of Kwazulu-Natal, Durban, South Africa
| | - Robert Power
- St Edmund Hall, Oxford University, Oxford, United Kingdom
| | - Tulio de Oliveira
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Department of Global Health, University of Washington, Seattle, WA, United States
|
5
|
Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N, Matschiner M, Mendes FK, Müller NF, Ogilvie HA, du Plessis L, Popinga A, Rambaut A, Rasmussen D, Siveroni I, Suchard MA, Wu CH, Xie D, Zhang C, Stadler T, Drummond AJ. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2019; 15:e1006650. [PMID: 30958812 PMCID: PMC6472827 DOI: 10.1371/journal.pcbi.1006650] [Citation(s) in RCA: 1610] [Impact Index Per Article: 322.0] [Received: 11/14/2018] [Revised: 04/18/2019] [Accepted: 02/04/2019] [Indexed: 11/18/2022] Open
Abstract
Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
Affiliation(s)
- Remco Bouckaert
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Timothy G. Vaughan
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Joëlle Barido-Sottani
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Sebastián Duchêne
- Department of Biochemistry and Molecular Biology, University of Melbourne, Melbourne, Victoria, Australia
| | - Mathieu Fourment
- ithree institute, University of Technology Sydney, Sydney, Australia
| | | | | | - Graham Jones
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE 405 30 Göteborg, Sweden
| | - Denise Kühnert
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridgeshire, UK
| | - Michael Matschiner
- Department of Environmental Sciences, University of Basel, 4051 Basel, Switzerland
| | - Fábio K. Mendes
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
| | - Nicola F. Müller
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Huw A. Ogilvie
- Department of Computer Science, Rice University, Houston, TX 77005-1892, USA
| | - Louis du Plessis
- Department of Zoology, University of Oxford, Oxford, OX1 3PS, UK
| | - Alex Popinga
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
| | - Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, EH9 3FL UK
| | - David Rasmussen
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27695, USA
| | - Igor Siveroni
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK
| | - Marc A. Suchard
- Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Chieh-Hsi Wu
- Department of Statistics, University of Oxford, OX1 3LB, UK
| | - Dong Xie
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
| | - Chi Zhang
- Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
| | - Tanja Stadler
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Alexei J. Drummond
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
|
6
|
Seo TK, Thorne JL. Information Criteria for Comparing Partition Schemes. Syst Biol 2018; 67:616-632. [PMID: 29309694 DOI: 10.1093/sysbio/syx097] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/16/2017] [Accepted: 12/17/2017] [Indexed: 01/10/2023] Open
Abstract
When inferring phylogenies, one important decision is whether and how nucleotide substitution parameters should be shared across different subsets or partitions of the data. One sort of partitioning error occurs when heterogeneous subsets are mistakenly lumped together and treated as if they share parameter values. The opposite kind of error is mistakenly treating homogeneous subsets as if they result from distinct sets of parameters. Lumping and splitting errors are not equally bad. Lumping errors can yield parameter estimates that do not accurately reflect any of the subsets that were combined whereas splitting errors yield estimates that did not benefit from sharing information across partitions. Phylogenetic partitioning decisions are often made by applying information criteria such as the Akaike information criterion (AIC). As with other information criteria, the AIC evaluates a model or partition scheme by combining the maximum log-likelihood value with a penalty that depends on the number of parameters being estimated. For the purpose of selecting an optimal partitioning scheme, we derive an adjustment to the AIC that we refer to as the AIC$^{(p)}$ and that is motivated by the idea that splitting errors are less serious than lumping errors. We also introduce a similar adjustment to the Bayesian information criterion (BIC) that we refer to as the BIC$^{(p)}$. Via simulation and empirical data analysis, we contrast AIC and BIC behavior to our suggested adjustments. We discuss these results and also emphasize why we expect the probability of lumping errors with the AIC$^{(p)}$ and the BIC$^{(p)}$ to be relatively robust to model parameterization.
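For readers unfamiliar with how such criteria trade fit against complexity, the standard (unadjusted) AIC and BIC can be sketched as follows. This is not the AIC$^{(p)}$ or BIC$^{(p)}$ adjustment derived in the paper, whose form is not given in the abstract. The log-likelihoods and parameter counts are made-up numbers, and taking the number of alignment sites as the BIC sample size is an assumption.

```python
import math


def aic(max_log_likelihood, num_params):
    """Standard Akaike information criterion: 2k - 2 ln L."""
    return 2 * num_params - 2 * max_log_likelihood


def bic(max_log_likelihood, num_params, num_sites):
    """Standard Bayesian information criterion: k ln n - 2 ln L,
    with n taken here to be the number of alignment sites (assumption)."""
    return num_params * math.log(num_sites) - 2 * max_log_likelihood


# Hypothetical comparison of a lumped scheme (one parameter set) with a
# split scheme (two parameter sets) on a 1000-site alignment.
schemes = {
    "lumped": {"lnL": -5230.0, "k": 10},
    "split": {"lnL": -5205.0, "k": 20},
}
for name, s in schemes.items():
    print(name, aic(s["lnL"], s["k"]), round(bic(s["lnL"], s["k"], 1000), 2))
```

Lower values are preferred under both criteria. Because the BIC penalty grows with the sample size, it tends to favor lumping more often than the AIC does.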
Affiliation(s)
- Tae-Kun Seo
- Department of Biological Sciences, Korea Polar Research Institute, 26 Songdomirae-ro, Yeonsu-gu, Incheon 406-840, Republic of Korea
| | - Jeffrey L Thorne
- Bioinformatics Research Center, Box 7566, North Carolina State University, Raleigh NC 27695-7566, USA
|
7
|
Bromham L, Duchêne S, Hua X, Ritchie AM, Duchêne DA, Ho SYW. Bayesian molecular dating: opening up the black box. Biol Rev Camb Philos Soc 2017; 93:1165-1191. [DOI: 10.1111/brv.12390] [Citation(s) in RCA: 104] [Impact Index Per Article: 14.9] [Received: 01/09/2017] [Revised: 11/13/2017] [Accepted: 11/17/2017] [Indexed: 12/27/2022]
Affiliation(s)
- Lindell Bromham
- Macroevolution & Macroecology, Division of Ecology & Evolution, Research School of Biology; Australian National University; Canberra ACT 2601 Australia
| | - Sebastián Duchêne
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute; The University of Melbourne; Melbourne VIC 3010 Australia
- School of Life and Environmental Sciences; University of Sydney; Sydney NSW 2006 Australia
| | - Xia Hua
- Macroevolution & Macroecology, Division of Ecology & Evolution, Research School of Biology; Australian National University; Canberra ACT 2601 Australia
| | - Andrew M. Ritchie
- School of Life and Environmental Sciences; University of Sydney; Sydney NSW 2006 Australia
| | - David A. Duchêne
- Macroevolution & Macroecology, Division of Ecology & Evolution, Research School of Biology; Australian National University; Canberra ACT 2601 Australia
- School of Life and Environmental Sciences; University of Sydney; Sydney NSW 2006 Australia
| | - Simon Y. W. Ho
- School of Life and Environmental Sciences; University of Sydney; Sydney NSW 2006 Australia
|
8
|
Abstract
Understanding how and why language subsystems differ in their evolutionary dynamics is a fundamental question for historical and comparative linguistics. One key dynamic is the rate of language change. While it is commonly thought that the rapid rate of change hampers the reconstruction of deep language relationships beyond 6,000-10,000 y, there are suggestions that grammatical structures might retain more signal over time than other subsystems, such as basic vocabulary. In this study, we use a Dirichlet process mixture model to infer the rates of change in lexical and grammatical data from 81 Austronesian languages. We show that, on average, most grammatical features actually change faster than items of basic vocabulary. The grammatical data show less schismogenesis, higher rates of homoplasy, and more bursts of contact-induced change than the basic vocabulary data. However, there is a core of grammatical and lexical features that are highly stable. These findings suggest that different subsystems of language have differing dynamics and that careful, nuanced models of language change will be needed to extract deeper signal from the noise of parallel evolution, areal readaptation, and contact.
|
9
|
Liang D, Leung RKK, Lee SS, Kam KM. Insights into intercontinental spread of Zika virus. PLoS One 2017; 12:e0176710. [PMID: 28448611 PMCID: PMC5407806 DOI: 10.1371/journal.pone.0176710] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/21/2016] [Accepted: 04/16/2017] [Indexed: 12/02/2022] Open
Abstract
The epidemic of Zika virus (ZIKV) infection in South America has led to the World Health Organization's declaration of a Public Health Emergency of International Concern. To further inform effective public health policy, an understanding of ZIKV's transmission mechanisms is crucial. To characterize the intercontinental transmission of ZIKV, we compiled and analyzed more than 250 gene sequences together with their sequence-related geographic and temporal information, sampled across 27 countries spanning from 1947 to 2016. After filtering and selecting appropriate sequences, extensive phylogenetic analyses were performed. Although phylogeographic reconstruction supported the transmission route of the virus in Africa, South-eastern Asia, Oceania and Latin America, we discovered that the Eastern Africa origin of ZIKV was disputable. On a molecular level, purifying selection was found to be largely responsible for the evolution of non-structural protein 5 and envelope protein E. Our dataset and ancestral sequence reconstruction analysis captured previously unidentified amino acid changes during evolution. Finally, based on the estimation of the time to the most recent common ancestors for the non-structural protein 5 gene, we hypothesized potential specific historic events that occurred in the 1940s and might have facilitated the spread of Zika virus from Africa to South-eastern Asia. Our findings provide new insights into the transmission characteristics of ZIKV, while further genetic and serologic studies are warranted to support the design of tailored prevention strategies.
Affiliation(s)
- Dachao Liang
- Stanley Ho Centre for Emerging Infectious Diseases, The Chinese University of Hong Kong, Hong Kong, China
| | - Ross Ka Kit Leung
- Stanley Ho Centre for Emerging Infectious Diseases, The Chinese University of Hong Kong, Hong Kong, China
| | - Shui Shan Lee
- Stanley Ho Centre for Emerging Infectious Diseases, The Chinese University of Hong Kong, Hong Kong, China
| | - Kai Man Kam
- Stanley Ho Centre for Emerging Infectious Diseases, The Chinese University of Hong Kong, Hong Kong, China
|
10
|
Fan W, Sun Z, Shen T, Xu D, Huang K, Zhou J, Song S, Yan L. Analysis of Evolutionary Processes of Species Jump in Waterfowl Parvovirus. Front Microbiol 2017; 8:421. [PMID: 28352261 PMCID: PMC5349109 DOI: 10.3389/fmicb.2017.00421] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 01/04/2017] [Accepted: 02/28/2017] [Indexed: 01/28/2023] Open
Abstract
Waterfowl parvoviruses are classified into goose parvovirus (GPV) and Muscovy duck parvovirus (MDPV) according to their antigenic features and host preferences. A novel duck parvovirus (NDPV), identified as a new variant of GPV, is currently infecting ducks, thus causing considerable economic loss. This study analyzed the molecular evolution and population dynamics of the emerging parvovirus capsid gene to investigate the evolutionary processes concerning the host shift of NDPV. Two important amino acid changes (Asn-489 and Asn-650) were identified in NDPV, which may be responsible for the host shift of NDPV. Phylogenetic analysis indicated that the currently circulating NDPV originated from the GPV lineage. The Bayesian Markov chain Monte Carlo tree indicated that NDPV diverged from GPV approximately 20 years ago. Evolutionary rate analyses demonstrated that GPV evolved at 7.674 × 10⁻⁴ substitutions/site/year and MDPV at 5.237 × 10⁻⁴ substitutions/site/year, whereas the substitution rate in the NDPV branch was 2.25 × 10⁻³ substitutions/site/year. Meanwhile, viral population dynamics analysis revealed that the GPV major clade, including NDPV, grew exponentially at a rate of 1.717 year⁻¹. Selection pressure analysis showed that most sites are subject to strong purifying selection and no positively selected sites were found in NDPV. The unique immune-epitopes in waterfowl parvovirus were also estimated, which may be helpful for the prediction of antibody binding sites against NDPV in ducks.
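To give the reported growth rate an intuitive scale, it can be converted to a doubling time. The sketch below assumes a simple exponential model N(t) = N0·exp(r·t), which may not match the parameterization used in the study, and the function name is hypothetical.

```python
import math


def doubling_time(growth_rate_per_year):
    """Doubling time in years under N(t) = N0 * exp(r * t): ln(2) / r."""
    return math.log(2) / growth_rate_per_year


# Growth rate reported for the GPV major clade (per year).
r = 1.717
print(f"doubling time: {doubling_time(r):.3f} years")
```

Under that assumption, a rate of 1.717 per year corresponds to the effective population size doubling roughly every five months.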
Affiliation(s)
- Wentao Fan
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Zhaoyu Sun
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Tongtong Shen
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Danning Xu
- Waterfowl Healthy Breeding Engineering Research Center, Guangdong Higher Education Institutes, Guangzhou, China
| | - Kehe Huang
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Jiyong Zhou
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Suquan Song
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
| | - Liping Yan
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
- Jiangsu Engineering Laboratory of Animal Immunology, Institute of Immunology and College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
|
11
|
Bouckaert RR, Drummond AJ. bModelTest: Bayesian phylogenetic site model averaging and model comparison. BMC Evol Biol 2017; 17:42. [PMID: 28166715 PMCID: PMC5294809 DOI: 10.1186/s12862-017-0890-6] [Citation(s) in RCA: 403] [Impact Index Per Article: 57.6] [Received: 06/19/2016] [Accepted: 01/19/2017] [Indexed: 11/10/2022] Open
Abstract
Background
Reconstructing phylogenies through Bayesian methods has many benefits, which include providing a mathematically sound framework, providing realistic estimates of uncertainty and being able to incorporate different sources of information based on formal principles. Bayesian phylogenetic analyses are popular for interpreting nucleotide sequence data; however, for such studies one needs to specify a site model and associated substitution model. Often, the parameters of the site model are of no interest and an ad hoc or additional likelihood-based analysis is used to select a single site model.
Results
bModelTest allows for a Bayesian approach to inferring and marginalizing site models in a phylogenetic analysis. It is based on trans-dimensional Markov chain Monte Carlo (MCMC) proposals that allow switching between substitution models as well as estimating the posterior probability for gamma-distributed rate heterogeneity, a proportion of invariable sites and unequal base frequencies. The model can be used with the full set of time-reversible models on nucleotides, but we also introduce and demonstrate the use of two subsets of time-reversible substitution models.
Conclusion
With the new method the site model can be inferred (and marginalized) during the MCMC analysis and does not need to be pre-determined, as is now often the case in practice, by likelihood-based methods. The method is implemented in the bModelTest package of the popular BEAST 2 software, which is open source, licensed under the GNU Lesser General Public License and allows joint site model and tree inference under a wide range of models.
Electronic supplementary material
The online version of this article (doi:10.1186/s12862-017-0890-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Remco R Bouckaert
- Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
- Department of Computer Science, University of Auckland, Auckland, New Zealand
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Alexei J Drummond
- Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
- Department of Computer Science, University of Auckland, Auckland, New Zealand
|
12
|
In the shadows: Phylogenomics and coalescent species delimitation unveil cryptic diversity in a Cerrado endemic lizard (Squamata: Tropidurus). Mol Phylogenet Evol 2017; 107:455-465. [DOI: 10.1016/j.ympev.2016.12.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 06/14/2016] [Revised: 11/07/2016] [Accepted: 12/07/2016] [Indexed: 11/18/2022]
|
13
|
Frandsen PB, Calcott B, Mayer C, Lanfear R. Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates. BMC Evol Biol 2015; 15:13. [PMID: 25887041 PMCID: PMC4327964 DOI: 10.1186/s12862-015-0283-7] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 08/11/2014] [Accepted: 01/13/2015] [Indexed: 11/10/2022] Open
Abstract
Background
Model selection is a vital part of most phylogenetic analyses, and accounting for the heterogeneity in evolutionary patterns across sites is particularly important. Mixture models and partitioning are commonly used to account for this variation, and partitioning is the most popular approach. Most current partitioning methods require some a priori partitioning scheme to be defined, typically guided by known structural features of the sequences, such as gene boundaries or codon positions. Recent evidence suggests that these a priori boundaries often fail to adequately account for variation in rates and patterns of evolution among sites. Furthermore, new phylogenomic datasets such as those assembled from ultra-conserved elements lack obvious structural features on which to define a priori partitioning schemes. The upshot is that, for many phylogenetic datasets, partitioned models of molecular evolution may be inadequate, thus limiting the accuracy of downstream phylogenetic analyses.
Results
We present a new algorithm that automatically selects a partitioning scheme via the iterative division of the alignment into subsets of similar sites based on their rates of evolution. We compare this method to existing approaches using a wide range of empirical datasets, and show that it consistently leads to large increases in the fit of partitioned models of molecular evolution when measured using AICc and BIC scores. In doing so, we demonstrate that some related approaches to solving this problem may have been associated with a small but important bias.
Conclusions
Our method provides an alternative to traditional approaches to partitioning, such as dividing alignments by gene and codon position. Because our method is data-driven, it can be used to estimate partitioned models for all types of alignments, including those that are not amenable to traditional approaches to partitioning.
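The idea of iteratively dividing sites by their rates can be illustrated with a toy sketch. This is not the authors' algorithm: it runs a plain one-dimensional 2-means (Lloyd's algorithm) on made-up per-site rates, and the rule for accepting a split (a minimum subset size) is a placeholder assumption standing in for whatever criterion the published method applies. All names are hypothetical.

```python
def two_means_split(rates):
    """Split 1-D rates into two clusters with Lloyd's algorithm (k = 2).
    Returns two lists of positions into `rates`."""
    lo, hi = min(rates), max(rates)
    if lo == hi:
        return list(range(len(rates))), []
    centers = [lo, hi]  # initialise at the extremes so neither cluster is empty
    while True:
        groups = ([], [])
        for i, r in enumerate(rates):
            nearest = 0 if abs(r - centers[0]) <= abs(r - centers[1]) else 1
            groups[nearest].append(i)
        new_centers = [sum(rates[i] for i in g) / len(g) for g in groups]
        if new_centers == centers:  # assignments have stabilised
            return groups
        centers = new_centers


def iterative_partition(rates, min_size=2):
    """Recursively split site indices; a split is kept only if both halves
    contain at least `min_size` sites (placeholder acceptance rule)."""
    def recurse(indices):
        left, right = two_means_split([rates[i] for i in indices])
        if len(left) < min_size or len(right) < min_size:
            return [indices]
        return (recurse([indices[j] for j in left])
                + recurse([indices[j] for j in right]))
    return recurse(list(range(len(rates))))


# Hypothetical per-site rates: three slow sites and three fast sites.
site_rates = [0.1, 0.15, 0.11, 2.0, 2.3, 1.9]
print(iterative_partition(site_rates))
```

Each returned list holds the alignment columns assigned to one subset. A real analysis would estimate the site rates from the data and judge each candidate split by model fit.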
Affiliation(s)
- Paul B Frandsen
- Office of Research Information Services, Office of the CIO, Smithsonian Institution, Washington, D.C., USA
- Department of Entomology, Rutgers University, New Brunswick, New Jersey, USA
| | - Brett Calcott
- School of Life Sciences, Arizona State University, Tempe, AZ, USA.
| | - Christoph Mayer
- Zoologisches Forschungsmuseum Alexander Koenig (ZFMK)/Zentrum für Molekulare Biodiversitätsforschung (ZMB), Bonn, Germany.
| | - Robert Lanfear
- Ecology Evolution and Genetics, Research School of Biology, Australian National University, Canberra, ACT, Australia
- National Evolutionary Synthesis Center, Durham, NC, USA
- Department of Biological Sciences, Macquarie University, Sydney, Australia
|
14
|
Abstract
Partitioning is a commonly used method in phylogenetics that aims to accommodate variation in substitution patterns among sites. Despite its popularity, there have been few systematic studies of its effects on phylogenetic inference, and there have been no studies that compare the effects of different approaches to partitioning across many empirical data sets. In this study, we applied four commonly used approaches to partitioning to each of 34 empirical data sets, and then compared the resulting tree topologies, branch-lengths, and bootstrap support estimated using each approach. We find that the choice of partitioning scheme often affects tree topology, particularly when partitioning is omitted. Most notably, we find occasional instances where the use of a suboptimal partitioning scheme produces highly supported but incorrect nodes in the tree. Branch-lengths and bootstrap support are also affected by the choice of partitioning scheme, sometimes dramatically so. We discuss the reasons for these effects and make some suggestions for best practice.
Affiliation(s)
- David Kainer
- Division of Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Robert Lanfear
- Division of Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra, ACT, Australia; National Evolutionary Synthesis Center, Durham, NC; Department of Biological Sciences, Macquarie University, Sydney, NSW, Australia
|
15
|
Persing A, Jasra A, Beskos A, Balding D, De Iorio M. A simulation approach for change-points on phylogenetic trees. J Comput Biol 2014; 22:10-24. [PMID: 25506749 DOI: 10.1089/cmb.2014.0218] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
We observe n sequences at each of m sites and assume that they have evolved from an ancestral sequence that forms the root of a binary tree of known topology and branch lengths, but the sequence states at internal nodes are unknown. The topology of the tree and branch lengths are the same for all sites, but the parameters of the evolutionary model can vary over sites. We assume a piecewise constant model for these parameters, with an unknown number of change-points and hence a transdimensional parameter space over which we seek to perform Bayesian inference. We propose two novel ideas to deal with the computational challenges of such inference. Firstly, we approximate the model based on the time machine principle: the top nodes of the binary tree (near the root) are replaced by an approximation of the true distribution; as more nodes are removed from the top of the tree, the cost of computing the likelihood is reduced linearly in n. The approach introduces a bias, which we investigate empirically. Secondly, we develop a particle marginal Metropolis-Hastings (PMMH) algorithm, that employs a sequential Monte Carlo (SMC) sampler and can use the first idea. Our time-machine PMMH algorithm copes well with one of the bottle-necks of standard computational algorithms: the transdimensional nature of the posterior distribution. The algorithm is implemented on simulated and real data examples, and we empirically demonstrate its potential to outperform competing methods based on approximate Bayesian computation (ABC) techniques.
Affiliation(s)
- Adam Persing
- Department of Statistical Science, University College London, London, United Kingdom
|
16
|
Berv JS, Prum RO. A comprehensive multilocus phylogeny of the Neotropical cotingas (Cotingidae, Aves) with a comparative evolutionary analysis of breeding system and plumage dimorphism and a revised phylogenetic classification. Mol Phylogenet Evol 2014; 81:120-36. [PMID: 25234241 DOI: 10.1016/j.ympev.2014.09.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/18/2014] [Revised: 07/24/2014] [Accepted: 09/06/2014] [Indexed: 10/24/2022]
Abstract
The Neotropical cotingas (Cotingidae: Aves) are a group of passerine birds that are characterized by extreme diversity in morphology, ecology, breeding system, and behavior. Here, we present a comprehensive phylogeny of the Neotropical cotingas based on six nuclear and mitochondrial loci (∼7500 bp) for a sample of 61 cotinga species in all 25 genera, and 22 species of suboscine outgroups. Our taxon sample more than doubles the number of cotinga species studied in previous analyses, and allows us to test the monophyly of the cotingas as well as their intrageneric relationships with high resolution. We analyze our genetic data using a Bayesian species tree method, and concatenated Bayesian and maximum likelihood methods, and present a highly supported phylogenetic hypothesis. We confirm the monophyly of the cotingas, and present the first phylogenetic evidence for the relationships of Phibalura flavirostris as the sister group to Ampelion and Doliornis, and the paraphyly of Lipaugus with respect to Tijuca. In addition, we resolve the diverse radiations within the Cotinga, Lipaugus, Pipreola, and Procnias genera. We find no support for Darwin's (1871) hypothesis that the increase in sexual selection associated with polygynous breeding systems drives the evolution of color dimorphism in the cotingas, at least when analyzed at a broad categorical scale. Finally, we present a new comprehensive phylogenetic classification of all cotinga species.
Affiliation(s)
- Jacob S Berv
- Department of Ecology and Evolutionary Biology and Peabody Museum of Natural History, Yale University, P.O. Box 208105, New Haven, CT 06520, USA.
- Richard O Prum
- Department of Ecology and Evolutionary Biology and Peabody Museum of Natural History, Yale University, P.O. Box 208105, New Haven, CT 06520, USA.
|
17
|
Duchêne S, Ho SY. Using multiple relaxed-clock models to estimate evolutionary timescales from DNA sequence data. Mol Phylogenet Evol 2014; 77:65-70. [DOI: 10.1016/j.ympev.2014.04.010] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 02/01/2014] [Revised: 03/11/2014] [Accepted: 04/08/2014] [Indexed: 11/25/2022]
|
18
|
Bloom JD. An experimentally informed evolutionary model improves phylogenetic fit to divergent lactamase homologs. Mol Biol Evol 2014; 31:2753-69. [PMID: 25063439 PMCID: PMC4166927 DOI: 10.1093/molbev/msu220] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/24/2022]
Abstract
Phylogenetic analyses of molecular data require a quantitative model for how sequences evolve. Traditionally, the details of the site-specific selection that governs sequence evolution are not known a priori, making it challenging to create evolutionary models that adequately capture the heterogeneity of selection at different sites. However, recent advances in high-throughput experiments have made it possible to quantify the effects of all single mutations on gene function. I have previously shown that such high-throughput experiments can be combined with knowledge of underlying mutation rates to create a parameter-free evolutionary model that describes the phylogeny of influenza nucleoprotein far better than commonly used existing models. Here, I extend this work by showing that published experimental data on TEM-1 beta-lactamase (Firnberg E, Labonte JW, Gray JJ, Ostermeier M. 2014. A comprehensive, high-resolution map of a gene’s fitness landscape. Mol Biol Evol. 31:1581–1592) can be combined with a few mutation rate parameters to create an evolutionary model that describes beta-lactamase phylogenies much better than most common existing models. This experimentally informed evolutionary model is superior even for homologs that are substantially diverged (about 35% divergence at the protein level) from the TEM-1 parent that was the subject of the experimental study. These results suggest that experimental measurements can inform phylogenetic evolutionary models that are applicable to homologs that span a substantial range of sequence divergence.
Affiliation(s)
- Jesse D Bloom
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA
|
19
|
Abstract
All modern approaches to molecular phylogenetics require a quantitative model for how genes evolve. Unfortunately, existing evolutionary models do not realistically represent the site-heterogeneous selection that governs actual sequence change. Attempts to remedy this problem have involved augmenting these models with a burgeoning number of free parameters. Here, I demonstrate an alternative: Experimental determination of a parameter-free evolutionary model via mutagenesis, functional selection, and deep sequencing. Using this strategy, I create an evolutionary model for influenza nucleoprotein that describes the gene phylogeny far better than existing models with dozens or even hundreds of free parameters. Emerging high-throughput experimental strategies such as the one employed here provide fundamentally new information that has the potential to transform the sensitivity of phylogenetic and genetic analyses.
Affiliation(s)
- Jesse D Bloom
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA
|
20
|
Jia F, Lo N, Ho SYW. The impact of modelling rate heterogeneity among sites on phylogenetic estimates of intraspecific evolutionary rates and timescales. PLoS One 2014; 9:e95722. [PMID: 24798481 PMCID: PMC4010409 DOI: 10.1371/journal.pone.0095722] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 12/09/2013] [Accepted: 03/28/2014] [Indexed: 12/23/2022]
Abstract
Phylogenetic analyses of DNA sequence data can provide estimates of evolutionary rates and timescales. Nearly all phylogenetic methods rely on accurate models of nucleotide substitution. A key feature of molecular evolution is the heterogeneity of substitution rates among sites, which is often modelled using a discrete gamma distribution. A widely used derivative of this is the gamma-invariable mixture model, which assumes that a proportion of sites in the sequence are completely resistant to change, while substitution rates at the remaining sites are gamma-distributed. For data sampled at the intraspecific level, however, biological assumptions involved in the invariable-sites model are commonly violated. We examined the use of these models in analyses of five intraspecific data sets. We show that using 6-10 rate categories for the discrete gamma distribution of rates among sites is sufficient to provide a good approximation of the marginal likelihood. Increasing the number of gamma rate categories did not have a substantial effect on estimates of the substitution rate or coalescence time, unless rates varied strongly among sites in a non-gamma-distributed manner. The assumption of a proportion of invariable sites provided a better approximation of the asymptotic marginal likelihood when the number of gamma categories was small, but had minimal impact on estimates of rates and coalescence times. However, the estimated proportion of invariable sites was highly susceptible to changes in the number of gamma rate categories. The concurrent use of gamma and invariable-site models for intraspecific data is not biologically meaningful and has been challenged on statistical grounds; here we have found that the assumption of a proportion of invariable sites has no obvious impact on Bayesian estimates of rates and timescales from intraspecific data.
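The discrete gamma approximation referred to in this abstract can be sketched as follows. This is a minimal, standard-library-only illustration and not the code used in the study. It assumes a gamma distribution of rates with mean 1 (shape and rate both equal to `alpha`), K equal-probability categories, and the mean rate within each category as that category's representative value; all function names are hypothetical.

```python
import math


def reg_lower_gamma(a, x, tol=1e-12, max_terms=10000):
    """Regularised lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_terms:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma_quantile(a, p):
    """Quantile of a Gamma(shape=a, rate=1) variable, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean relative rate in each of k equal-probability gamma categories."""
    # Category boundaries on the rate-1 scale (x = alpha * r for a mean-1 gamma).
    cuts = [gamma_quantile(alpha, i / k) for i in range(1, k)]
    # For a mean-1 gamma, the partial expectation up to a boundary is P(alpha + 1, x).
    cdf = [0.0] + [reg_lower_gamma(alpha + 1.0, c) for c in cuts] + [1.0]
    return [k * (cdf[i + 1] - cdf[i]) for i in range(k)]


# Example: strong rate heterogeneity (alpha = 0.5) with 4 and 8 categories.
print(discrete_gamma_rates(0.5, 4))
print(discrete_gamma_rates(0.5, 8))
```

Because every category has probability 1/K, the category rates average to 1 by construction. Increasing K refines the approximation to the continuous gamma distribution at a likelihood cost that grows in proportion to K.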
Affiliation(s)
- Fangzhi Jia
- School of Biological Sciences, University of Sydney, Sydney, New South Wales, Australia
- Nathan Lo
- School of Biological Sciences, University of Sydney, Sydney, New South Wales, Australia
- Simon Y. W. Ho
- School of Biological Sciences, University of Sydney, Sydney, New South Wales, Australia
|
21
|
Lanfear R, Calcott B, Kainer D, Mayer C, Stamatakis A. Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evol Biol 2014. [PMID: 24742000 DOI: 10.1186/1471-2148-14-82] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/08/2023]
Abstract
BACKGROUND Partitioning involves estimating independent models of molecular evolution for different subsets of sites in a sequence alignment, and has been shown to improve phylogenetic inference. Current methods for estimating best-fit partitioning schemes, however, are only computationally feasible with datasets of fewer than 100 loci. This is a problem because datasets with thousands of loci are increasingly common in phylogenetics. METHODS We develop two novel methods for estimating best-fit partitioning schemes on large phylogenomic datasets: strict and relaxed hierarchical clustering. These methods use information from the underlying data to cluster together similar subsets of sites in an alignment, and build on clustering approaches that have been proposed elsewhere. RESULTS We compare the performance of our methods to each other, and to existing methods for selecting partitioning schemes. We demonstrate that while strict hierarchical clustering has the best computational efficiency on very large datasets, relaxed hierarchical clustering provides scalable efficiency and returns dramatically better partitioning schemes as assessed by common criteria such as AICc and BIC scores. CONCLUSIONS These two methods provide the best current approaches to inferring partitioning schemes for very large datasets. We provide free open-source implementations of the methods in the PartitionFinder software. We hope that the use of these methods will help to improve the inferences made from large phylogenomic datasets.
Affiliation(s)
- Robert Lanfear
- Ecology Evolution and Genetics, Research School of Biology, Australian National University, Canberra, ACT, Australia.
|
22
|
Lanfear R, Calcott B, Kainer D, Mayer C, Stamatakis A. Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evol Biol 2014; 14:82. [PMID: 24742000 PMCID: PMC4012149 DOI: 10.1186/1471-2148-14-82] [Citation(s) in RCA: 426] [Impact Index Per Article: 42.6] [Received: 11/19/2013] [Accepted: 04/03/2014] [Indexed: 11/10/2022]
|
23
|
Lanfear R, Calcott B, Kainer D, Mayer C, Stamatakis A. Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evol Biol 2014. [PMID: 24742000 DOI: 10.6084/m9.figshare.938920] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/08/2023]
|
24
|
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2014; 10:e1003537. [PMID: 24722319 PMCID: PMC3985171 DOI: 10.1371/journal.pcbi.1003537] [Citation(s) in RCA: 3729] [Impact Index Per Article: 372.9] [Received: 12/02/2013] [Accepted: 01/20/2014] [Indexed: 12/15/2022]
Abstract
We present a new open source, extensible and flexible software platform for Bayesian evolutionary analysis called BEAST 2. This software platform is a re-design of the popular BEAST 1 platform to correct structural deficiencies that became evident as the BEAST 1 software evolved. Key among those deficiencies was the lack of post-deployment extensibility. BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform. This package architecture is showcased with a number of recently published new models encompassing birth-death-sampling tree priors, phylodynamics and model averaging for substitution models and site partitioning. A second major improvement is the ability to read/write the entire state of the MCMC chain to/from disk allowing it to be easily shared between multiple instances of the BEAST software. This facilitates checkpointing and better support for multi-processor and high-end computing extensions. Finally, the functionality in new packages can be easily added to the user interface (BEAUti 2) by a simple XML template-based mechanism because BEAST 2 has been re-designed to provide greater integration between the analysis engine and the user interface so that, for example BEAST and BEAUti use exactly the same XML file format.
Affiliation(s)
- Remco Bouckaert
- Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Joseph Heled
- Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Denise Kühnert
- Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Department of Environmental Systems Science, ETH Zürich, Zürich, Switzerland
- Tim Vaughan
- Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
- Chieh-Hsi Wu
- Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Dong Xie
- Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Marc A. Suchard
- Departments of Biomathematics and Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Department of Biostatistics, School of Public Health, University of California, Los Angeles, Los Angeles, California, United States of America
- Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Alexei J. Drummond
- Computational Evolution Group, Department of Computer Science, University of Auckland, Auckland, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Auckland, New Zealand
|
25
|
Douzery EJP, Scornavacca C, Romiguier J, Belkhir K, Galtier N, Delsuc F, Ranwez V. OrthoMaM v8: A Database of Orthologous Exons and Coding Sequences for Comparative Genomics in Mammals. Mol Biol Evol 2014; 31:1923-8. [DOI: 10.1093/molbev/msu132] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Indexed: 12/13/2022]
|
26
|
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2014. [PMID: 24722319 DOI: 10.1371/journal.pcbi.1003537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
|
27
|
Duchêne S, Molak M, Ho SYW. ClockstaR: choosing the number of relaxed-clock models in molecular phylogenetic analysis. Bioinformatics 2013; 30:1017-9. [PMID: 24234002 DOI: 10.1093/bioinformatics/btt665] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 11/14/2022]
Abstract
SUMMARY Relaxed molecular clocks allow the phylogenetic estimation of evolutionary timescales even when substitution rates vary among branches. In analyses of large multigene datasets, it is often appropriate to use multiple relaxed-clock models to accommodate differing patterns of rate variation among genes. We present ClockstaR, a method for selecting the number of relaxed clocks for multigene datasets. AVAILABILITY ClockstaR is freely available for download at http://sydney.edu.au/science/biology/meep/software/.
Affiliation(s)
- Sebastián Duchêne
- School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
|
28
|
Barth JMI, Matschiner M, Robertson BC. Phylogenetic position and subspecies divergence of the endangered New Zealand Dotterel (Charadrius obscurus). PLoS One 2013; 8:e78068. [PMID: 24205094 PMCID: PMC3808304 DOI: 10.1371/journal.pone.0078068] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/11/2013] [Accepted: 09/15/2013] [Indexed: 11/19/2022]
Abstract
The New Zealand Dotterel (Charadrius obscurus), an endangered shorebird of the family Charadriidae, is endemic to New Zealand where two subspecies are recognized. These subspecies are not only separated geographically, with C. o. aquilonius being distributed in the New Zealand North Island and C. o. obscurus mostly restricted to Stewart Island, but also differ substantially in morphology and behavior. Despite these divergent traits, previous work has failed to detect genetic differentiation between the subspecies, and the question of when and where the two populations separated is still open. Here, we use mitochondrial and nuclear markers to address molecular divergence between the subspecies, and apply maximum likelihood and Bayesian methods to place C. obscurus within the non-monophyletic genus Charadrius. Despite very little overall differentiation, distinct haplotypes for the subspecies were detected, thus supporting molecular separation of the northern and southern populations. Phylogenetic analysis recovers a monophyletic clade combining the New Zealand Dotterel with two other New Zealand endemic shorebirds, the Wrybill and the Double-Banded Plover, thus suggesting a single dispersal event as the origin of this group. Divergence dates within Charadriidae were estimated with BEAST 2, and our results indicate a Middle Miocene origin of New Zealand endemic Charadriidae, a Late Miocene emergence of the lineage leading to the New Zealand Dotterel, and a Middle to Late Pleistocene divergence of the two New Zealand Dotterel subspecies.
Affiliation(s)
- Julia M. I. Barth
- Department of Zoology, University of Otago, Dunedin, New Zealand
- Michael Matschiner
- Allan Wilson Centre for Molecular Ecology and Evolution, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
|