1
Mengual-Chuliá B, Bedhomme S, Lafforgue G, Elena SF, Bravo IG. Assessing parallel gene histories in viral genomes. BMC Evol Biol 2016; 16:32. [PMID: 26847371 PMCID: PMC4743424 DOI: 10.1186/s12862-016-0605-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/17/2015] [Accepted: 01/29/2016] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: The increasing abundance of sequence data has exacerbated a long known problem: gene trees and species trees for the same terminal taxa are often incongruent. Indeed, genes within a genome have not all followed the same evolutionary path due to events such as incomplete lineage sorting, horizontal gene transfer, gene duplication and deletion, or recombination. Considering conflicts between gene trees as an obstacle, numerous methods have been developed to deal with these incongruences and to reconstruct consensus evolutionary histories of species despite the heterogeneity in the history of their genes. However, inconsistencies can also be seen as a source of information about the specific evolutionary processes that have shaped genomes. RESULTS: The goal of the approach here proposed is to exploit this conflicting information: we have compiled eleven variables describing phylogenetic relationships and evolutionary pressures and submitted them to dimensionality reduction techniques to identify genes with similar evolutionary histories. To illustrate the applicability of the method, we have chosen two viral datasets, namely papillomaviruses and Turnip mosaic virus (TuMV) isolates, largely dissimilar in genome, evolutionary distance and biology. Our method pinpoints viral genes with common evolutionary patterns. In the case of papillomaviruses, gene clusters match well our knowledge on viral biology and life cycle, illustrating the potential of our approach. For the less known TuMV, our results trigger new hypotheses about viral evolution and gene interaction. CONCLUSIONS: The approach here presented allows turning phylogenetic inconsistencies into evolutionary information, detecting gene assemblies with similar histories, and could be a powerful tool for comparative pathogenomics.
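The dimensionality-reduction step mentioned in this abstract can be illustrated with a minimal sketch. It assumes principal component analysis, computed by power iteration on the correlation matrix; the abstract does not say which technique was actually used. The gene rows, the three variables, and all values are hypothetical, and the sketch assumes no variable is constant across genes.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the leading principal component."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    # Correlation matrix of the standardized variables.
    cov = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Asymmetric start vector, so it is unlikely to be orthogonal
    # to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in z]
    return v, scores

# Hypothetical data: one row per gene, one column per descriptive variable.
genes = [[1.0, 2.0, 10.0],
         [2.0, 4.0, 9.0],
         [3.0, 6.0, 8.0],
         [4.0, 8.0, 7.0]]
loadings, scores = first_principal_component(genes)
print(loadings, scores)
```

Genes with similar scores on the leading components would be candidates for a shared evolutionary history; in practice several components and a clustering step would likely be needed.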
Affiliation(s)
- Beatriz Mengual-Chuliá
- Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain; Bellvitge Institute of Biomedical Research (IDIBELL), Barcelona, Spain
- Stéphanie Bedhomme
- Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain; Bellvitge Institute of Biomedical Research (IDIBELL), Barcelona, Spain; Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, Montpellier, France
- Guillaume Lafforgue
- Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, Montpellier, France; Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, València, Spain
- Santiago F Elena
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, València, Spain; I2SysBio, Consejo Superior de Investigaciones Científicas-Universitat de València, València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Ignacio G Bravo
- Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain; MIVEGEC (UMR CNRS 5290, IRD 224, UM), National Center for Scientific Research (CNRS), Montpellier, France; National Center for Scientific Research (CNRS), Maladies Infectieuses et Vecteurs: Ecologie, Génétique, Evolution et Contrôle (MIVEGEC), UMR CNRS 5290, IRD 224, UM, 911 Avenue Agropolis, BP 64501, 34394, Montpellier, Cedex 5, France.
2
Lewis PO, Holder MT, Swofford DL. Phycas: Software for Bayesian Phylogenetic Analysis. Syst Biol 2015; 64:525-31. [DOI: 10.1093/sysbio/syu132] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 08/13/2014] [Accepted: 12/24/2014] [Indexed: 12/15/2022] Open
Affiliation(s)
- Paul O. Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269, USA; Department of Ecology and Evolution, University of Kansas, 1200 Sunnyside Avenue, Lawrence, KS 66045, USA; and Department of Biology, Box 90338, Duke University, Durham, NC 27708, USA
- Mark T. Holder
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269, USA; Department of Ecology and Evolution, University of Kansas, 1200 Sunnyside Avenue, Lawrence, KS 66045, USA; and Department of Biology, Box 90338, Duke University, Durham, NC 27708, USA
- David L. Swofford
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269, USA; Department of Ecology and Evolution, University of Kansas, 1200 Sunnyside Avenue, Lawrence, KS 66045, USA; and Department of Biology, Box 90338, Duke University, Durham, NC 27708, USA
3
de Vienne DM, Ollier S, Aguileta G. Phylo-MCOA: a fast and efficient method to detect outlier genes and species in phylogenomics using multiple co-inertia analysis. Mol Biol Evol 2012; 29:1587-98. [PMID: 22319162 DOI: 10.1093/molbev/msr317] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 11/12/2022] Open
Abstract
Full genome data sets are currently being explored on a regular basis to infer phylogenetic trees, but there are often discordances among the trees produced by different genes. An important goal in phylogenomics is to identify which individual gene and species produce the same phylogenetic tree and are thus likely to share the same evolutionary history. On the other hand, it is also essential to identify which genes and species produce discordant topologies and therefore evolve in a different way or represent noise in the data. The latter are outlier genes or species and they can provide a wealth of information on potentially interesting biological processes, such as incomplete lineage sorting, hybridization, and horizontal gene transfers. Here, we propose a new method to explore the genomic tree space and detect outlier genes and species based on multiple co-inertia analysis (MCOA), which efficiently captures and compares the similarities in the phylogenetic topologies produced by individual genes. Our method allows the rapid identification of outlier genes and species by extracting the similarities and discrepancies, in terms of the pairwise distances, between all the species in all the trees, simultaneously. This is achieved by using MCOA, which finds successive decomposition axes from individual ordinations (i.e., derived from distance matrices) that maximize a covariance function. The method is freely available as a set of R functions. The source code and tutorial can be found online at http://phylomcoa.cgenomics.org.
Affiliation(s)
- Damien M de Vienne
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Spain.
4
Dixon CJ, Schönswetter P, Vargas P, Ertl S, Schneeweiss GM. Bayesian hypothesis testing supports long-distance Pleistocene migrations in a European high mountain plant (Androsace vitaliana, Primulaceae). Mol Phylogenet Evol 2009; 53:580-91. [PMID: 19622392 DOI: 10.1016/j.ympev.2009.07.016] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 05/12/2009] [Revised: 07/01/2009] [Accepted: 07/06/2009] [Indexed: 11/19/2022]
Abstract
Colonization of the south-western European mountain ranges is suggested to have predominantly progressed from the Iberian Peninsula eastwards, but this hypothesis has never been tested in a statistical framework. Here, we test this hypothesis using Androsace vitaliana, a high elevation species with eight mostly allopatric subspecies, which is widely but disjunctly distributed across all major south-western European mountain ranges. To this end, we use plastid and nuclear sequence data as well as fingerprint (amplified fragment length polymorphisms) data and employ Bayesian methods, which allow co-estimation of genealogy and divergence times using explicit demographic models, as well as hypothesis testing via Bayes factors. Irrespective of the ambiguity concerning where A. vitaliana started to diversify -- both the Alps and the mountain ranges of the Iberian Peninsula outside the Pyrenees were possible -- colonization routes were not simply unidirectional, but involved Pleistocene connections between the Alps and mountain ranges of the Iberian Peninsula bypassing the interjacent Pyrenees via long-distance dispersal. In contrast, the species' post-glacial history is shaped by regional gene pool homogenization resulting in the genetic pattern showing good congruence with geographical proximity in agreement with a vicariance model, but only partly supporting current taxonomy.
Affiliation(s)
- Christopher J Dixon
- Department of Biogeography and Botanical Garden, University of Vienna, Vienna, Austria.
5
Drummond AJ, Suchard MA. Fully Bayesian tests of neutrality using genealogical summary statistics. BMC Genet 2008; 9:68. [PMID: 18976476 PMCID: PMC2645432 DOI: 10.1186/1471-2156-9-68] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 02/08/2008] [Accepted: 10/31/2008] [Indexed: 11/10/2022] Open
Abstract
Background: Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequentially, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Results: Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures of neutrality in human influenza A virus, even after controlling for variation in population size. Conclusion: Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.
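The general idea of a posterior predictive check, as invoked in this abstract, can be sketched as follows. This is a generic illustration, not the authors' implementation: the toy Normal model, the stand-in posterior draws, and the function names are all assumptions, and the statistic here depends on the data only, whereas the abstract also uses statistics of model parameters.

```python
import random

def posterior_predictive_pvalue(observed, posterior_draws, simulate, statistic, rng):
    """Fraction of replicated data sets whose statistic is >= the observed one."""
    t_obs = statistic(observed)
    exceed = 0
    for theta in posterior_draws:
        # One replicated data set per posterior draw of the parameter.
        replicate = simulate(theta, len(observed), rng)
        if statistic(replicate) >= t_obs:
            exceed += 1
    return exceed / len(posterior_draws)

def simulate_normal(mu, n, rng):
    # Hypothetical toy model: n independent Normal(mu, 1) observations.
    return [rng.gauss(mu, 1.0) for _ in range(n)]

rng = random.Random(1)
# Stand-in for real MCMC output: draws of the mean parameter.
posterior_mu = [rng.gauss(0.0, 0.1) for _ in range(2000)]
observed = [0.3, -1.2, 0.8, 4.9, -0.4]
p = posterior_predictive_pvalue(observed, posterior_mu, simulate_normal, max, rng)
print(p)
```

A p-value close to 0 or 1 would suggest that the fitted model fails to reproduce the chosen summary statistic of the observed data.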
Affiliation(s)
- Alexei J Drummond
- Bioinformatics Institute, University of Auckland, Auckland, New Zealand.
6
Suchard MA, Weiss RE, Sinsheimer JS. Models for estimating Bayes factors with applications to phylogeny and tests of monophyly. Biometrics 2005; 61:665-73. [PMID: 16135017 DOI: 10.1111/j.1541-0420.2005.00352.x] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Indexed: 11/30/2022]
Abstract
Bayes factors comparing two or more competing hypotheses are often estimated by constructing a Markov chain Monte Carlo (MCMC) sampler to explore the joint space of the hypotheses. To obtain efficient Bayes factor estimates, Carlin and Chib (1995, Journal of the Royal Statistical Society, Series B57, 473-484) suggest adjusting the prior odds of the competing hypotheses so that the posterior odds are approximately one, then estimating the Bayes factor by simple division. A byproduct is that one often produces several independent MCMC chains, only one of which is actually used for estimation. We extend this approach to incorporate output from multiple chains by proposing three statistical models. The first assumes independent sampler draws and models the hypothesis indicator function using logistic regression for various choices of the prior odds. The two more complex models relax the independence assumption by allowing for higher-lag dependence within the MCMC output. These models allow us to estimate the uncertainty in our Bayes factor calculation and to fully use several different MCMC chains even when the prior odds of the hypotheses vary from chain to chain. We apply these methods to calculate Bayes factors for tests of monophyly in two phylogenetic examples. The first example explores the relationship of an unknown pathogen to a set of known pathogens. Identification of the unknown's monophyletic relationship may affect antibiotic choice in a clinical setting. The second example focuses on HIV recombination detection. For potential clinical application, these types of analyses must be completed as efficiently as possible.
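The "simple division" estimate described at the start of this abstract (posterior odds divided by prior odds) can be sketched as below. This is only the single-chain baseline under an independence assumption, not any of the three models the paper proposes; the function name and the example numbers are hypothetical.

```python
def bayes_factor(indicator_draws, prior_prob_h1):
    """Estimate the Bayes factor of H1 over H0 from MCMC indicator draws.

    indicator_draws: sequence of 0/1 values, 1 when the sampler visits H1.
    prior_prob_h1:   prior probability assigned to H1 in that chain.
    """
    n = len(indicator_draws)
    n1 = sum(indicator_draws)
    if n1 == 0 or n1 == n:
        # Posterior odds cannot be estimated if one hypothesis is never visited,
        # which is why the prior odds are tuned to balance the two.
        raise ValueError("chain visited only one hypothesis; adjust the prior odds")
    posterior_odds = n1 / (n - n1)
    prior_odds = prior_prob_h1 / (1.0 - prior_prob_h1)
    return posterior_odds / prior_odds

# Hypothetical chain: prior tuned so that posterior odds are close to one.
draws = [1] * 480 + [0] * 520
bf = bayes_factor(draws, prior_prob_h1=0.01)
print(round(bf, 1))  # roughly 91.4
```

Because the prior strongly disfavors H1 while the sampler still visits it almost half the time, the data support H1 by a factor of about 91 in this made-up example.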
Affiliation(s)
- Marc A Suchard
- Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, California 90095, USA.
7
Abstract
We describe a novel model and algorithm for simultaneously estimating multiple molecular sequence alignments and the phylogenetic trees that relate the sequences. Unlike current techniques that base phylogeny estimates on a single estimate of the alignment, we take alignment uncertainty into account by considering all possible alignments. Furthermore, because the alignment and phylogeny are constructed simultaneously, a guide tree is not needed. This sidesteps the problem in which alignments created by progressive alignment are biased toward the guide tree used to generate them. Joint estimation also allows us to model rate variation between sites when estimating the alignment and to use the evidence in shared insertion/deletions (indels) to group sister taxa in the phylogeny. Our indel model makes use of affine gap penalties and considers indels of multiple letters. We make the simplifying assumption that the indel process is identical on all branches. As a result, the probability of a gap is independent of branch length. We use a Markov chain Monte Carlo (MCMC) method to sample from the posterior of the joint model, estimating the most probable alignment and tree and their support simultaneously. We describe a new MCMC transition kernel that improves our algorithm's mixing efficiency, allowing the MCMC chains to converge even when started from arbitrary alignments. Our software implementation can estimate alignment uncertainty and we describe a method for summarizing this uncertainty in a single plot.
Affiliation(s)
- Benjamin D Redelings
- Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1766, USA