1
Kück P, Romahn J, Meusemann K. Pitfalls of the site-concordance factor (sCF) as measure of phylogenetic branch support. NAR Genom Bioinform 2022; 4:lqac064. [PMID: 36128424 PMCID: PMC9477076 DOI: 10.1093/nargab/lqac064] [Received: 03/18/2022] [Revised: 08/10/2022] [Accepted: 08/17/2022]
Abstract
Confidence measures of branch reliability play an important role in phylogenetics, as these measures allow the identification of trees or parts of a tree that are well supported by the data and thus adequate to serve as a basis for evolutionary inference about biological systems. Unreliable branch relationships in phylogenetic analyses are of concern because they may represent incorrect relationships of interest among more reliable ones. The site-concordance factor implemented in the IQ-TREE package is a recently introduced heuristic solution to the problem of identifying unreliable branch relationships on the basis of quartets. We test the performance of the site-concordance measure with simple examples based on simulated data, designed to study its behaviour in branch support estimates under different degrees of branch length heterogeneity in a ten-sequence tree. Our results show that, particularly for relationships with heterogeneous branch lengths, site-concordance measures may be misleading. We therefore argue that the maximum parsimony optimality criterion currently used by the site-concordance measure may sometimes be poorly suited to evaluate branch support and that the scores reported by the site-concordance factor should not be considered reliable.
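The abstract refers to site concordance computed on quartets under a parsimony criterion. As a rough illustration only, the Python sketch below counts, for a single quartet of aligned DNA sequences, the parsimony-informative sites that support each of the three possible unrooted resolutions and reports the concordant fraction. It is not IQ-TREE's implementation: the function names are hypothetical, sites with gaps or ambiguity codes are simply skipped (an assumption), and IQ-TREE is understood to repeat a calculation of this kind over many randomly sampled quartets around a branch and average the results.

```python
from __future__ import annotations

# Hypothetical helper names; a simplified single-quartet illustration only,
# not the IQ-TREE code.
VALID = set("ACGT")


def quartet_site_counts(a: str, b: str, c: str, d: str) -> tuple[int, int, int]:
    """Count informative sites supporting ab|cd, ac|bd and ad|bc, in that order.

    Assumes the branch under study separates taxa {a, b} from {c, d}.
    """
    if not len(a) == len(b) == len(c) == len(d):
        raise ValueError("sequences must be aligned to equal length")
    ab_cd = ac_bd = ad_bc = 0
    for sa, sb, sc, sd in zip(a.upper(), b.upper(), c.upper(), d.upper()):
        if not {sa, sb, sc, sd} <= VALID:
            continue  # skip gaps and ambiguity codes (simplifying assumption)
        # For four taxa, a site is parsimony-informative only if it has two
        # states each shared by two taxa; the pairing decides the topology.
        if sa == sb and sc == sd and sa != sc:
            ab_cd += 1
        elif sa == sc and sb == sd and sa != sb:
            ac_bd += 1
        elif sa == sd and sb == sc and sa != sb:
            ad_bc += 1
    return ab_cd, ac_bd, ad_bc


def quartet_scf(a: str, b: str, c: str, d: str) -> float:
    """Share of decisive sites concordant with ab|cd; NaN if no site is decisive."""
    concordant, disc1, disc2 = quartet_site_counts(a, b, c, d)
    decisive = concordant + disc1 + disc2
    return concordant / decisive if decisive else float("nan")


if __name__ == "__main__":
    # Sites 1-2 support ab|cd, site 3 is uninformative, site 4 supports ac|bd.
    print(quartet_site_counts("AAGT", "AAGC", "CCGT", "CCTC"))  # (2, 1, 0)
    print(quartet_scf("AAGT", "AAGC", "CCGT", "CCTC"))
```

Because only the pairing of shared states is scored, a state shared by two long branches through homoplasy counts exactly like a shared derived state, which offers one possible intuition for why a parsimony-based site count can mislead under branch length heterogeneity.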
Affiliation(s)
- Patrick Kück
- Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 160, 53113 Bonn, Germany
- Juliane Romahn
- Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 160, 53113 Bonn, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage 25, 60325 Frankfurt am Main, Germany
- Senckenberg Society for Nature Research, Senckenberganlage 25, 60325 Frankfurt am Main, Germany
- Karen Meusemann
- Directorate, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 160, 53113 Bonn, Germany
2
Hillman ET, Kozik AJ, Hooker CA, Burnett JL, Heo Y, Kiesel VA, Nevins CJ, Oshiro JM, Robins MM, Thakkar RD, Wu ST, Lindemann SR. Comparative genomics of the genus Roseburia reveals divergent biosynthetic pathways that may influence colonic competition among species. Microb Genom 2020; 6:mgen000399. [PMID: 32589566 PMCID: PMC7478625 DOI: 10.1099/mgen.0.000399] [Received: 11/22/2019] [Accepted: 06/03/2020]
Abstract
Roseburia species are important denizens of the human gut microbiome that ferment complex polysaccharides to butyrate as a terminal fermentation product, which influences human physiology and serves as an energy source for colonocytes. Previous comparative genomics analyses of the genus Roseburia have examined polysaccharide degradation genes. Here, we characterize the core and pangenomes of the genus Roseburia with respect to central carbon and energy metabolism, as well as biosynthesis of amino acids and B vitamins using orthology-based methods, uncovering significant differences among species in their biosynthetic capacities. Variation in gene content among Roseburia species and strains was most significant for cofactor biosynthesis. Unlike all other species of Roseburia that we analysed, Roseburia inulinivorans strains lacked biosynthetic genes for riboflavin or pantothenate but possessed folate biosynthesis genes. Differences in gene content for B vitamin synthesis were matched with differences in putative salvage and synthesis strategies among species. For example, we observed extended biotin salvage capabilities in R. intestinalis strains, which further suggest that B vitamin acquisition strategies may impact fitness in the gut ecosystem. As differences in the functional potential to synthesize components of biomass (e.g. amino acids, vitamins) can drive interspecies interactions, variation in auxotrophies of the Roseburia spp. genomes may influence in vivo gut ecology. This study serves to advance our understanding of the potential metabolic interactions that influence the ecology of Roseburia spp. and, ultimately, may provide a basis for rational strategies to manipulate the abundances of these species.
Affiliation(s)
- Ethan T. Hillman
- Department of Agricultural and Biological Engineering, Purdue University, West Lafayette, IN 47907, USA
- Purdue University Interdisciplinary Life Science Program (PULSe), Purdue University, West Lafayette, IN 47907, USA
- Ariangela J. Kozik
- Purdue University Interdisciplinary Life Science Program (PULSe), Purdue University, West Lafayette, IN 47907, USA
- Department of Comparative Pathobiology, Purdue University, West Lafayette, IN 47907, USA
- Present address: Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Casey A. Hooker
- Department of Agricultural and Biological Engineering, Purdue University, West Lafayette, IN 47907, USA
- John L. Burnett
- Department of Food Science, Purdue University, West Lafayette, IN 47907, USA
- Yoojung Heo
- Department of Agronomy, Purdue University, West Lafayette, IN 47907, USA
- Violet A. Kiesel
- Department of Nutrition Science, Purdue University, West Lafayette, IN 47907, USA
- Clayton J. Nevins
- Department of Agronomy, Purdue University, West Lafayette, IN 47907, USA
- Present address: Department of Soil and Water Sciences, University of Florida, Gainesville, FL 32603, USA
- Jordan M.K.I. Oshiro
- Department of Nutrition Science, Purdue University, West Lafayette, IN 47907, USA
- Melissa M. Robins
- Department of Agricultural and Biological Engineering, Purdue University, West Lafayette, IN 47907, USA
- Riya D. Thakkar
- Department of Food Science, Purdue University, West Lafayette, IN 47907, USA
- Whistler Center for Carbohydrate Research, Purdue University, West Lafayette, IN 47907, USA
- Sophie Tongyu Wu
- Department of Food Science, Purdue University, West Lafayette, IN 47907, USA
- Stephen R. Lindemann
- Purdue University Interdisciplinary Life Science Program (PULSe), Purdue University, West Lafayette, IN 47907, USA
- Department of Food Science, Purdue University, West Lafayette, IN 47907, USA
- Whistler Center for Carbohydrate Research, Purdue University, West Lafayette, IN 47907, USA
3
Zou Z, Zhang H, Guan Y, Zhang J. Deep Residual Neural Networks Resolve Quartet Molecular Phylogenies. Mol Biol Evol 2020; 37:1495-1507. [PMID: 31868908 PMCID: PMC8453599 DOI: 10.1093/molbev/msz307]
Abstract
Phylogenetic inference is of fundamental importance to evolutionary as well as other fields of biology, and molecular sequences have emerged as the primary data for this task. Although many phylogenetic methods have been developed to explicitly take into account substitution models of sequence evolution, such methods could fail due to model misspecification or insufficiency, especially in the face of heterogeneities in substitution processes across sites and among lineages. In this study, we propose to infer topologies of four-taxon trees using deep residual neural networks, a machine learning approach needing no explicit modeling of the subject system and having a record of success in solving complex nonlinear inference problems. We train residual networks on simulated protein sequence data with extensive amino acid substitution heterogeneities. We show that the well-trained residual network predictors can outperform existing state-of-the-art inference methods such as the maximum likelihood method on diverse simulated test data, especially under extensive substitution heterogeneities. Reassuringly, residual network predictors generally agree with existing methods in the trees inferred from real phylogenetic data with known or widely believed topologies. Furthermore, when combined with the quartet puzzling algorithm, residual network predictors can be used to reconstruct trees with more than four taxa. We conclude that deep learning represents a powerful new approach to phylogenetic reconstruction, especially when sequences evolve via heterogeneous substitution processes. We present our best trained predictor in a freely available program named Phylogenetics by Deep Learning (PhyDL, https://gitlab.com/ztzou/phydl; last accessed January 3, 2020).
Affiliation(s)
- Zhengting Zou
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
4
de Sousa F, Foster PG, Donoghue PCJ, Schneider H, Cox CJ. Nuclear protein phylogenies support the monophyly of the three bryophyte groups (Bryophyta Schimp.). New Phytol 2019; 222:565-575. [PMID: 30411803 DOI: 10.1111/nph.15587] [Received: 09/21/2018] [Accepted: 10/31/2018]
Abstract
Unraveling the phylogenetic relationships between the four major lineages of terrestrial plants (mosses, liverworts, hornworts, and vascular plants) is essential for an understanding of the evolution of traits specific to land plants, such as their complex life cycles, and the evolutionary development of stomata and vascular tissue. Well supported phylogenetic hypotheses resulting from different data and methods are often incongruent due to processes of nucleotide evolution that are difficult to model, for example substitutional saturation and compositional heterogeneity. We reanalysed a large published dataset of nuclear data and modelled these processes using degenerate-codon recoding and tree-heterogeneous composition substitution models. Our analyses resolved bryophytes as a monophyletic group and showed that the nonmonophyly of the clade that is supported by the analysis of nuclear nucleotide data is due solely to fast-evolving synonymous substitutions. The current congruence among phylogenies of both nuclear and chloroplast analyses lent considerable support to the conclusion that the bryophytes are a monophyletic group. An initial split between bryophytes and vascular plants implies that the bryophyte life cycle (with a dominant gametophyte nurturing an unbranched sporophyte) may not be ancestral to all land plants and that stomata are likely to be a symplesiomorphy among embryophytes.
Affiliation(s)
- Filipe de Sousa
- Centro de Ciências do Mar, Universidade do Algarve, Gambelas, Faro, 8005-319, Portugal
- Peter G Foster
- Department of Life Sciences, Natural History Museum, London, SW7 5BD, UK
- Harald Schneider
- Department of Life Sciences, Natural History Museum, London, SW7 5BD, UK
- School of Earth Sciences, University of Bristol, Bristol, BS8 1TQ, UK
- Center of Integrative Conservation, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Yunnan, 666303, China
- Cymon J Cox
- Centro de Ciências do Mar, Universidade do Algarve, Gambelas, Faro, 8005-319, Portugal
5
Duchêne DA, Duchêne S, Ho SYW. Differences in Performance among Test Statistics for Assessing Phylogenomic Model Adequacy. Genome Biol Evol 2018; 10:1375-1388. [PMID: 29788113 PMCID: PMC6007652 DOI: 10.1093/gbe/evy094] [Accepted: 05/11/2018]
Abstract
Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.
Affiliation(s)
- David A Duchêne
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Sebastian Duchêne
- Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Melbourne, VIC, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
6
Galen SC, Borner J, Martinsen ES, Schaer J, Austin CC, West CJ, Perkins SL. The polyphyly of Plasmodium: comprehensive phylogenetic analyses of the malaria parasites (order Haemosporida) reveal widespread taxonomic conflict. R Soc Open Sci 2018; 5:171780. [PMID: 29892372 PMCID: PMC5990803 DOI: 10.1098/rsos.171780] [Received: 11/01/2017] [Accepted: 04/20/2018]
Abstract
The evolutionary relationships among the apicomplexan blood pathogens known as the malaria parasites (order Haemosporida), some of which infect nearly 200 million humans each year, have remained a vexing phylogenetic problem due to limitations in taxon sampling, character sampling and the extreme nucleotide base composition biases that are characteristic of this clade. Previous phylogenetic work on the malaria parasites has often lacked sufficient representation of the broad taxonomic diversity within the Haemosporida or the multi-locus sequence data needed to resolve deep evolutionary relationships, rendering our understanding of haemosporidian life-history evolution and the origin of the human malaria parasites incomplete. Here we present the most comprehensive phylogenetic analysis of the malaria parasites conducted to date, using samples from a broad diversity of vertebrate hosts that includes numerous enigmatic and poorly known haemosporidian lineages in addition to genome-wide multi-locus sequence data. We find that when base composition differences are corrected for during phylogenetic analysis, we recover a well-supported topology indicating that the evolutionary history of the malaria parasites was characterized by a complex series of transitions in life-history strategies and host usage. Notably, we find that Plasmodium, the malaria parasite genus that includes the species of human medical concern, is polyphyletic, with the life-history traits characteristic of this genus having evolved in a dynamic manner across the phylogeny. We find support for multiple instances of gain and loss of asexual proliferation in host blood cells and production of haemozoin pigment, two traits that have been used for taxonomic classification as well as considered to be important factors for parasite virulence and used as drug targets. Lastly, our analysis illustrates the need for a widespread reassessment of malaria parasite taxonomy.
Affiliation(s)
- Spencer C. Galen
- Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St., New York, NY 10024, USA
- Richard Gilder Graduate School, American Museum of Natural History, Central Park West at 79th St., New York, NY 10024, USA
- Janus Borner
- Institute of Zoology, Biocenter Grindel, University of Hamburg, Martin-Luther-King-Platz 3, D-20146 Hamburg, Germany
- Ellen S. Martinsen
- Center for Conservation Genomics, Smithsonian Conservation Biology Institute, National Zoological Park, PO Box 37012, MRC5503, Washington, DC 20013-7012, USA
- Juliane Schaer
- Department of Biology, Humboldt University, 10115, Berlin, Germany
- Christopher C. Austin
- Department of Biological Sciences, Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
- Susan L. Perkins
- Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St., New York, NY 10024, USA
7
Pathak J, Kannaujiya VK, Singh SP, Sinha RP. Codon usage analysis of photolyase encoding genes of cyanobacteria inhabiting diverse habitats. 3 Biotech 2017; 7:192. [PMID: 28664377 DOI: 10.1007/s13205-017-0826-2] [Received: 02/06/2017] [Accepted: 05/31/2017]
Abstract
Nucleotide and amino acid compositions were studied to determine the genomic and structural relationships of photolyase genes in freshwater, marine and hot spring cyanobacteria. Among the three habitats, photolyase-encoding genes from hot spring cyanobacteria were found to have the highest GC content. The genomic GC content was found to influence codon usage and amino acid variability in photolyases. The third codon position was found to have more effect on amino acid variability in photolyases than the first and second codon positions. The variation of amino acids Ala, Asp, Glu, Gly, His, Leu, Pro, Gln, Arg and Val in photolyases from the three habitats was found to be controlled by the first codon position (G1C1). However, the second codon position (G2C2) regulates the variation of Ala, Cys, Gly, Pro, Arg, Ser, Thr and Tyr contents in photolyases. The third codon position (G3C3) controls the incorporation of amino acids such as Ala, Phe, Gly, Leu, Gln, Pro, Arg, Ser, Thr and Tyr in photolyases from the three habitats. Photolyase-encoding genes of hot spring cyanobacteria have 85% of codons with G or C at the third position, whereas marine and freshwater cyanobacteria showed 82% and 60% of codons, respectively, with G or C at the third position. Principal component analysis (PCA) showed that GC content has a profound effect in separating the genes along the first major axis according to their RSCU (relative synonymous codon usage) values, and neutrality analysis indicated that mutational pressure has resulted in codon bias in photolyase genes of cyanobacteria.
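The abstract reports the share of codons with G or C at the third position (GC3) and relies on relative synonymous codon usage (RSCU) values. The Python sketch below shows one common way to compute both for a single coding sequence, using RSCU = observed codon count / (amino acid count / number of synonymous codons), so that 1.0 means no bias. It is not the authors' pipeline: it assumes the standard genetic code (NCBI translation table 1), ignores incomplete or ambiguous codons, excludes stop codons from RSCU, and its function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TCAG x TCAG x TCAG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete codons made only of A/C/G/T."""
    seq = seq.upper().replace("U", "T")
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    return [c for c in triplets if c in CODE]  # drops ambiguous codons


def gc3(seq: str) -> float:
    """Fraction of codons with G or C at the third position (NaN if none)."""
    cods = codons(seq)
    return sum(c[2] in "GC" for c in cods) / len(cods) if cods else float("nan")


def rscu(seq: str) -> dict[str, float]:
    """RSCU for every sense codon of each amino acid observed in seq."""
    counts = Counter(c for c in codons(seq) if CODE[c] != "*")
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODE[codon]] += n
    result = {}
    for codon, aa in CODE.items():
        if aa == "*" or aa_totals[aa] == 0:
            continue  # skip stop codons and unobserved amino acids
        n_syn = sum(1 for other in CODE.values() if other == aa)
        result[codon] = counts[codon] / (aa_totals[aa] / n_syn)
    return result


if __name__ == "__main__":
    cds = "GCCGCCGCAGCT"  # four Ala codons
    print(gc3(cds))  # 0.5
    print(rscu(cds))
```

In this toy sequence GCC is used twice as often as expected under equal use of the four Ala codons, so its RSCU is 2.0, while the unused GCG scores 0.0.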
Affiliation(s)
- Jainendra Pathak
- Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
- Vinod K Kannaujiya
- Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
- Shailendra P Singh
- Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
- Rajeshwar P Sinha
- Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
8
Irisarri I, Meyer A. The Identification of the Closest Living Relative(s) of Tetrapods: Phylogenomic Lessons for Resolving Short Ancient Internodes. Syst Biol 2016; 65:1057-1075. [PMID: 27425642 DOI: 10.1093/sysbio/syw057] [Received: 11/23/2015] [Accepted: 06/08/2016]
Abstract
Identifying the closest living relative(s) of tetrapods is an important, yet still contested question in vertebrate phylogenetics. Three hypotheses are possible, and ruling out alternatives has proven difficult even with large molecular data sets due to weak phylogenetic signal coupled with nonphylogenetic noise resulting from relatively rapid speciation events that occurred a long time ago (~400 Ma). Here, we revisit the identity of the closest living relative of land vertebrates from a phylogenomic perspective and include new genomic data for all extant lungfish genera. RNA-seq proves to be a great alternative to genomic sequencing, which currently is technically not feasible in lungfishes due to their huge (50-130 Gb) and repetitive genomes. We examined the most important sources of systematic error, namely long-branch attraction (LBA), compositional heterogeneity and distribution of missing data, and applied different correction techniques. A multispecies coalescent approach is used to account for deep coalescence that might come from the short and deep internodes separating early sarcopterygian splits. Concatenation methods favored lungfishes as the closest living relatives of tetrapods with strong statistical support. Amino acid profile mixture models can unambiguously resolve this difficult internode thanks to their ability to avoid systematic error. We assessed the performance of different site-heterogeneous models and data partitioning and compared the ability of different strategies designed to overcome LBA, including taxon manipulation, reduction of among-lineage rate heterogeneity and removal of fast-evolving or compositionally heterogeneous positions. The identification of lungfish as sister group of tetrapods is robust regarding the effects of nonstationary composition and distribution of missing data.
The multispecies coalescent method reconstructed strongly supported topologies that were congruent with concatenation, despite pervasive gene tree heterogeneity. We reject alternative topologies for early sarcopterygian relationships by increasing the signal-to-noise ratio in our alignments. The analytical pipeline outlined here combines probabilistic phylogenomic inference with methods for evaluating data quality, model adequacy, and assessing systematic error, and thus is likely to help resolve similarly difficult internodes in the tree of life. [Coalescence; coelacanth; compositional heterogeneity; gene tree; long-branch attraction; lungfish; missing data; model misspecification; phylogenomic; species tree; systematic error.].
Affiliation(s)
- Iker Irisarri
- Laboratory for Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, 78464 Konstanz, Germany
- Axel Meyer
- Laboratory for Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, 78464 Konstanz, Germany
9
Simmons MP, Sloan DB, Gatesy J. The effects of subsampling gene trees on coalescent methods applied to ancient divergences. Mol Phylogenet Evol 2016; 97:76-89. [PMID: 26768112 DOI: 10.1016/j.ympev.2015.12.013] [Received: 10/09/2015] [Revised: 12/03/2015] [Accepted: 12/20/2015]
Abstract
Gene-tree-estimation error is a major concern for coalescent methods of phylogenetic inference. We sampled eight empirical studies of ancient lineages with diverse numbers of taxa and genes for which the original authors applied one or more coalescent methods. We found that the average pairwise congruence among gene trees varied greatly both between studies and also often within a study. We recommend that presenting plots of pairwise congruence among gene trees in a dataset be treated as a standard practice for empirical coalescent studies so that readers can readily assess the extent and distribution of incongruence among gene trees. ASTRAL-based coalescent analyses generally outperformed MP-EST and STAR with respect to both internal consistency (congruence between analyses of subsamples of genes with the complete dataset of all genes) and congruence with the concatenation-based topology. We evaluated the approach of subsampling gene trees that are, on average, more congruent with other gene trees as a method to reduce artifacts caused by gene-tree-estimation errors on coalescent analyses. We suggest that this method is well suited to testing whether gene-tree-estimation error is a primary cause of incongruence between concatenation- and coalescent-based results, to reconciling conflicting phylogenetic results based on different coalescent methods, and to identifying genes affected by artifacts that may then be targeted for reciprocal illumination. We provide scripts that automate the process of calculating pairwise gene-tree incongruence and subsampling trees while accounting for differential taxon sampling among genes. Finally, we assert that multiple tree-search replicates should be implemented as a standard practice for empirical coalescent studies that apply MP-EST.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- John Gatesy
- Department of Biology, University of California, Riverside, CA 92521, USA
10
Lartillot N. Probabilistic models of eukaryotic evolution: time for integration. Philos Trans R Soc Lond B Biol Sci 2015; 370:20140338. [PMID: 26323768 PMCID: PMC4571576 DOI: 10.1098/rstb.2014.0338] [Accepted: 06/03/2015]
Abstract
In spite of substantial work and recent progress, a global and fully resolved picture of the macroevolutionary history of eukaryotes is still under construction. This concerns not only the phylogenetic relations among major groups, but also the general characteristics of the underlying macroevolutionary processes, including the patterns of gene family evolution associated with endosymbioses, as well as their impact on the sequence evolutionary process. All these questions raise formidable methodological challenges, calling for a more powerful statistical paradigm. In this direction, model-based probabilistic approaches have played an increasingly important role. In particular, improved models of sequence evolution accounting for heterogeneities across sites and across lineages have led to significant, although insufficient, improvement in phylogenetic accuracy. More recently, one main trend has been to move away from simple parametric models and stepwise approaches, towards integrative models explicitly considering the intricate interplay between multiple levels of macroevolutionary processes. Such integrative models are in their infancy, and their application to the phylogeny of eukaryotes still requires substantial improvement of the underlying models, as well as additional computational developments.
Affiliation(s)
- Nicolas Lartillot
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard Lyon 1, F-69622 Villeurbanne Cedex, France
11
Callejón R, Cutillas C, Nadler SA. Nuclear and mitochondrial genes for inferring Trichuris phylogeny. Parasitol Res 2015; 114:4591-4599. [PMID: 26341800 DOI: 10.1007/s00436-015-4705-7] [Received: 06/15/2015] [Accepted: 08/25/2015]
Abstract
Nucleotide sequences of the triose phosphate isomerase (TPI) gene (624 bp) and mitochondrial cytochrome b (cob) gene (520 bp) were obtained by PCR and evaluated for utility in inferring the phylogenetic relationships among Trichuris species. Published sequences of one other nuclear gene (18S or SSU rRNA, 1816-1846 bp) and one additional mitochondrial (mtDNA) gene (cytochrome oxidase 1, cox1, 342 bp) were also analyzed. Maximum likelihood and Bayesian inference methods were used to infer phylogenies for each gene separately but also for the combined mitochondrial data (two genes), the combined nuclear data (two genes), and the total evidence (four gene) dataset. Few Trichuris clades were uniformly resolved across separate analyses of individual genes. For the mtDNA, the cob gene trees had greater phylogenetic resolution and tended to have higher support values than the cox1 analyses. For nuclear genes, the SSU gene trees had slightly greater resolution and support values than the TPI analyses, but TPI was the only gene with reliable support for the deepest nodes in the tree. Combined analyses of genes yielded strongly supported clades in most cases, with the exception of the relationship among Trichuris clades 1, 2, and 3, which showed conflicting results between nuclear and mitochondrial genes. Both the TPI and cob genes proved valuable for inferring Trichuris relationships, with greatest resolution and support values achieved through combined analysis of multiple genes. Based on the phylogeny of the combined analysis of nuclear and mitochondrial genes, parsimony mapping of definitive host utilization depicts artiodactyls as the ancestral hosts for these Trichuris, with host-shifts into primates, rodents, and Carnivora.
Affiliation(s)
- Rocío Callejón
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Seville, 41012, Seville, Spain
- Cristina Cutillas
- Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Seville, 41012, Seville, Spain
- Steven A Nadler
- Department of Entomology and Nematology, University of California, Davis, CA, 95616, USA
12
A confounding effect of missing data on character conflict in maximum likelihood and Bayesian MCMC phylogenetic analyses. Mol Phylogenet Evol 2014; 80:267-280. [DOI: 10.1016/j.ympev.2014.08.021] [Received: 07/03/2014] [Revised: 08/16/2014] [Accepted: 08/20/2014]
13
Dubious resolution and support from published sparse supermatrices: The importance of thorough tree searches. Mol Phylogenet Evol 2014; 78:334-48. [DOI: 10.1016/j.ympev.2014.06.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/09/2014] [Revised: 05/30/2014] [Accepted: 06/01/2014] [Indexed: 11/17/2022]
14
Liu Y, Cox CJ, Wang W, Goffinet B. Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias. Syst Biol 2014; 63:862-78. [PMID: 25070972 DOI: 10.1093/sysbio/syu049] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Indexed: 11/13/2022]
Abstract
Phylogenetic analyses using concatenation of genomic-scale data have been seen as the panacea for resolving the incongruences among inferences from few or single genes. However, phylogenomics may also suffer from systematic errors, due to the, perhaps cumulative, effects of saturation, among-taxa compositional (GC content) heterogeneity, or codon-usage bias plaguing the individual nucleotide loci that are concatenated. Here, we provide an example of how these factors affect the inferences of the phylogeny of early land plants based on mitochondrial genomic data. Mitochondrial sequences evolve slowly in plants and hence are thought to be suitable for resolving deep relationships. We newly assembled mitochondrial genomes from 20 bryophytes and complemented these with 40 other streptophytes (land plants plus algal outgroups), compiling a data matrix of 60 taxa and 41 mitochondrial genes. Homogeneous analyses of the concatenated nucleotide data resolve mosses as sister-group to the remaining land plants. However, the corresponding translated amino acid data support the liverwort lineage in this position. Both results receive weak to moderate support in maximum-likelihood analyses, but strong support in Bayesian inferences. Tests of alternative hypotheses using either nucleotide or amino acid data provide implicit support for their respective optimal topologies, and clearly reject the hypotheses that bryophytes are monophyletic, liverworts and mosses share a unique common ancestor, or hornworts are sister to the remaining land plants. We determined that land plant lineages differ in their nucleotide composition, and in their usage of synonymous codon variants.
Composition-heterogeneous Bayesian analyses employing a nonstationary model that accounts for variation in among-lineage composition, and inferences from degenerated nucleotide data that avoid the effects of synonymous substitutions that underlie codon-usage bias, again recovered liverworts as sister to the remaining land plants, but without support. These analyses indicate that the inference of an early-branching moss lineage based on the nucleotide data is caused by convergent compositional biases. Accommodating among-site amino acid compositional heterogeneity (CAT model) yields no support for the optimal resolution of liverworts as sister to the rest of land plants, suggesting that the robust inference of the liverwort position in homogeneous analyses may be due in part to compositional biases among sites. All analyses support paraphyletic bryophytes, with hornworts forming the sister group to tracheophytes. We conclude that while genomic data may generate highly supported phylogenetic trees, these inferences may be artifacts. We suggest that phylogenomic analyses should assess the possible impact of potential biases through comparisons of protein-coding gene data and their amino acid translations by evaluating the impact of substitutional saturation, synonymous substitutions, and compositional biases through data deletion strategies and by analyzing the data using heterogeneous composition models. We caution against relying on any one presentation of the data (nucleotide or amino acid) or any one type of analysis even when analyzing large-scale data sets, no matter how well-supported, without fully exploring the effects of substitution models.
Affiliation(s)
- Yang Liu
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal; and State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Cymon J Cox
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal; and State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Wei Wang
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal; and State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Bernard Goffinet
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal; and State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
15
The rise of army ants and their relatives: diversification of specialized predatory doryline ants. BMC Evol Biol 2014; 14:93. [PMID: 24886136 PMCID: PMC4021219 DOI: 10.1186/1471-2148-14-93] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 11/19/2013] [Accepted: 04/22/2014] [Indexed: 12/23/2022]
Abstract
Background Army ants are dominant invertebrate predators in tropical and subtropical terrestrial ecosystems. Their close relatives within the dorylomorph group of ants are also highly specialized predators, although much less is known about their biology. We analyzed molecular data generated from 11 nuclear genes to infer a phylogeny for the major dorylomorph lineages, and incorporated fossil evidence to infer divergence times under a relaxed molecular clock. Results Because our results indicate that one subfamily and several genera of dorylomorphs are non-monophyletic, we propose to subsume the six previous dorylomorph subfamilies into a single subfamily, Dorylinae. We find the monophyly of Dorylinae to be strongly supported and estimate the crown age of the group at 87 (74–101) million years. Our phylogenetic analyses provide only weak support for army ant monophyly and also call into question a previous hypothesis that army ants underwent a fundamental split into New World and Old World lineages. Outside the army ants, our phylogeny reveals for the first time many old, distinct lineages in the Dorylinae. The genus Cerapachys is shown to be non-monophyletic and comprised of multiple lineages scattered across the Dorylinae tree. We recover, with strong support, novel relationships among these Cerapachys-like clades and other doryline genera, but divergences in the deepest parts of the tree are not well resolved. We find the genus Sphinctomyrmex, characterized by distinctive abdominal constrictions, to consist of two separate lineages with convergent morphologies, one inhabiting the Old World and the other the New World tropics. Conclusions While we obtain good resolution in many parts of the Dorylinae phylogeny, relationships deep in the tree remain unresolved, with major lineages joining each other in various ways depending upon the analytical method employed, but always with short internodes. 
This may be indicative of rapid radiation in the early history of the Dorylinae, but additional molecular data and more complete species sampling are needed for confirmation. Our phylogeny now provides a basic framework for comparative biological analyses, but much additional study on the behavior and morphology of doryline species is needed, especially investigations directed at the non-army ant taxa.
16
Cox CJ, Li B, Foster PG, Embley TM, Civán P. Conflicting phylogenies for early land plants are caused by composition biases among synonymous substitutions. Syst Biol 2014; 63:272-9. [PMID: 24399481 PMCID: PMC3926305 DOI: 10.1093/sysbio/syt109] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Indexed: 02/02/2023]
Affiliation(s)
- Cymon J Cox
- Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal; Department of Life Sciences, Natural History Museum, London SW7 5BD, UK; and Institute for Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne NE2 4HH, UK
17
Kück P, Struck TH. BaCoCa – A heuristic software tool for the parallel assessment of sequence biases in hundreds of gene and taxon partitions. Mol Phylogenet Evol 2014; 70:94-8. [DOI: 10.1016/j.ympev.2013.09.011] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 07/04/2013] [Revised: 09/12/2013] [Accepted: 09/14/2013] [Indexed: 10/26/2022]
18
Betancur-R. R, Li C, Munroe TA, Ballesteros JA, Ortí G. Addressing Gene Tree Discordance and Non-Stationarity to Resolve a Multi-Locus Phylogeny of the Flatfishes (Teleostei: Pleuronectiformes). Syst Biol 2013; 62:763-85. [DOI: 10.1093/sysbio/syt039] [Citation(s) in RCA: 104] [Impact Index Per Article: 9.5] [Indexed: 11/14/2022]
Affiliation(s)
- Ricardo Betancur-R.
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
- Chenhong Li
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
- Thomas A. Munroe
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
- Jesus A. Ballesteros
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
- Guillermo Ortí
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
19
Networks in a large-scale phylogenetic analysis: reconstructing evolutionary history of Asparagales (Lilianae) based on four plastid genes. PLoS One 2013. [PMID: 23544071 DOI: 10.1371/journal.pone.0059472] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
20
Chen S, Kim DK, Chase MW, Kim JH. Networks in a large-scale phylogenetic analysis: reconstructing evolutionary history of Asparagales (Lilianae) based on four plastid genes. PLoS One 2013; 8:e59472. [PMID: 23544071 PMCID: PMC3605904 DOI: 10.1371/journal.pone.0059472] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 07/25/2012] [Accepted: 02/18/2013] [Indexed: 12/19/2022]
Abstract
Phylogenetic analysis aims to produce a bifurcating tree, which disregards conflicting signals and displays only those that are present in a large proportion of the data. However, any character (or tree) conflict in a dataset allows the exploration of support for various evolutionary hypotheses. Although data-display network approaches exist, biologists cannot easily and routinely use them to compute rooted phylogenetic networks on real datasets containing hundreds of taxa. Here, we constructed an original neighbour-net for a large dataset of Asparagales to highlight the aspects of the resulting network that will be important for interpreting phylogeny. The analyses were largely conducted with new data collected for the same loci as in previous studies, but from different species accessions and greater sampling in many cases than in published analyses. The network tree summarised the majority data pattern in the characters of plastid sequences before tree building, which largely confirmed the currently recognised phylogenetic relationships. Most conflicting signals are at the base of each group along the Asparagales backbone, which helps us to establish the expectancy and advance our understanding of some difficult taxa relationships and their phylogeny. The network method should play a greater role in phylogenetic analyses than it has in the past. To advance the understanding of evolutionary history of the largest order of monocots Asparagales, absolute diversification times were estimated for family-level clades using relaxed molecular clock analyses.
Affiliation(s)
- Shichao Chen
- College of Life Science and Technology, Tongji University, Shanghai, China
- Dong-Kap Kim
- Division of Forest Resource Conservation, Korea National Arboretum, Pocheon, Gyeonggi-do, Korea
- Mark W. Chase
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Joo-Hwan Kim
- Department of Life Science, Gachon University, Seongnam, Gyeonggi-do, Korea
21
Zwick A, Regier JC, Zwickl DJ. Resolving discrepancy between nucleotides and amino acids in deep-level arthropod phylogenomics: differentiating serine codons in 21-amino-acid models. PLoS One 2012; 7:e47450. [PMID: 23185239 PMCID: PMC3502419 DOI: 10.1371/journal.pone.0047450] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Received: 01/02/2012] [Accepted: 09/17/2012] [Indexed: 11/24/2022]
Abstract
BACKGROUND In a previous study of higher-level arthropod phylogeny, analyses of nucleotide sequences from 62 protein-coding nuclear genes for 80 panarthropod species yielded significantly higher bootstrap support for selected nodes than did amino acids. This study investigates the cause of that discrepancy. METHODOLOGY/PRINCIPAL FINDINGS The hypothesis is tested that failure to distinguish the serine residues encoded by two disjunct clusters of codons (TCN, AGY) in amino acid analyses leads to this discrepancy. In one test, the two clusters of serine codons (Ser1, Ser2) are conceptually translated as separate amino acids. Analysis of the resulting 21-amino-acid data matrix shows striking increases in bootstrap support, in some cases matching that in nucleotide analyses. In a second approach, nucleotide and 20-amino-acid data sets are artificially altered through targeted deletions, modifications, and replacements, revealing the pivotal contributions of distinct Ser1 and Ser2 codons. We confirm that previous methods of coding nonsynonymous nucleotide change are robust and computationally efficient by introducing two new degeneracy coding methods. We demonstrate for degeneracy coding that neither compositional heterogeneity at the level of nucleotides nor codon usage bias between Ser1 and Ser2 clusters of codons (or their separately coded amino acids) is a major source of non-phylogenetic signal. CONCLUSIONS The incongruity in support between amino-acid and nucleotide analyses of the aforementioned arthropod data set is resolved by showing that "standard" 20-amino-acid analyses yield lower node support specifically when serine provides crucial signal. Separate coding of Ser1 and Ser2 residues yields support commensurate with that found by degenerated nucleotides, without introducing phylogenetic artifacts.
While exclusion of all serine data leads to reduced support for serine-sensitive nodes, these nodes are still recovered in the ML topology, indicating that the enhanced signal from Ser1 and Ser2 is not qualitatively different from that of the other amino acids.
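The Ser1/Ser2 recoding described above can be sketched as a translation step that emits a different symbol for serine depending on the codon block it comes from. This sketch assumes the standard nuclear genetic code and uses `O` as a hypothetical placeholder for the 21st state; the symbol a given 21-state analysis expects, and how it treats ambiguous codons, should be checked against the program actually used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order;
# '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical placeholder for the 21st state (Ser2, codons AGT/AGC).
SER2_SYMBOL = "O"


def translate21(seq: str) -> str:
    """Translate in-frame DNA, coding TCN serine as 'S' and AGY serine separately."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    out = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        aa = CODE.get(codon, "?")  # gaps or ambiguity codes -> unknown
        if aa == "S" and codon.startswith("AG"):
            aa = SER2_SYMBOL  # AGT/AGC: Ser2
        out.append(aa)
    return "".join(out)


if __name__ == "__main__":
    print(translate21("TCTAGTAGCTCG"))  # Ser1, Ser2, Ser2, Ser1
```

Keeping the split at translation time leaves the rest of an amino acid pipeline unchanged. Only the alphabet grows from 20 to 21 states.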
Affiliation(s)
- Andreas Zwick
- Department of Entomology, State Museum of Natural History, Stuttgart, Germany
- Jerome C. Regier
- Institute for Bioscience and Biotechnology Research and Department of Entomology, University of Maryland, College Park, Maryland, United States of America
- Derrick J. Zwickl
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, United States of America
22
Holland BR, Jarvis PD, Sumner JG. Low-Parameter Phylogenetic Inference Under the General Markov Model. Syst Biol 2012; 62:78-92. [DOI: 10.1093/sysbio/sys072] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]
Affiliation(s)
- Barbara R. Holland
- School of Mathematics and Physics, University of Tasmania, Hobart 7001, Australia
- Peter D. Jarvis
- School of Mathematics and Physics, University of Tasmania, Hobart 7001, Australia
- Jeremy G. Sumner
- School of Mathematics and Physics, University of Tasmania, Hobart 7001, Australia
23
Rota-Stabelli O, Lartillot N, Philippe H, Pisani D. Serine codon-usage bias in deep phylogenomics: pancrustacean relationships as a case study. Syst Biol 2012; 62:121-33. [PMID: 22962005 DOI: 10.1093/sysbio/sys077] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.8] [Indexed: 12/30/2022]
Abstract
Phylogenomic analyses of ancient relationships are usually performed using amino acid data, but it is unclear whether amino acids or nucleotides should be preferred. With the twofold aim of addressing this problem and clarifying pancrustacean relationships, we explored the signals in the 62 protein-coding genes carefully assembled by Regier et al. in 2010. With reference to the pancrustaceans, this data set yields a highly supported nucleotide tree that is substantially different from the corresponding, but poorly supported, amino acid tree. We show that the discrepancy between the nucleotide-based and the amino acid-based trees is caused by substitutions within synonymous codon families (especially those of serine: TCN and AGY). We show that different arthropod lineages are differentially biased in their usage of serine, arginine, and leucine synonymous codons, and that the serine bias is correlated with the topology derived from the nucleotides, but not the amino acids. We suggest that a parallel, partially compositionally driven, synonymous codon-usage bias affects the nucleotide topology. As substitutions between serine codon families can proceed through threonine or cysteine intermediates, amino acid data sets might also be affected by the serine codon-usage bias. We suggest that a Dayhoff recoding strategy would partially ameliorate the effects of such bias. Although amino acids provide an alternative hypothesis of pancrustacean relationships, neither the nucleotide nor the amino acid version of this data set seems to bring enough genuine phylogenetic information to robustly resolve the relationships within the group, which should still be considered unresolved.
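Dayhoff recoding replaces each amino acid with the label of its physicochemical group, so substitutions within a group no longer count as changes. The sketch below assumes the commonly used six-group Dayhoff scheme; the labels 0 to 5 and the handling of gaps and unknown residues are illustrative choices, not the authors' settings. Serine and threonine share a group here while cysteine forms its own group, which fits the abstract's expectation that recoding would only partially ameliorate the serine bias.

```python
# Commonly used Dayhoff-6 groups (assumed); the group label is the list index.
DAYHOFF6 = ["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"]
TO_BIN = {aa: str(i) for i, group in enumerate(DAYHOFF6) for aa in group}


def dayhoff_recode(protein: str, gap: str = "-", unknown: str = "?") -> str:
    """Map each residue to its Dayhoff-6 label; keep gaps, mark others unknown."""
    out = []
    for aa in protein.upper():
        if aa == gap:
            out.append(gap)
        else:
            out.append(TO_BIN.get(aa, unknown))
    return "".join(out)


if __name__ == "__main__":
    # S and T share label 0, so a Ser<->Thr change disappears after recoding.
    print(dayhoff_recode("MSTC-KX"))  # 3005-2?
```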
Affiliation(s)
- Omar Rota-Stabelli
- Department of Biology, The National University of Ireland, Maynooth, Co. Kildare, Ireland.
24
Sumner JG, Jarvis PD, Fernández-Sánchez J, Kaine BT, Woodhams MD, Holland BR. Is the general time-reversible model bad for molecular phylogenetics? Syst Biol 2012; 61:1069-74. [PMID: 22442193 DOI: 10.1093/sysbio/sys042] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 11/14/2022]
Affiliation(s)
- Jeremy G Sumner
- School of Mathematics and Physics, University of Tasmania, Hobart 7001,
25
Wu M, Scott AJ. Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 2012; 28:1033-4. [PMID: 22332237 DOI: 10.1093/bioinformatics/bts079] [Citation(s) in RCA: 333] [Impact Index Per Article: 27.8] [Indexed: 11/13/2022]
Abstract
SUMMARY With the explosive growth of bacterial and archaeal sequence data, large-scale phylogenetic analyses present both opportunities and challenges. Here we describe AMPHORA2, an automated phylogenomic inference tool that can be used for high-throughput, high-quality genome tree reconstruction and metagenomic phylotyping. Compared with its predecessor, AMPHORA2 has several major enhancements and new functions: it has a greatly expanded phylogenetic marker database and can analyze both bacterial and archaeal sequences; it incorporates probability-based sequence alignment masks that improve the phylogenetic accuracy; it can analyze DNA as well as protein sequences and is more sensitive in marker identification; finally, it is over 100× faster in metagenomic phylotyping. AVAILABILITY http://wolbachia.biology.virginia.edu/WuLab/Software.html. CONTACT mw4yv@virginia.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Martin Wu
- Department of Biology, University of Virginia, Charlottesville, VA 22904, USA.
26
27
Brindefalk B, Ettema TJG, Viklund J, Thollesson M, Andersson SGE. A phylometagenomic exploration of oceanic alphaproteobacteria reveals mitochondrial relatives unrelated to the SAR11 clade. PLoS One 2011; 6:e24457. [PMID: 21935411 PMCID: PMC3173451 DOI: 10.1371/journal.pone.0024457] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 06/02/2011] [Accepted: 08/10/2011] [Indexed: 12/03/2022]
Abstract
Background According to the endosymbiont hypothesis, the mitochondrial system for aerobic respiration was derived from an ancestral Alphaproteobacterium. Phylogenetic studies indicate that the mitochondrial ancestor is most closely related to the Rickettsiales. Recently, it was suggested that Candidatus Pelagibacter ubique, a member of the SAR11 clade that is highly abundant in the oceans, is a sister taxon to the mitochondrial-Rickettsiales clade. The availability of ocean metagenome data substantially increases the sampling of Alphaproteobacteria inhabiting the oxygen-containing waters of the oceans that likely resemble the originating environment of mitochondria. Methodology/Principal Findings We present a phylogenetic study of the origin of mitochondria that incorporates metagenome data from the Global Ocean Sampling (GOS) expedition. We identify mitochondrially related sequences in the GOS dataset that represent a rare group of Alphaproteobacteria, designated OMAC (Oceanic Mitochondria Affiliated Clade) as the closest free-living relatives to mitochondria in the oceans. In addition, our analyses reject the hypothesis that the mitochondrial system for aerobic respiration is affiliated with that of the SAR11 clade. Conclusions/Significance Our results allude to the existence of an alphaproteobacterial clade in the oxygen-rich surface waters of the oceans that represents the closest free-living relative to mitochondria identified thus far. In addition, our findings underscore the importance of expanding the taxonomic diversity in phylogenetic analyses beyond that represented by cultivated bacteria to study the origin of mitochondria.
Affiliation(s)
- Björn Brindefalk
- Department of Molecular Evolution, Evolutionary Biology Center, Science for Life Laboratory, Uppsala, Sweden
- Thijs J. G. Ettema
- Department of Molecular Evolution, Evolutionary Biology Center, Science for Life Laboratory, Uppsala, Sweden
- Johan Viklund
- Department of Molecular Evolution, Evolutionary Biology Center, Science for Life Laboratory, Uppsala, Sweden
- Mikael Thollesson
- Department of Molecular Evolution, Evolutionary Biology Center, Science for Life Laboratory, Uppsala, Sweden
- Siv G. E. Andersson
- Department of Molecular Evolution, Evolutionary Biology Center, Science for Life Laboratory, Uppsala, Sweden
28
Regier JC, Zwick A. Sources of signal in 62 protein-coding nuclear genes for higher-level phylogenetics of arthropods. PLoS One 2011; 6:e23408. [PMID: 21829732 PMCID: PMC3150433 DOI: 10.1371/journal.pone.0023408] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 01/24/2011] [Accepted: 07/15/2011] [Indexed: 11/25/2022]
Abstract
BACKGROUND This study aims to investigate the strength of various sources of phylogenetic information that led to recent seemingly robust conclusions about higher-level arthropod phylogeny and to assess the role of excluding or downweighting synonymous change for arriving at those conclusions. METHODOLOGY/PRINCIPAL FINDINGS The current study analyzes DNA sequences from 68 gene segments of 62 distinct protein-coding nuclear genes for 80 species. Gene segments analyzed individually support numerous nodes recovered in combined-gene analyses, but few of the higher-level nodes of greatest current interest. However, neither is there support for conflicting alternatives to these higher-level nodes. Gene segments with higher rates of nonsynonymous change tend to be more informative overall, but those with lower rates tend to provide stronger support for deeper nodes. Higher-level nodes with bootstrap values in the 80% - 99% range for the complete data matrix are markedly more sensitive to substantial drops in their bootstrap percentages after character subsampling than those with 100% bootstrap, suggesting that these nodes are likely not to have been strongly supported with many fewer data than in the full matrix. Data set partitioning of total data by (mostly) synonymous and (mostly) nonsynonymous change improves overall node support, but the result remains much inferior to analysis of (unpartitioned) nonsynonymous change alone. Clusters of genes with similar nonsynonymous rate properties (e.g., faster vs. slower) show some distinct patterns of node support but few conflicts. Synonymous change is shown to contribute little, if any, phylogenetic signal to the support of higher-level nodes, but it does contribute nonphylogenetic signal, probably through its underlying heterogeneous nucleotide composition. Analysis of seemingly conservative indels does not prove useful. 
CONCLUSIONS Generating a robust molecular higher-level phylogeny of Arthropoda is currently possible with large amounts of data and an exclusive reliance on nonsynonymous change.
Affiliation(s)
- Jerome C. Regier
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, Maryland, United States of America
- Department of Entomology, University of Maryland, College Park, Maryland, United States of America
- Center for Biosystems Research, University of Maryland Biotechnology Institute, College Park, Maryland, United States of America
- Andreas Zwick
- Center for Biosystems Research, University of Maryland Biotechnology Institute, College Park, Maryland, United States of America
- Entomology, State Museum of Natural History, Stuttgart, Germany
29
Phylogeny of Celastraceae subfamily Hippocrateoideae inferred from morphological characters and nuclear and plastid loci. Mol Phylogenet Evol 2011; 59:320-30. [DOI: 10.1016/j.ympev.2011.02.017] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 07/16/2010] [Revised: 02/01/2011] [Accepted: 02/14/2011] [Indexed: 11/19/2022]
30
Good species behaving badly: Non-monophyly of black fly sibling species in the Simulium arcticum complex (Diptera: Simuliidae). Mol Phylogenet Evol 2010; 57:245-57. [DOI: 10.1016/j.ympev.2010.06.024] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 01/11/2010] [Revised: 05/14/2010] [Accepted: 06/25/2010] [Indexed: 11/21/2022]
31
Klopfstein S, Kropf C, Quicke DLJ. An evaluation of phylogenetic informativeness profiles and the molecular phylogeny of Diplazontinae (Hymenoptera, Ichneumonidae). Syst Biol 2010; 59:226-41. [PMID: 20525632 DOI: 10.1093/sysbio/syp105] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 11/13/2022]
Abstract
How to quantify the phylogenetic information content of a data set is a longstanding question in phylogenetics, influencing both the assessment of data quality in completed studies and the planning of future phylogenetic projects. Recently, a method has been developed that profiles the phylogenetic informativeness (PI) of a data set through time by linking its site-specific rates of change to its power to resolve relationships at different timescales. Here, we evaluate the performance of this method in the case of 2 standard genetic markers for phylogenetic reconstruction, 28S ribosomal RNA and cytochrome oxidase subunit 1 (CO1) mitochondrial DNA, with maximum parsimony, maximum likelihood, and Bayesian analyses of relationships within a group of parasitoid wasps (Hymenoptera: Ichneumonidae, Diplazontinae). Retrieving PI profiles of the 2 genes from our own and from 3 additional data sets, we find that the method repeatedly overestimates the performance of the more quickly evolving CO1 compared with 28S. We explore possible reasons for this bias, including phylogenetic uncertainty, violation of the molecular clock assumption, model misspecification, and nonstationary nucleotide composition. As none of these provides a sufficient explanation of the observed discrepancy, we use simulated data sets, based on an idealized setting, to show that the optimum evolutionary rate decreases with increasing number of taxa. We suggest that this relationship could explain why the formula derived from the 4-taxon case overrates the performance of higher versus lower rates of evolution in our case and that caution should be taken when the method is applied to data sets including more than 4 taxa.
Affiliation(s)
- Seraina Klopfstein
- Department of Invertebrates, Natural History Museum, Bernastrasse 15, CH-3005 Bern, Switzerland.
32
Abstract
Orthology analysis aims at identifying orthologous genes and gene products from different organisms and, therefore, is a powerful tool in modern computational and experimental biology. Although reconciliation-based orthology methods are generally considered more accurate than distance-based ones, the traditional parsimony-based implementation of reconciliation-based orthology analysis (most parsimonious reconciliation [MPR]) suffers from a number of shortcomings. For example, 1) it is limited to orthology predictions from the reconciliation that minimizes the number of gene duplication and loss events, 2) it cannot evaluate the support of this reconciliation in relation to the other reconciliations, and 3) it cannot make use of prior knowledge (e.g., about species divergence times) that provides auxiliary information for orthology predictions. We present a probabilistic approach to reconciliation-based orthology analysis that addresses all these issues by estimating orthology probabilities. The method is based on the gene evolution model, an explicit evolutionary model for gene duplication and gene loss inside a species tree, that generalizes the standard birth-death process. We describe the probabilistic approach to orthology analysis using 2 experimental data sets and show that the use of orthology probabilities allows a more informative analysis than MPR and, in particular, that it is less sensitive to taxon sampling problems. We generalize these anecdotal observations and show, using data generated under biologically realistic conditions, that MPR gives false orthology predictions at a substantial frequency. Last, we provide a new orthology prediction method that allows an orthology and paralogy classification with any chosen sensitivity/specificity combination from the spectra of achievable combinations.
We conclude that probabilistic orthology analysis is a strong and more advanced alternative to traditional orthology analysis and that it provides a framework for sophisticated comparative studies of processes in genome evolution.
Affiliation(s)
- Bengt Sennblad
- Stockholm Bioinformatics Center, Department of Biochemistry, Stockholm University, AlbaNova, 106 91 Stockholm, Sweden.
33
Jermiin LS, Ho JWK, Lau KW, Jayaswal V. SeqVis: a tool for detecting compositional heterogeneity among aligned nucleotide sequences. Methods Mol Biol 2009; 537:65-91. [PMID: 19378140 DOI: 10.1007/978-1-59745-251-9_4]
Abstract
Compositional heterogeneity is a poorly appreciated attribute of aligned nucleotide and amino acid sequences. It is a common property of molecular phylogenetic data, and it has been found to occur across sequences and/or across sites. Most molecular phylogenetic methods assume that the sequences have evolved under globally stationary, reversible, and homogeneous conditions, implying that the sequences should be compositionally homogeneous. The presence of the above-mentioned compositional heterogeneity implies that the sequences must have evolved under more general conditions than is commonly assumed. Consequently, there is a need for reliable methods to detect under what conditions alignments of nucleotides or amino acids may have evolved. In this chapter, we describe one such program. SeqVis is designed to survey aligned nucleotide sequences. We discuss the pros and cons of this program in the context of other methods to detect compositional heterogeneity and violated phylogenetic assumptions. The benefits provided by SeqVis are demonstrated in two studies of nucleotide alignments, one of which contained 7542 nucleotides from 53 species.
Affiliation(s)
- Lars Sommer Jermiin
- School of Biological Sciences, Centre for Mathematical Biology and Sydney Bioinformatics, University of Sydney, Sydney, Australia
34
Oborník M, Janouškovec J, Chrudimský T, Lukeš J. Evolution of the apicoplast and its hosts: From heterotrophy to autotrophy and back again. Int J Parasitol 2009; 39:1-12. [DOI: 10.1016/j.ijpara.2008.07.010] [Received: 05/20/2008] [Revised: 07/23/2008] [Accepted: 07/25/2008]
35
A simple, fast, and accurate method of phylogenomic inference. Genome Biol 2008; 9:R151. [PMID: 18851752 PMCID: PMC2760878 DOI: 10.1186/gb-2008-9-10-r151] [Received: 08/12/2008] [Revised: 09/26/2008] [Accepted: 10/13/2008]
Abstract
An automated pipeline for phylogenomic analysis (AMPHORA) is presented that overcomes existing limits to large-scale protein phylogenetic inference. The explosive growth of genomic data provides an opportunity to make increased use of protein markers for phylogenetic inference. We have developed an automated pipeline for phylogenomic analysis (AMPHORA) that overcomes the existing bottlenecks limiting large-scale protein phylogenetic inference. We demonstrated its high-throughput capabilities and high-quality results by constructing a genome tree of 578 bacterial species and by assigning phylotypes to 18,607 protein markers identified in metagenomic data collected from the Sargasso Sea.
36
Abstract
It is generally accepted that plastids first arose by acquisition of photosynthetic prokaryotic endosymbionts by non-photosynthetic eukaryotic hosts. It is also accepted that photosynthetic eukaryotes were acquired on several occasions as endosymbionts by non-photosynthetic eukaryote hosts to form secondary plastids. In some lineages, secondary plastids were lost and new symbionts were acquired, to form tertiary plastids. Most recent work has been interpreted to indicate that primary plastids arose only once, referred to as a 'monophyletic' origin. We critically assess the evidence for this. We argue that the combination of Ockham's razor and poor taxon sampling will bias studies in favour of monophyly. We discuss possible concerns in phylogenetic reconstruction from sequence data. We argue that improved understanding of lineage-specific substitution processes is needed to assess the reliability of sequence-based trees. Improved understanding of the timing of the radiation of present-day cyanobacteria is also needed. We suggest that acquisition of plastids is better described as the result of a process rather than something occurring at a discrete time, and describe the 'shopping bag' model of plastid origin. We argue that dinoflagellates and other lineages provide evidence in support of this.
37
Simmons MP, Richardson D, Reddy ASN. Incorporation of gap characters and lineage-specific regions into phylogenetic analyses of gene families from divergent clades: an example from the kinesin superfamily across eukaryotes. Cladistics 2008. [DOI: 10.1111/j.1096-0031.2007.00183.x]
38
Kraus F, Brown WM. Phylogenetic relationships of colubroid snakes based on mitochondrial DNA sequences. Zool J Linn Soc 2008. [DOI: 10.1111/j.1096-3642.1998.tb02159.x]
39
Conflict amongst chloroplast DNA sequences obscures the phylogeny of a group of Asplenium ferns. Mol Phylogenet Evol 2008; 48:176-87. [PMID: 18462954 DOI: 10.1016/j.ympev.2008.02.023] [Received: 10/30/2007] [Revised: 02/21/2008] [Accepted: 02/26/2008]
Abstract
A previous study of the relationships amongst three subgroups of the Austral Asplenium ferns found conflicting signal between the two chloroplast loci investigated. Because organelle genomes like those of chloroplasts and mitochondria are thought to be non-recombining, with a single evolutionary history, we sequenced four additional chloroplast loci with the expectation that this would resolve these relationships. Instead, the conflict was only magnified. Although tree-building analyses favoured one of the three possible trees, one of the alternative trees actually had one more supporting site (six versus five) and received greater support in spectral and neighbor-net analyses. Simulations suggested that chance alone was unlikely to produce strong support for two of the possible trees and none for the third. Likelihood permutation tests indicated that the concatenated chloroplast sequence data appeared to have experienced recombination. However, recombination between the chloroplast genomes of different species would be highly atypical, and corollary supporting observations, like chloroplast heteroplasmy, are lacking. Wider taxon sampling clarified the composition of the Austral group, but the conflicting signal meant analyses (e.g., morphological evolution, biogeographic) conditional on a well-supported phylogeny could not be performed.
40
Blanquart S, Lartillot N. A site- and time-heterogeneous model of amino acid replacement. Mol Biol Evol 2008; 25:842-58. [PMID: 18234708 DOI: 10.1093/molbev/msn018]
Abstract
We combined the category (CAT) mixture model (Lartillot N, Philippe H. 2004) and the nonstationary break point (BP) model (Blanquart S, Lartillot N. 2006) into a new model, CAT-BP, accounting for variations of the evolutionary process both along the sequence and across lineages. As in CAT, the model implements a mixture of distinct Markovian processes of substitution distributed among sites, thus accommodating site-specific selective constraints induced by protein structure and function. Furthermore, as in BP, these processes are nonstationary, and their equilibrium frequencies are allowed to change along lineages in a correlated way, through discrete shifts in global amino acid composition distributed along the phylogenetic tree. We implemented the CAT-BP model in a Bayesian Markov Chain Monte Carlo framework and compared its predictions with those of 3 simpler models, BP, CAT, and the site- and time-homogeneous general time-reversible (GTR) model, on a concatenation of 4 mitochondrial proteins of 20 arthropod species. In contrast to GTR, BP, and CAT, which all display a phylogenetic reconstruction artifact positioning the bees Apis mellifera and Melipona bicolor among chelicerates, the CAT-BP model is able to recover the monophyly of insects. Using posterior predictive tests, we further show that the CAT-BP combination yields better anticipations of site- and taxon-specific amino acid frequencies and that it better accounts for the homoplasies that are responsible for the artifact. Altogether, our results show that the joint modeling of heterogeneities across sites and along time results in a synergistic improvement of the phylogenetic inference, indicating that it is essential to disentangle the combined effects of both sources of heterogeneity, in order to overcome systematic errors in protein phylogenetic analyses.
Affiliation(s)
- Samuel Blanquart
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, Montpellier, France.
41
Rodríguez-Ezpeleta N, Brinkmann H, Roure B, Lartillot N, Lang BF, Philippe H. Detecting and overcoming systematic errors in genome-scale phylogenies. Syst Biol 2007; 56:389-99. [PMID: 17520503 DOI: 10.1080/10635150701397643]
Abstract
Genome-scale data sets result in an enhanced resolution of the phylogenetic inference by reducing stochastic errors. However, there is also an increase of systematic errors due to model violations, which can lead to erroneous phylogenies. Here, we explore the impact of systematic errors on the resolution of the eukaryotic phylogeny using a data set of 143 nuclear-encoded proteins from 37 species. The initial observation was that, despite the impressive amount of data, some branches had no significant statistical support. To demonstrate that this lack of resolution is due to a mutual annihilation of phylogenetic and nonphylogenetic signals, we created a series of data sets with slightly different taxon sampling. As expected, these data sets yielded strongly supported but mutually exclusive trees, thus confirming the presence of conflicting phylogenetic and nonphylogenetic signals in the original data set. To decide on the correct tree, we applied several methods expected to reduce the impact of some kinds of systematic error. Briefly, we show that (i) removing fast-evolving positions, (ii) recoding amino acids into functional categories, and (iii) using a site-heterogeneous mixture model (CAT) are three effective means of increasing the ratio of phylogenetic to nonphylogenetic signal. Finally, our results allow us to formulate guidelines for detecting and overcoming phylogenetic artefacts in genome-scale phylogenetic analyses.
Affiliation(s)
- Naiara Rodríguez-Ezpeleta
- Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de Biochimie, Université de Montréal, 2900 Boulevard Edouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
42
Larkum AWD, Lockhart PJ, Howe CJ. Shopping for plastids. Trends Plant Sci 2007; 12:189-95. [PMID: 17416546 DOI: 10.1016/j.tplants.2007.03.011] [Received: 07/18/2006] [Revised: 01/26/2007] [Accepted: 03/28/2007]
Abstract
Recent suggestions that endosymbionts in a diatom and an amoeba represent independent origins of plastids from those in plants and algae raise again the question of how many times plastids have evolved. In this Opinion article, we review the evidence for a single origin or multiple origins of primary plastids. Although the data are widely taken as supporting a single origin, we stress the assumptions underlying that view, and argue for a more cautious interpretation. We also suggest that the implicit view of plastids being acquired from single ancestors at a single point (or points) in time is an over-simplification.
Affiliation(s)
- Anthony W D Larkum
- School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia.
43
Gruber KF, Voss RS, Jansa SA. Base-compositional heterogeneity in the RAG1 locus among didelphid marsupials: implications for phylogenetic inference and the evolution of GC content. Syst Biol 2007; 56:83-96. [PMID: 17366139 DOI: 10.1080/10635150601182939]
Abstract
Although theoretical studies have suggested that base-compositional heterogeneity can adversely affect phylogenetic reconstruction, only a few empirical examples of this phenomenon, mostly among ancient lineages (with divergence dates > 100 Mya), have been reported. In the course of our phylogenetic research on the New World marsupial family Didelphidae, we sequenced 2790 bp of the RAG1 exon from exemplar species of most extant genera. Phylogenetic analysis of these sequences recovered an anomalous node consisting of two clades previously shown to be distantly related based on analyses of other molecular data. These two clades show significantly increased GC content at RAG1 third codon positions, and the resulting convergence in base composition is strong enough to overwhelm phylogenetic signal from other genes (and morphology) in most analyses of concatenated datasets. This base-compositional convergence occurred relatively recently (over tens rather than hundreds of millions of years), and the affected gene region is still in a state of evolutionary disequilibrium. Both mutation rate and substitution rate are higher in GC-rich didelphid taxa, observations consistent with RAG1 sequences having experienced a higher rate of recombination in the convergent lineages.
Affiliation(s)
- Karl F Gruber
- Bell Museum of Natural History and Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, Minnesota 55108, USA
44
Roure B, Rodriguez-Ezpeleta N, Philippe H. SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol Biol 2007; 7 Suppl 1:S2. [PMID: 17288575 PMCID: PMC1796611 DOI: 10.1186/1471-2148-7-s1-s2]
Abstract
BACKGROUND Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolve evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (paralogous or xenologous genes), lack of some genes for a given species, or partial sequences. The use of undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and the use of partial sequences can lead to a lack of resolution. A tool that selects sequences, species, and genes, while dealing with these issues, is needed in a phylogenomics context. RESULTS Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras with partial sequences, or selects, among multiple sequences, the orthologous and/or slowest evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the user's allowed level of missing data and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for the sequence selection, a semi-automatic mode is available to accommodate the user's expertise. CONCLUSION SCaFoS is able to deal with datasets of hundreds of species and genes, at both the amino acid and nucleotide levels. It has a graphical interface and can be integrated in an automatic workflow. Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimerical sequences to reduce missing data and selects genes according to their level of missing data.
Finally, applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially if the dataset includes fast evolving species.
Affiliation(s)
- Béatrice Roure
- Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de biochimie, Université de Montréal, Montréal, Québec H3C3J7, Canada
- Naiara Rodriguez-Ezpeleta
- Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de biochimie, Université de Montréal, Montréal, Québec H3C3J7, Canada
- Hervé Philippe
- Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de biochimie, Université de Montréal, Montréal, Québec H3C3J7, Canada
45
46
Anderson DL, Morgan MJ. Genetic and morphological variation of bee-parasitic Tropilaelaps mites (Acari: Laelapidae): new and re-defined species. Exp Appl Acarol 2007; 43:1-24. [PMID: 17828576 DOI: 10.1007/s10493-007-9103-0] [Received: 03/21/2007] [Accepted: 08/08/2007]
Abstract
Mites in the genus Tropilaelaps are parasites of social honeybees. Two species, Tropilaelaps clareae and T. koenigerum, have been recorded and their primary hosts are presumed to be the giant honeybees of Asia, Apis dorsata and A. laboriosa. The most common species, T. clareae, is also an economically important pest of the introduced Western honeybee (A. mellifera) throughout Asia and is considered an emerging threat to world apiculture. In the studies reported here, genetic (mtDNA CO-I and nuclear ITS1-5.8S-ITS2 gene sequence) and morphological variation and host associations were examined among Tropilaelaps isolates collected from A. dorsata, A. laboriosa and A. mellifera throughout Asia and neighbouring regions. The results clearly indicate that the genus contains at least four species. Tropilaelaps clareae, previously assumed to be ubiquitous in Asia, was found to be two species, and it is here redefined as encompassing haplotypes (mites with distinct mtDNA gene sequences) that parasitise native A. dorsata breviligula and introduced A. mellifera in the Philippines and also native A. d. binghami on Sulawesi Island in Indonesia. Tropilaelaps mercedesae n. sp., which until now has been mistaken for T. clareae, encompasses haplotypes that, together with haplotypes of T. koenigerum, parasitise native A. d. dorsata in mainland Asia and Indonesia (except Sulawesi Island). It also parasitises introduced A. mellifera in these and surrounding regions and, with another new species, T. thaii n. sp., also parasitises A. laboriosa in mountainous Himalayan regions. Methods are described for identifying each species. These studies help to clarify the emerging threat of Tropilaelaps to world apiculture and will necessitate a revision of quarantine protocols for countries that import and export honeybees.
47
Simon C, Buckley TR, Frati F, Stewart JB, Beckenbach AT. Incorporating Molecular Evolution into Phylogenetic Analysis, and a New Compilation of Conserved Polymerase Chain Reaction Primers for Animal Mitochondrial DNA. Annu Rev Ecol Evol Syst 2006. [DOI: 10.1146/annurev.ecolsys.37.091305.110018]
Affiliation(s)
- Chris Simon
- Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269
- School of Biological Sciences, Victoria University of Wellington, Wellington 6014, New Zealand
- Francesco Frati
- Department of Evolutionary Biology, University of Siena, 53100 Siena, Italy
- James B. Stewart
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Department of Laboratory Medicine, Division of Metabolic Diseases, Karolinska Institutet, Norvum 141 86, Stockholm, Sweden
- Andrew T. Beckenbach
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
48
Philippe H, Telford MJ. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol 2006; 21:614-20. [PMID: 16919363 DOI: 10.1016/j.tree.2006.08.004] [Received: 10/18/2005] [Revised: 07/06/2006] [Accepted: 08/08/2006]
Abstract
Although comparisons of gene sequences have revolutionised our understanding of the animal phylogenetic tree, it has become clear that, to avoid errors in tree reconstruction, a large number of genes from many species must be considered: too few genes and stochastic errors predominate, too few taxa and systematic errors appear. We argue here that, to gather many sequences from many taxa, the best use of resources is to sequence a small number of expressed sequence tags (1000-5000 per species) from as many taxa as possible. This approach counters both sources of error, gives the best hope of a well-resolved phylogeny of the animals and will act as a central resource for a carefully targeted genome sequencing programme.
Affiliation(s)
- Hervé Philippe
- Canadian Institute for Advanced Research, Centre Robert-Cedergren, Département de Biochimie, Université de Montréal, Succursale Centre-Ville, Montréal, QC, Canada, H3C 3J7.
49
Rüber L, Britz R, Zardoya R. Molecular phylogenetics and evolutionary diversification of labyrinth fishes (Perciformes: Anabantoidei). Syst Biol 2006; 55:374-97. [PMID: 16861206 DOI: 10.1080/10635150500541664]
Abstract
Labyrinth fishes (Perciformes: Anabantoidei) are primary freshwater fishes with a disjunct African-Asian distribution that exhibit a wide variety of morphological and behavioral traits. These intrinsic features make them particularly well suited for studying patterns and processes of evolutionary diversification. We reconstructed the first molecular-based phylogenetic hypothesis of anabantoid intrarelationships using both mitochondrial and nuclear nucleotide sequence data to address anabantoid evolution. The mitochondrial data set included the complete cytochrome b, partial 12S rRNA, complete tRNA Val, and partial 16S rRNA genes (3332 bp) of 57 species representing all 19 anabantoid genera. The nuclear data set included the partial RAG1 gene (1494 bp) of 21 representative species. The phylogenetic analyses of a combined (mitochondrial+nuclear) data set recovered almost fully resolved trees at the intrafamily level with different methods of phylogenetic inference. Phylogenetic relationships at this taxonomic level were compared with previous morphology-based hypotheses. In particular, the enigmatic pike-head (Luciocephalus) was confidently placed within the "spiral egg" clade, thus resolving the long-standing controversy on its relative phylogenetic position. The molecular phylogeny was used to study the evolution of the different forms of parental care within the suborder. Our results suggest that the evolution of breeding behavior in anabantoids is highly correlated with phylogeny, and that brood care evolved three times independently from an ancestral free spawning condition without parental care. Ancestral character state reconstructions under maximum parsimony and maximum likelihood further indicated that both bubble nesting and mouthbrooding have evolved recurrently during anabantoid evolution. The new phylogenetic framework was also used to test alternative biogeographic hypotheses that account for the disjunct African-Asian distribution. 
Molecular divergence time estimates support either a drift vicariance linked to the breakup of Gondwana or Late Mesozoic Early Tertiary dispersal from Africa to Asia or vice versa.
Affiliation(s)
- Lukas Rüber
- Department of Zoology, The Natural History Museum, Cromwell Road, London, SW7 5BD, UK.
50
Bennett JR, Mathews S. Phylogeny of the parasitic plant family Orobanchaceae inferred from phytochrome A. Am J Bot 2006; 93:1039-51. [PMID: 21642169 DOI: 10.3732/ajb.93.7.1039]
Abstract
Partial sequences of the nuclear gene encoding the photoreceptor phytochrome A (PHYA) are used to reconstruct relationships within Orobanchaceae, the largest of the parasitic angiosperm families. The monophyly of Orobanchaceae, including nonphotosynthetic holoparasites, hemiparasites, and nonparasitic Lindenbergia is strongly supported. Phytochrome A data resolve six well-supported lineages that contain all of the sampled genera except Brandisia, which is sister to the major radiation of hemiparasites. In contrast to previous plastid and ITS trees, relationships among these major clades also are generally well supported. Thus, the robust phylogenetic hypothesis inferred from the PHYA data provides a much better context in which to evaluate the evolution of parasitism within the group. Ninety-eight species of Orobanchaceae, representing 43 genera, are included and Brandisia, Bungea, Cymbaria, Esterhazya, Nesogenes, Phtheirospermum, Radamaea, Siphonostegia, and Xylocalyx are confirmed as members of Orobanchaceae. The earliest diverging lineage of hemiparasites is identified for the first time; it contains Bungea, Cymbaria, Monochasma, Siphonostegia, and the monotypic Schwalbea, which is federally endangered. This basal clade is marked by the presence of two novel introns. A second, apparently independent gain of one of these introns marks a clade of largely European taxa. There is significant rate heterogeneity among PHYA sequences, and the presence of multiple PHYA in some taxa is consistent with observed ploidy levels.
Affiliation(s)
- Jonathan R Bennett
- Department of Botany, The Natural History Museum, Cromwell Road, London SW7 5BD UK