51
Genome-wide EST data mining approaches to resolving incongruence of molecular phylogenies. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2011. [PMID: 20865506 DOI: 10.1007/978-1-4419-5913-3_27] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2]
Abstract
Thirty-six single genes from 6 plants yielded 18 unique trees under maximum parsimony. Such incongruence is an important challenge, and how to reconstruct a congruent tree remains one of the major challenges in molecular phylogenetics. To address this problem, a genome-wide EST data mining approach was systematically investigated by retrieving a large set of EST data for 144 genes shared by 6 green plants from GenBank. The results show that the concatenated-alignment approach overcame incongruence among single-gene phylogenies and successfully reconstructed the congruent tree of the 6 species with 100% jackknife support on every branch when all 144 genes were used. Jackknife support for correct branches increased linearly with the number of genes, but the number of wrong branches also increased linearly. Inferring the congruent tree required a minimum of 30 genes. This approach may provide power for resolving conflicts among phylogenies.
52
Logacheva MD, Kasianov AS, Vinogradov DV, Samigullin TH, Gelfand MS, Makeev VJ, Penin AA. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum). BMC Genomics 2011; 12:30. [PMID: 21232141 PMCID: PMC3027159 DOI: 10.1186/1471-2164-12-30] [Citation(s) in RCA: 124] [Impact Index Per Article: 9.5] [Received: 10/18/2010] [Accepted: 01/13/2011] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND Transcriptome sequencing data have become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales, a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects. RESULTS Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. CONCLUSIONS 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated.
Affiliation(s)
- Maria D Logacheva
  - Department of Evolutionary Biochemistry, A.N. Belozersky Institute of Physico-Chemical Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
  - Evolutionary Genomics Laboratory, Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
- Artem S Kasianov
  - V.A. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Dmitriy V Vinogradov
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
- Tagir H Samigullin
  - Department of Evolutionary Biochemistry, A.N. Belozersky Institute of Physico-Chemical Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
- Mikhail S Gelfand
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
  - Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Vsevolod J Makeev
  - V.A. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - N.I Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  - State Scientific Institute of Genetics and Selection of Industrial Microorganisms, GosNIIgenetika, Moscow, Russia
- Aleksey A Penin
  - Evolutionary Genomics Laboratory, Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
  - Department of Genetics, Biological faculty, M.V. Lomonosov Moscow State University, Moscow, Russia
53
Li C, Lu G, Ortí G. Optimal data partitioning and a test case for ray-finned fishes (Actinopterygii) based on ten nuclear loci. Syst Biol 2010; 57:519-39. [PMID: 18622808 DOI: 10.1080/10635150802206883] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.9] [Indexed: 10/21/2022] Open
Abstract
Data partitioning, the combined phylogenetic analysis of homogeneous blocks of data, is a common strategy used to accommodate heterogeneities in complex multilocus data sets. Variation in evolutionary rates and substitution patterns among sites are typically addressed by partitioning data by gene, codon position, or both. Excessive partitioning of the data, however, could lead to overparameterization; therefore, it seems critical to define the minimum numbers of partitions necessary to improve the overall fit of the model. We propose a new method, based on cluster analysis, to find an optimal partitioning strategy for multilocus protein-coding data sets. A heuristic exploration of alternative partitioning schemes, based on Bayesian and maximum likelihood (ML) criteria, is shown here to produce an optimal number of partitions. We tested this method using sequence data of 10 nuclear genes collected from 52 ray-finned fish (Actinopterygii) and four tetrapods. The concatenated sequences included 7995 nucleotide sites maximally split into 30 partitions defined a priori based on gene and codon position. Our results show that a model based on only 10 partitions defined by cluster analysis performed better than partitioning by both gene and codon position. Alternative data partitioning schemes also are shown to affect the topologies resulting from phylogenetic analysis, especially when Bayesian methods are used, suggesting that overpartitioning may be of major concern. The phylogenetic relationships among the major clades of ray-finned fish were assessed using the best data-partitioning schemes under ML and Bayesian methods. Some significant results include the monophyly of "Holostei" (Amia and Lepisosteus), the sister-group relationships between (1) esociforms and salmoniforms and (2) osmeriforms and stomiiforms, the polyphyly of Perciformes, and a close relationship of cichlids and atherinomorphs.
Affiliation(s)
- Chenhong Li
  - School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA.
54
Chagué V, Just J, Mestiri I, Balzergue S, Tanguy AM, Huneau C, Huteau V, Belcram H, Coriton O, Jahier J, Chalhoub B. Genome-wide gene expression changes in genetically stable synthetic and natural wheat allohexaploids. THE NEW PHYTOLOGIST 2010; 187:1181-1194. [PMID: 20591055 DOI: 10.1111/j.1469-8137.2010.03339.x] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Indexed: 05/18/2023]
Abstract
- The present study aims to understand regulation of gene expression in synthetic and natural wheat (Triticum aestivum) allohexaploids, that combines the AB genome of Triticum turgidum and the D genome of Aegilops tauschii; and which we have recently characterized as genetically stable.
- We conducted a comprehensive genome-wide analysis of gene expression that allowed characterization of the effect of variability of the D genome progenitor, the intergenerational stability as well as the comparison with natural wheat allohexaploid. We used the Affymetrix GeneChip Wheat Genome Array, on which 55 049 transcripts are represented.
- Additive expression was shown to represent the majority of expression regulation in the synthetic allohexaploids, where expression for more than c. 93% of transcripts was equal to the mid-parent value measured from a mixture of parental RNA. This leaves c. 2000 (c. 7%) transcripts, in which expression was nonadditive. No global gene expression bias or dominance towards any of the progenitor genomes was observed whereas high intergenerational stability and low effect of the D genome progenitor variability were revealed.
- Our study suggests that gene expression regulation in wheat allohexaploids is established early upon allohexaploidization and highly conserved over generations, as demonstrated by the high similarity of expression with natural wheat allohexaploids.
Affiliation(s)
- Véronique Chagué
  - Organization and Evolution of Plant Genomes (OEPG), Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - CNRS 8114 - UEVE, F-91057 Evry Cedex, France
- Jérémy Just
  - Organization and Evolution of Plant Genomes (OEPG), Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - CNRS 8114 - UEVE, F-91057 Evry Cedex, France
- Imen Mestiri
  - Organization and Evolution of Plant Genomes (OEPG), Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - CNRS 8114 - UEVE, F-91057 Evry Cedex, France
- Sandrine Balzergue
  - Organization and Evolution of Plant Genomes (OEPG), Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - CNRS 8114 - UEVE, F-91057 Evry Cedex, France
- Anne-Marie Tanguy
  - Unité Mixte de Recherches INRA - Agrocampus Rennes, Amélioration des Plantes & Biotechnologies Végétales, F-35653 Le Rheu, France
- Cecile Huneau
  - Organization and Evolution of Plant Genomes (OEPG), Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - CNRS 8114 - UEVE, F-91057 Evry Cedex, France
- Virginie Huteau
  - Unité Mixte de Recherches INRA - Agrocampus Rennes, Amélioration des Plantes & Biotechnologies Végétales, F-35653 Le Rheu, France
- Harry Belcram
  - Organization and Evolution of Plant Genomes (OEPG), Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - CNRS 8114 - UEVE, F-91057 Evry Cedex, France
- Olivier Coriton
  - Unité Mixte de Recherches INRA - Agrocampus Rennes, Amélioration des Plantes & Biotechnologies Végétales, F-35653 Le Rheu, France
- Joseph Jahier
  - Unité Mixte de Recherches INRA - Agrocampus Rennes, Amélioration des Plantes & Biotechnologies Végétales, F-35653 Le Rheu, France
- Boulos Chalhoub
  - Organization and Evolution of Plant Genomes (OEPG), Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - CNRS 8114 - UEVE, F-91057 Evry Cedex, France
55
Leebens-Mack J, Soltis DE, Soltis PS. Plant reproductive genomics at the Plant and Animal Genome Conference. Comp Funct Genomics 2010; 6:159-69. [PMID: 18629227 PMCID: PMC2447523 DOI: 10.1002/cfg.469] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/31/2005] [Accepted: 02/08/2005] [Indexed: 11/08/2022] Open
Affiliation(s)
- Jim Leebens-Mack
  - Department of Biology and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA.
56
Jasinski S, Vialette-Guiraud ACM, Scutt CP. The evolutionary-developmental analysis of plant microRNAs. Philos Trans R Soc Lond B Biol Sci 2010; 365:469-76. [PMID: 20047873 DOI: 10.1098/rstb.2009.0246] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/24/2022] Open
Abstract
MicroRNAs (miRNAs) control many important aspects of plant development, suggesting these molecules may also have played key roles in the evolution of developmental processes in plants. However, evolutionary-developmental (evo-devo) studies of miRNAs have been held back by technical difficulties in gene identification. To help solve this problem, we have developed a two-step procedure for the efficient identification of miRNA genes in any plant species. As a test case, we have studied the evolution of the MIR164 family in the angiosperms. We have identified novel MIR164 genes in three species occupying key phylogenetic positions and used these, together with published sequence data, to partially reconstruct the evolution of the MIR164 family since the last common ancestor of the extant flowering plants. We use our evolutionary reconstruction to discuss potential roles for MIR164 genes in the evolution of leaf shape and carpel closure in the angiosperms. The techniques we describe may be applied to any miRNA family and should thus enable plant evo-devo to begin to investigate the contributions miRNAs have made to the evolution of plant development.
Affiliation(s)
- Sophie Jasinski
  - Laboratoire de Reproduction et Développement des Plantes, UMR 5667- CNRS/INRA/Université de Lyon, Ecole Normale Supérieure de Lyon, 46, allée d'Italie 69364, Lyon Cedex 07, France
57
Bonaventura MPD, Lee EK, DeSalle R, Planet PJ. A whole-genome phylogeny of the family Pasteurellaceae. Mol Phylogenet Evol 2010; 54:950-6. [DOI: 10.1016/j.ympev.2009.08.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 06/01/2009] [Revised: 08/05/2009] [Accepted: 08/11/2009] [Indexed: 11/16/2022]
58
Cahoon AB, Sharpe RM, Mysayphonh C, Thompson EJ, Ward AD, Lin A. The complete chloroplast genome of tall fescue (Lolium arundinaceum; Poaceae) and comparison of whole plastomes from the family Poaceae. AMERICAN JOURNAL OF BOTANY 2010; 97:49-58. [PMID: 21622366 DOI: 10.3732/ajb.0900008] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 05/15/2023]
Abstract
In this paper, we describe the complete chloroplast genome of Lolium arundinaceum. This sequence is the culmination of a long-term project completed by >400 undergraduates who took general genetics at Middle Tennessee State University from 2004-2007. It was undertaken in an attempt to introduce these students to an open-ended experiential/exploratory lesson to produce and analyze novel data. The data they produced should provide the necessary information for both phylogenetic comparisons and plastome engineering of tall fescue. The fescue plastome (GenBank FJ466687) is 136048 bp with a typical quadripartite structure and a gene order similar to other grasses; 56% of the plastome is coding region comprised of 75 protein-coding genes, 29 tRNAs, four rRNAs, and one hypothetical coding region (ycf). Comparisons of Poaceae plastomes reveal size differences between the PACC (subfamilies Panicoideae, Arundinoideae, Centothecoideae, and Chloridoideae) and BOP (subfamilies Bambusoideae, Oryzoideae, and Pooideae) clades. Alignment analysis suggests that several potentially conserved large deletions in previously identified intergenic length polymorphic regions are responsible for the majority of the size discrepancy. Phylogenetic analysis using whole plastome data suggests that fescue closely aligns with Lolium perenne. Some unique features as well as phylogenetic branch length calculations, however, suggest that a number of changes have occurred since these species diverged.
Affiliation(s)
- A Bruce Cahoon
  - Department of Biology, Middle Tennessee State University, Box 60, Murfreesboro, Tennessee 37132 USA
59
Malay MCMD, Paulay G. Peripatric speciation drives diversification and distributional pattern of reef hermit crabs (Decapoda: Diogenidae: Calcinus). Evolution 2009; 64:634-62. [PMID: 19796150 DOI: 10.1111/j.1558-5646.2009.00848.x] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.5] [Indexed: 11/30/2022]
Abstract
The diversity on coral reefs has long captivated observers. We examine the mechanisms of speciation, role of ecology in speciation, and patterns of species distribution in a typical reef-associated clade, the diverse and colorful Calcinus hermit crabs, to address the origin of tropical marine diversity. We sequenced COI, 16S, and H3 gene regions for approximately 90% of 56 putative species, including nine undescribed, "cryptic" taxa, and mapped their distributions. Speciation in Calcinus is largely peripatric at remote locations. Allopatric species pairs are younger than sympatric ones, and molecular clock analyses suggest that >2 million years are needed for secondary sympatry. Substantial niche conservatism is evident within clades, as well as a few major ecological shifts between sister species. Color patterns follow species boundaries and evolve rapidly, suggesting a role in species recognition. Most species prefer and several are restricted to oceanic areas, suggesting great dispersal abilities and giving rise to an ocean-centric diversity pattern. Calcinus diversity patterns are atypical in that the diversity peaks in the west-central oceanic Pacific rather than in the Indo-Malayan "diversity center." Calcinus speciation patterns do not match well-worn models put forth to explain the origin of Indo-West Pacific diversity, but underscore the complexity of marine diversification.
Affiliation(s)
- Maria Celia Machel D Malay
  - Florida Museum of Natural History and Department of Biology, University of Florida, Gainesville, Florida 32611, USA.
60
Abstract
In addition to the nuclear genome, organisms have organelle genomes. Most of the DNA present in eukaryotic organisms is located in the cell nucleus. Chloroplasts have independent genomes which are inherited from the mother. Duplicated genes are common in the genomes of all organisms. It is believed that gene duplication is the most important step for the origin of genetic variation, leading to the creation of new genes and new gene functions. Despite the fact that extensive gene duplications are rare in the chloroplast genome, gene duplication in the chloroplast genome is an essential source of new genetic functions and a mechanism of neo-evolution. The events of gene transfer between the chloroplast genome and nuclear genome via duplication and subsequent recombination are important processes in evolution. Duplicated genes and genomes in the nucleus have been the subject of several recent reviews. In this review, we will briefly summarize gene duplication and evolution in the chloroplast genome. Also, we will provide an overview of gene transfer events between chloroplast and nuclear genomes.
61
San Mauro D, Gower DJ, Massingham T, Wilkinson M, Zardoya R, Cotton JA. Experimental design in caecilian systematics: phylogenetic information of mitochondrial genomes and nuclear rag1. Syst Biol 2009; 58:425-38. [PMID: 20525595 DOI: 10.1093/sysbio/syp043] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 01/26/2023] Open
Abstract
In molecular phylogenetic studies, a major aspect of experimental design concerns the choice of markers and taxa. Although previous studies have investigated the phylogenetic performance of different genes and the effectiveness of increasing taxon sampling, their conclusions are partly contradictory, probably because they are highly context specific and dependent on the group of organisms used in each study. Goldman introduced a method for experimental design in phylogenetics based on the expected information to be gained that has barely been used in practice. Here we use this method to explore the phylogenetic utility of mitochondrial (mt) genes, mt genomes, and nuclear rag1 for studies of the systematics of caecilian amphibians, as well as the effect of taxon addition on the stabilization of a controversial branch of the tree. Overall phylogenetic information estimates per gene, specific estimates per branch of the tree, estimates for combined (mitogenomic) data sets, and estimates as a hypothetical new taxon is added to different parts of the caecilian tree are calculated and compared. In general, the most informative data sets are those for mt transfer and ribosomal RNA genes. Our results also show at which positions in the caecilian tree the addition of taxa have the greatest potential to increase phylogenetic information with respect to the controversial relationships of Scolecomorphus, Boulengerula, and all other teresomatan caecilians. These positions are, as intuitively expected, mostly (but not all) adjacent to the controversial branch. Generating whole mitogenomic and rag1 data for additional taxa joining the Scolecomorphus branch may be a more efficient strategy than sequencing a similar amount of additional nucleotides spread across the current caecilian taxon sampling. 
The methodology employed in this study allows an a priori evaluation and testable predictions of the appropriateness of particular experimental designs to solve specific questions at different levels of the caecilian phylogeny.
Affiliation(s)
- Diego San Mauro
  - Department of Zoology, The Natural History Museum, Cromwell Road, London SW7 5BD, UK.
62
Abstract
Heterotachy is a general term to describe positions in a sequence that evolve at different rates in different lineages. Kolaczkowski and Thornton (2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431:980-984.) recently described an intriguing heterotachy model that leads to topological bias for likelihood-based methods and parsimony methods. In this article, we show that heterotachy can generally be viewed as multivariate rates-across-sites variation, which can be described as randomly drawing rates (or branch lengths) from a multivariate distribution for each branch at each site. Motivated by this idea, we propose a pairwise alpha heterotachy adjustment model, which gives us much improved topological estimation in the settings by Kolaczkowski and Thornton (2004).
Affiliation(s)
- Jihua Wu
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
63
de la Torre-Bárcena JE, Kolokotronis SO, Lee EK, Stevenson DW, Brenner ED, Katari MS, Coruzzi GM, DeSalle R. The impact of outgroup choice and missing data on major seed plant phylogenetics using genome-wide EST data. PLoS One 2009; 4:e5764. [PMID: 19503618 PMCID: PMC2685480 DOI: 10.1371/journal.pone.0005764] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Received: 09/24/2008] [Accepted: 04/16/2009] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Genome level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole genome DNA sequences of hundreds of organisms and large-scale EST databases a large number of candidate genes for inclusion into phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important plant phylogenetic questions concerning the hierarchical relationships of the several major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales), which continues to be a work in progress, despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group. METHODOLOGY We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations. CONCLUSIONS We evaluated character support and the relative contribution of numerous variables (e.g. gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) on tree topology, stability and support metrics. Our results indicate that while missing characters and order of addition of genes to an analysis do not influence branch support, inadequate taxon sampling and limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic scale data sets.
As expected, support and resolution increases significantly as more informative characters are added, until reaching a threshold, beyond which support metrics stabilize, and the effect of adding conflicting characters is minimized.
Affiliation(s)
- Jose Eduardo de la Torre-Bárcena
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
  - Cullman Molecular Systematics Laboratory and Genomics Laboratory, The New York Botanical Garden, Bronx, New York, United States of America
- Sergios-Orestis Kolokotronis
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Ernest K. Lee
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Dennis Wm. Stevenson
  - Cullman Molecular Systematics Laboratory and Genomics Laboratory, The New York Botanical Garden, Bronx, New York, United States of America
- Eric D. Brenner
  - Cullman Molecular Systematics Laboratory and Genomics Laboratory, The New York Botanical Garden, Bronx, New York, United States of America
- Manpreet S. Katari
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Gloria M. Coruzzi
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Rob DeSalle
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
64
Goloboff PA, Catalano SA, Marcos Mirande J, Szumik CA, Salvador Arias J, Källersjö M, Farris JS. Phylogenetic analysis of 73 060 taxa corroborates major eukaryotic groups. Cladistics 2009; 25:211-230. [DOI: 10.1111/j.1096-0031.2009.00255.x] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.9] [Indexed: 11/29/2022] Open
65
Burleigh JG, Hilu KW, Soltis DE. Inferring phylogenies with incomplete data sets: a 5-gene, 567-taxon analysis of angiosperms. BMC Evol Biol 2009; 9:61. [PMID: 19292928 PMCID: PMC2674047 DOI: 10.1186/1471-2148-9-61] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Received: 04/16/2008] [Accepted: 03/17/2009] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Phylogenetic analyses of angiosperm relationships have used only a small percentage of available sequence data, but phylogenetic data matrices often can be augmented with existing data, especially if one allows missing characters. We explore the effects on phylogenetic analyses of adding 378 matK sequences and 240 26S rDNA sequences to the complete 3-gene, 567-taxon angiosperm phylogenetic matrix of Soltis et al. RESULTS We performed maximum likelihood bootstrap analyses of the complete, 3-gene 567-taxon data matrix and the incomplete, 5-gene 567-taxon data matrix. Although the 5-gene matrix has more missing data (27.5%) than the 3-gene data matrix (2.9%), the 5-gene analysis resulted in higher levels of bootstrap support. Within the 567-taxon tree, the increase in support is most evident for relationships among the 170 taxa for which both matK and 26S rDNA sequences were added, and there is little gain in support for relationships among the 119 taxa having neither matK nor 26S rDNA sequences. The 5-gene analysis also places the enigmatic Hydrostachys in Lamiales (BS = 97%) rather than in Cornales (BS = 100% in 3-gene analysis). The placement of Hydrostachys in Lamiales is unprecedented in molecular analyses, but it is consistent with embryological and morphological data. CONCLUSION Adding available, and often incomplete, sets of sequences to existing data sets can be a fast and inexpensive way to increase support for phylogenetic relationships and produce novel and credible new phylogenetic hypotheses.
Affiliation(s)
- J Gordon Burleigh
  - National Evolutionary Synthesis Center (NESCent), Durham, NC 27705, USA
  - Department of Botany and Zoology, University of Florida, Gainesville, FL 32611, USA
- Khidir W Hilu
  - Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Douglas E Soltis
  - Department of Botany and Zoology, University of Florida, Gainesville, FL 32611, USA
66
Goremykin VV, Viola R, Hellwig FH. Removal of Noisy Characters from Chloroplast Genome-Scale Data Suggests Revision of Phylogenetic Placements of Amborella and Ceratophyllum. J Mol Evol 2009; 68:197-204. [DOI: 10.1007/s00239-009-9206-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 06/29/2008] [Revised: 01/18/2009] [Accepted: 01/29/2009] [Indexed: 11/29/2022]
67
Graham SW, Iles WJD. Different gymnosperm outgroups have (mostly) congruent signal regarding the root of flowering plant phylogeny. AMERICAN JOURNAL OF BOTANY 2009; 96:216-227. [PMID: 21628185 DOI: 10.3732/ajb.0800320] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 05/30/2023]
Abstract
We examined multiple plastid genes from a diversity of gymnosperm lineages to explore the consistency of signal among different outgroups for rooting flowering plant phylogeny. For maximum parsimony (MP), most outgroups attach on a branch of the underlying ingroup tree that leads to Amborella. Maximum likelihood (ML) analyses either root angiosperms on a nearby branch or find split support for these neighboring root placements, depending on the outgroup. The inclusion of two species of Hydatellaceae, recently recognized as an ancient line of angiosperms, does not aid in inference of the root. Cost profiles for placing the root in suboptimal locations are highly correlated across most outgroup comparisons, even comparing MP and ML profiles. Those for Gnetales are the most deviant of all those considered. This divergent outgroup either attaches on a long eudicot branch with moderate bootstrap support in MP analyses or supports no particular root location in ML analysis. Removing the most rapidly evolving sites in rate classifications based on two divergent angiosperm root placements with Gnetales yields strongly conflicting root placements in MP analysis, despite substantial overlap in the estimated sets of conservative sites. However, the generally high consistency in rooting signal among distantly related gymnosperm clades suggests that the long branch connecting angiosperms to their extant relatives may not interfere substantially with inference of the angiosperm root.
Affiliation(s)
- Sean W Graham
  - UBC Botanical Garden & Centre for Plant Research (Faculty of Land and Food Systems), 2357 Main Mall, and Department of Botany, 6270 University Boulevard, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
|
68
|
Kuo CH, Wares JP, Kissinger JC. The Apicomplexan whole-genome phylogeny: an analysis of incongruence among gene trees. Mol Biol Evol 2008; 25:2689-98. [PMID: 18820254 PMCID: PMC2582981 DOI: 10.1093/molbev/msn213] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.6] [Accepted: 09/18/2008] [Indexed: 11/26/2022] Open
Abstract
The protistan phylum Apicomplexa contains many important pathogens and is the subject of intense genome sequencing efforts. Based upon the genome sequences from seven apicomplexan species and a ciliate outgroup, we identified 268 single-copy genes suitable for phylogenetic inference. Both concatenation and consensus approaches inferred the same species tree topology. This topology is consistent with most prior conceptions of apicomplexan evolution based upon ultrastructural and developmental characters, that is, the piroplasm genera Theileria and Babesia form the sister group to the Plasmodium species, the coccidian genera Eimeria and Toxoplasma are monophyletic and are the sister group to the Plasmodium species and piroplasm genera, and Cryptosporidium forms the sister group to the above mentioned with the ciliate Tetrahymena as the outgroup. The level of incongruence among gene trees appears to be high at first glance; only 19% of the genes support the species tree, and a total of 48 different gene-tree topologies are observed. Detailed investigations suggest that the low signal-to-noise ratio in many genes may be the main source of incongruence. The probability of being consistent with the species tree increases as a function of the minimum bootstrap support observed at tree nodes for a given gene tree. Moreover, gene sequences that generate high bootstrap support are robust to the changes in alignment parameters or phylogenetic method used. However, caution should be taken in that some genes can infer a "wrong" tree with strong support because of paralogy, model violations, or other causes. The importance of examining multiple, unlinked genes that possess a strong phylogenetic signal cannot be overstated.
|
69
|
Goremykin VV, Salamini F, Velasco R, Viola R. Mitochondrial DNA of Vitis vinifera and the issue of rampant horizontal gene transfer. Mol Biol Evol 2008; 26:99-110. [PMID: 18922764 DOI: 10.1093/molbev/msn226] [Citation(s) in RCA: 165] [Impact Index Per Article: 10.3] [Indexed: 11/13/2022] Open
Abstract
The mitochondrial genome of grape (Vitis vinifera), the largest organelle genome sequenced so far, is presented. The genome is 773,279 nt long and has the highest coding capacity among known angiosperm mitochondrial DNAs (mtDNAs). The proportion of promiscuous DNA of plastid origin in the genome is also the largest ever reported for an angiosperm mtDNA, both in absolute and relative terms. In all, 42.4% of chloroplast genome of Vitis has been incorporated into its mitochondrial genome. In order to test if horizontal gene transfer (HGT) has also contributed to the gene content of the grape mtDNA, we built phylogenetic trees with the coding sequences of mitochondrial genes of grape and their homologs from plant mitochondrial genomes. Many incongruent gene tree topologies were obtained. However, the extent of incongruence between these gene trees is not significantly greater than that observed among optimal trees for chloroplast genes, the common ancestry of which has never been in doubt. In both cases, we attribute this incongruence to artifacts of tree reconstruction, insufficient numbers of characters, and gene paralogy. This finding leads us to question the recent phylogenetic interpretation of Bergthorsson et al. (2003, 2004) and Richardson and Palmer (2007) that rampant HGT into the mtDNA of Amborella best explains phylogenetic incongruence between mitochondrial gene trees for angiosperms. The only evidence for HGT into the Vitis mtDNA found involves fragments of two coding sequences stemming from two closteroviruses that cause the leaf roll disease of this plant. We also report that analysis of sequences shared by both chloroplast and mitochondrial genomes provides evidence for a previously unknown gene transfer route from the mitochondrion to the chloroplast.
Affiliation(s)
- Vadim V Goremykin
  - Istituto Agrario San Michele all'Adige Research Center, San Michele all'Adige (TN), Italy.
|
70
|
The phylogeny of Cetartiodactyla: The importance of dense taxon sampling, missing data, and the remarkable promise of cytochrome b to provide reliable species-level phylogenies. Mol Phylogenet Evol 2008; 48:964-85. [DOI: 10.1016/j.ympev.2008.05.046] [Citation(s) in RCA: 146] [Impact Index Per Article: 9.1] [Received: 11/27/2007] [Revised: 05/08/2008] [Accepted: 05/21/2008] [Indexed: 11/18/2022]
|
71
|
Turmel M, Brouard JS, Gagnon C, Otis C, Lemieux C. DEEP DIVISION IN THE CHLOROPHYCEAE (CHLOROPHYTA) REVEALED BY CHLOROPLAST PHYLOGENOMIC ANALYSES(1). JOURNAL OF PHYCOLOGY 2008; 44:739-750. [PMID: 27041432 DOI: 10.1111/j.1529-8817.2008.00510.x] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 06/05/2023]
Abstract
The Chlorophyceae (sensu Mattox and Stewart) is a morphologically diverse class of the Chlorophyta displaying biflagellate and quadriflagellate motile cells with varying configurations of the flagellar apparatus. Phylogenetic analyses of 18S rDNA data and combined 18S and 26S rDNA data from a broad range of chlorophycean taxa uncovered five major monophyletic groups (Chlamydomonadales, Sphaeropleales, Oedogoniales, Chaetophorales, and Chaetopeltidales) but could not resolve their branching order. To gain insight into the interrelationships of these groups, we analyzed multiple genes encoded by the chloroplast genomes of Chlamydomonas reinhardtii P. A. Dang. and Chlamydomonas moewusii Gerloff (Chlamydomonadales), Scenedesmus obliquus (Turpin) Kütz. (Sphaeropleales), Oedogonium cardiacum Wittr. (Oedogoniales), Stigeoclonium helveticum Vischer (Chaetophorales), and Floydiella terrestris (Groover et Hofstetter) Friedl et O'Kelly (Chaetopeltidales). The C. moewusii, Oedogonium, and Floydiella chloroplast DNAs were partly sequenced using a random strategy. Trees were reconstructed from nucleotide and amino acid data sets derived from 44 protein-coding genes of 11 chlorophytes and nine streptophytes as well as from 57 protein-coding genes of the six chlorophycean taxa. All best trees identified two robustly supported major lineages within the Chlorophyceae: a clade uniting the Chlamydomonadales and Sphaeropleales, and a clade uniting the Oedogoniales, Chaetophorales, and Chaetopeltidales (OCC clade). This dichotomy is independently supported by molecular signatures in chloroplast genes, such as insertions/deletions and the distribution of trans-spliced group II introns. Within the OCC clade, the sister relationship observed for the Chaetophorales and Chaetopeltidales is also strengthened by independent data. Character state reconstruction of basal body orientation allowed us to refine hypotheses regarding the evolution of the flagellar apparatus.
Affiliation(s)
- Monique Turmel
  - Département de Biochimie et de Microbiologie, Université Laval, Québec, Canada G1K 7P4
- Jean-Simon Brouard
  - Département de Biochimie et de Microbiologie, Université Laval, Québec, Canada G1K 7P4
- Cédric Gagnon
  - Département de Biochimie et de Microbiologie, Université Laval, Québec, Canada G1K 7P4
- Christian Otis
  - Département de Biochimie et de Microbiologie, Université Laval, Québec, Canada G1K 7P4
- Claude Lemieux
  - Département de Biochimie et de Microbiologie, Université Laval, Québec, Canada G1K 7P4
|
72
|
Soltis DE, Bell CD, Kim S, Soltis PS. Origin and Early Evolution of Angiosperms. Ann N Y Acad Sci 2008; 1133:3-25. [DOI: 10.1196/annals.1438.005] [Citation(s) in RCA: 182] [Impact Index Per Article: 11.4] [Indexed: 11/12/2022]
|
73
|
Nishihara H, Okada N, Hasegawa M. Rooting the eutherian tree: the power and pitfalls of phylogenomics. Genome Biol 2008; 8:R199. [PMID: 17883877 PMCID: PMC2375037 DOI: 10.1186/gb-2007-8-9-r199] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.6] [Received: 12/15/2006] [Revised: 07/02/2007] [Accepted: 09/21/2007] [Indexed: 11/10/2022] Open
Abstract
In an attempt to root the eutherian tree using genome-scale data with the maximum likelihood method, a concatenate analysis supports a putatively wrong tree, whereas separate analyses of different genes reduced the bias. Background Ongoing genome sequencing projects have led to a phylogenetic approach based on genome-scale data (phylogenomics), which is beginning to shed light on longstanding unresolved phylogenetic issues. The use of large datasets in phylogenomic analysis results in a global increase in resolution due to a decrease in sampling error. However, a fully resolved tree can still be wrong if the phylogenetic inference is biased. Results Here, in an attempt to root the eutherian tree using genome-scale data with the maximum likelihood method, we demonstrate a case in which a concatenate analysis strongly supports a putatively wrong tree, whereas the total evaluation of separate analyses of different genes grossly reduced the bias of the phylogenetic inference. A conventional method of concatenate analysis of nucleotide sequences from our dataset, which includes a more than 1 megabase alignment of 2,789 nuclear genes, suggests a misled monophyly of Afrotheria (for example, elephant) and Xenarthra (for example, armadillo) with 100% bootstrap probability. However, this tree is not supported by our 'separate method', which takes into account the different tempos and modes of evolution among genes, and instead the basal Afrotheria tree is favored. Conclusion Our analysis demonstrates that in cases in which there is great variation in evolutionary features among different genes, the separate model, rather than the concatenate model, should be used for phylogenetic inference, especially in genome-scale data.
Affiliation(s)
- Hidenori Nishihara
  - Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259-B-21 Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan
  - Department of Statistical Modeling, Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan
- Norihiro Okada
  - Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259-B-21 Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan
- Masami Hasegawa
  - Department of Statistical Modeling, Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan
  - School of Life Sciences, Fudan University, Handan Road 220#, Shanghai 200433, China
|
74
|
Logacheva MD, Samigullin TH, Dhingra A, Penin AA. Comparative chloroplast genomics and phylogenetics of Fagopyrum esculentum ssp. ancestrale - a wild ancestor of cultivated buckwheat. BMC PLANT BIOLOGY 2008; 8:59. [PMID: 18492277 PMCID: PMC2430205 DOI: 10.1186/1471-2229-8-59] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 02/27/2008] [Accepted: 05/20/2008] [Indexed: 05/07/2023]
Abstract
BACKGROUND Chloroplast genome sequences are extremely informative about species interrelationships owing to their non-meiotic and often uniparental inheritance over generations. The subject of our study, Fagopyrum esculentum, is a member of the family Polygonaceae belonging to the order Caryophyllales. An uncertainty remains regarding the affinity of Caryophyllales and the asterids that could be due to undersampling of the taxa. With that background, having access to the complete chloroplast genome sequence for Fagopyrum becomes quite pertinent. RESULTS We report the complete chloroplast genome sequence of a wild ancestor of cultivated buckwheat, Fagopyrum esculentum ssp. ancestrale. The sequence was rapidly determined using a previously described approach that utilized a PCR-based method and employed universal primers, designed on the scaffold of multiple sequence alignment of chloroplast genomes. The gene content and order in the buckwheat chloroplast genome are similar to those of Spinacia oleracea. However, some unique structural differences exist: the presence of an intron in the rpl2 gene, a frameshift mutation in the rpl23 gene and extension of the inverted repeat region to include the ycf1 gene. Phylogenetic analysis of 61 protein-coding gene sequences from 44 complete plastid genomes provided strong support for the sister relationships of Caryophyllales (including Polygonaceae) to asterids. Further, our analysis also provided support for Amborella as sister to all other angiosperms, but interestingly, in the Bayesian phylogenetic inference based on the first two codon positions, Amborella united with Nymphaeales. CONCLUSION Comparative genomics analyses revealed that the Fagopyrum chloroplast genome harbors the characteristic gene content and organization described for several other chloroplast genomes. However, it has some unique structural features distinct from previously reported complete chloroplast genome sequences. Phylogenetic analysis of the dataset, including this new sequence from non-core Caryophyllales, supports the sister relationship between Caryophyllales and asterids.
Affiliation(s)
- Maria D Logacheva
  - Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Tahir H Samigullin
  - Department of Evolutionary Biochemistry, A.N. Belozersky Institute, M.V. Lomonosov Moscow State University, Moscow, Russia
- Amit Dhingra
  - Department of Horticulture and Landscape Architecture, Washington State University, Pullman, USA
- Aleksey A Penin
  - Department of Genetics, Biological Faculty, M.V. Lomonosov Moscow State University, Moscow, Russia
|
75
|
Abstract
Genomes and genes diversify during evolution; however, it is unclear to what extent genes still retain the relationship among species. Model species for molecular phylogenetic studies include yeasts and viruses whose genomes have been sequenced, as well as plants for which fossil-supported true phylogenetic trees are available. In this study, we generated single-gene trees of seven yeast species as well as single-gene trees of nine baculovirus species using all the orthologous genes among the species compared. Homologous genes among seven known plants were used to validate the findings. Four algorithms were used: maximum parsimony (MP), minimum evolution (ME), maximum likelihood (ML), and neighbor-joining (NJ). Trees were reconstructed before and after weighting the DNA and protein sequence lengths among genes. Rarely did a single gene generate the "true tree" with all four algorithms. However, the most frequent gene tree, termed the "maximum gene-support tree" (MGS tree, or WMGS tree for the weighted one), in yeasts, baculoviruses, or plants was consistently found to be the "true tree" among the species. The results provide insights into the overall degree of divergence of orthologous genes of the genomes analyzed and suggest the following: 1) The true tree relationship among the species studied is still maintained by the largest group of orthologous genes; 2) There are usually more orthologous genes with higher similarities between genetically closer species than between genetically more distant ones; and 3) The maximum gene-support tree reflects the phylogenetic relationship among the species compared.
Affiliation(s)
- Yunfeng Shan
  - Molecular Genetics Laboratory, Potato Research Centre, Agriculture and Agri-Food Canada, 850 Lincoln Rd, P.O. Box 20280, Fredericton, New Brunswick, E3B 4Z7, Canada
|
76
|
Dávalos LM, Perkins SL. Saturation and base composition bias explain phylogenomic conflict in Plasmodium. Genomics 2008; 91:433-42. [DOI: 10.1016/j.ygeno.2008.01.006] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.8] [Received: 08/06/2007] [Revised: 01/09/2008] [Accepted: 01/12/2008] [Indexed: 10/22/2022]
|
77
|
Reticulate or tree-like chloroplast DNA evolution in Sileneae (Caryophyllaceae)? Mol Phylogenet Evol 2008; 48:313-25. [PMID: 18490181 DOI: 10.1016/j.ympev.2008.04.015] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 12/07/2007] [Revised: 04/04/2008] [Accepted: 04/07/2008] [Indexed: 11/23/2022]
Abstract
Despite sampling of up to 25 kb of chloroplast DNA sequence from 24 species in Sileneae, a number of nodes in the phylogeny remain poorly supported, and it is not expected that additional sequence sampling will converge to a reliable phylogenetic hypothesis in these parts of the tree. The main reason for this is probably a combination of rapid radiation and substitution rate heterogeneity. Poor resolution among closely related species is often explained by low levels of variation in chloroplast data, but the problem with our data appears to be high levels of homoplasy. Tree-like cpDNA evolution cannot be rejected, but apparent incongruent patterns between different regions are evaluated with the possibility of ancient interspecific chloroplast recombination as an explanatory model. However, several major phylogenetic relationships, previously not recognized, are confidently resolved, e.g. the grouping of the two SW Anatolian taxa S. cryptoneura and S. sordida strongly disagrees with previous studies on nuclear DNA sequence data and indicates a possible case of homoploid hybrid origin. The closely related S. atocioides and S. aegyptiaca form a sister group to Lychnis and the rest of Silene, thus suggesting that Silene may be paraphyletic, despite recent revisions based on molecular data.
|
78
|
Logacheva MD, Penin AA, Samigullin TH, Vallejo-Roman CM, Antonov AS. Phylogeny of flowering plants by the chloroplast genome sequences: in search of a "lucky gene". BIOCHEMISTRY (MOSCOW) 2008; 72:1324-30. [PMID: 18205616 DOI: 10.1134/s0006297907120061] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
One of the most complicated remaining problems of molecular-phylogenetic analysis is choosing an appropriate genome region. In an ideal case, such a region should have two specific properties: (i) results of analysis using this region should be similar to the results of multigene analysis using the maximal number of regions; (ii) this region should be arranged compactly and be significantly shorter than the multigene set. The second condition is necessary to facilitate sequencing and to extend the set of taxa under analysis, the number of which is also crucial for molecular phylogenetic analysis. Such regions have been revealed for some groups of animals and have been designated as "lucky genes". We have carried out a computational experiment analyzing 41 complete chloroplast genomes of flowering plants, aimed at searching for a "lucky gene" for reconstruction of their phylogeny. It is shown that the phylogenetic tree inferred from a combination of translated nucleotide sequences of genes encoding subunits of plastid RNA polymerase is closest to the tree constructed using all protein-coding sites of the chloroplast genome. The only node for which a contradiction is observed is unstable across the different types of analysis. For all the other genes or their combinations, the coincidence is significantly worse. The RNA polymerase genes are compactly arranged in the genome and are fourfold shorter than the total length of protein-coding genes used for phylogenetic analysis. The combination of all necessary features makes this group of genes the main candidate for the role of "lucky gene" in studying phylogeny of flowering plants.
Affiliation(s)
- M D Logacheva
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119991, Russia.
|
79
|
Hahn MW. Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution. Genome Biol 2008; 8:R141. [PMID: 17634151 PMCID: PMC2323230 DOI: 10.1186/gb-2007-8-7-r141] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.8] [Received: 05/15/2007] [Accepted: 07/16/2007] [Indexed: 12/04/2022] Open
Abstract
Background Comparative genomic studies are revealing frequent gains and losses of whole genes via duplication and pseudogenization. One commonly used method for inferring the number and timing of gene gains and losses reconciles the gene tree for each gene family with the species tree of the taxa considered. Recent studies using this approach have found a large number of ancient duplications and recent losses among vertebrate genomes. Results I show that tree reconciliation methods are biased when the inferred gene tree is not correct. This bias places duplicates towards the root of the tree and losses towards the tips of the tree. I demonstrate that this bias is present when tree reconciliation is conducted on both multiple mammal and Drosophila genomes, and that lower bootstrap cut-off values on gene trees lead to more extreme bias. I also suggest a method for dealing with reconciliation bias, although this method only corrects for the number of gene gains on some branches of the species tree. Conclusion Based on the results presented, it is likely that most tree reconciliation analyses show biases, unless the gene trees used are exceptionally well-resolved and well-supported. These results cast doubt upon previous conclusions that vertebrate genome history has been marked by many ancient duplications and many recent gene losses.
Affiliation(s)
- Matthew W Hahn
  - Department of Biology and School of Informatics, E, 3rd Street, Indiana University, Bloomington, IN 47405, USA.
|
80
|
Troitsky AV, Ignatov MS, Bobrova VK, Milyutina IA. Contribution of genosystematics to current concepts of phylogeny and classification of bryophytes. BIOCHEMISTRY (MOSCOW) 2008; 72:1368-76. [PMID: 18205621 DOI: 10.1134/s0006297907120115] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
This paper is a survey of the current state of molecular studies on bryophyte phylogeny. Molecular data have greatly contributed to developing a phylogeny and classification of bryophytes. The previous traditional systems of classification based on morphological data are being significantly revised. New data of the authors are presented on phylogeny of Hypnales pleurocarpous mosses inferred from nucleotide sequence data of the nuclear DNA internal transcribed spacers ITS1-2 and the trnL-F region of the chloroplast genome.
Affiliation(s)
- A V Troitsky
  - Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119992, Russia.
|
81
|
Jansen RK, Cai Z, Raubeson LA, Daniell H, Depamphilis CW, Leebens-Mack J, Müller KF, Guisinger-Bellian M, Haberle RC, Hansen AK, Chumley TW, Lee SB, Peery R, McNeal JR, Kuehl JV, Boore JL. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci U S A 2007; 104:19369-74. [PMID: 18048330 PMCID: PMC2148296 DOI: 10.1073/pnas.0709121104] [Citation(s) in RCA: 729] [Impact Index Per Article: 42.9] [Received: 07/06/2007] [Indexed: 11/18/2022] Open
Abstract
Angiosperms are the largest and most successful clade of land plants with >250,000 species distributed in nearly every terrestrial habitat. Many phylogenetic studies have been based on DNA sequences of one to several genes, but, despite decades of intensive efforts, relationships among early diverging lineages and several of the major clades remain either incompletely resolved or weakly supported. We performed phylogenetic analyses of 81 plastid genes in 64 sequenced genomes, including 13 new genomes, to estimate relationships among the major angiosperm clades, and the resulting trees are used to examine the evolution of gene and intron content. Phylogenetic trees from multiple methods, including model-based approaches, provide strong support for the position of Amborella as the earliest diverging lineage of flowering plants, followed by Nymphaeales and Austrobaileyales. The plastid genome trees also provide strong support for a sister relationship between eudicots and monocots, and this group is sister to a clade that includes Chloranthales and magnoliids. Resolution of relationships among the major clades of angiosperms provides the necessary framework for addressing numerous evolutionary questions regarding the rapid diversification of angiosperms. Gene and intron content are highly conserved among the early diverging angiosperms and basal eudicots, but 62 independent gene and intron losses are limited to the more derived monocot and eudicot clades. Moreover, a lineage-specific correlation was detected between rates of nucleotide substitutions, indels, and genomic rearrangements.
Affiliation(s)
- Robert K Jansen
  - Section of Integrative Biology and Institute of Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA.
|
82
|
|
83
|
Hansen DR, Dastidar SG, Cai Z, Penaflor C, Kuehl JV, Boore JL, Jansen RK. Phylogenetic and evolutionary implications of complete chloroplast genome sequences of four early-diverging angiosperms: Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae). Mol Phylogenet Evol 2007; 45:547-63. [PMID: 17644003 DOI: 10.1016/j.ympev.2007.06.004] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.6] [Received: 01/24/2007] [Revised: 06/05/2007] [Accepted: 06/11/2007] [Indexed: 10/23/2022]
Abstract
We have determined the complete chloroplast genome sequences of four early-diverging lineages of angiosperms, Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae), to examine the organization and evolution of plastid genomes and to estimate phylogenetic relationships among angiosperms. For the most part, the organization of these plastid genomes is quite similar to the ancestral angiosperm plastid genome with a few notable exceptions. Dioscorea has lost one protein-coding gene, rps16; this gene loss has also happened independently in four other land plant lineages: liverworts, conifers, Populus, and legumes. There has also been a small expansion of the inverted repeat (IR) in Dioscorea that has duplicated trnH-GUG. This event has also occurred multiple times in angiosperms, including in monocots, and in the two basal angiosperms Nuphar and Drimys. The Illicium chloroplast genome is unusual in having a 10 kb contraction of the IR. The four taxa sequenced represent key groups in resolving phylogenetic relationships among angiosperms. Illicium is one of the basal angiosperms in the Austrobaileyales, Chloranthus (Chloranthales) remains unplaced in angiosperm classifications, and Buxus and Dioscorea are early-diverging eudicots and monocots, respectively. We have used sequences for 61 shared protein-coding genes from these four genomes and combined them with sequences from 35 other genomes to estimate phylogenetic relationships using parsimony, likelihood, and Bayesian methods. There is strong congruence among the trees generated by the three methods, and most nodes have high levels of support. The results indicate that Amborella alone is sister to the remaining angiosperms; the Nymphaeales represent the next-diverging clade followed by Illicium; Chloranthus is sister to the magnoliids and together this group is sister to a large clade that includes eudicots and monocots; and Dioscorea represents an early-diverging lineage of monocots just internal to Acorus.
Affiliation(s)
- Debra R Hansen
  - Section of Integrative Biology and Institute of Cellular and Molecular Biology, Biological Laboratories 404, University of Texas, Austin, TX 78712, USA
|
84
|
Abstract
Recent progress resolving the phylogenetic relationships of the major lineages of mammals has had a broad impact in evolutionary biology, comparative genomics and the biomedical sciences. Novel insights into the timing and historical biogeography of early mammalian diversification have resulted from a new molecular tree for placental mammals coupled with dating approaches that relax the assumption of the molecular clock. We highlight the numerous applications to come from a well-resolved phylogeny and genomic prospecting in multiple lineages of mammals, from identifying regulatory elements in mammalian genomes to assessing the functional consequences of mutations in human disease loci and those driving adaptive evolution.
Affiliation(s)
- Mark S Springer
- Department of Biology, University of California, Riverside, CA 92521, USA.
|
85
|
Abstract
There are many examples of groups (such as birds, bees, mammals, multicellular animals, and flowering plants) that have undergone a rapid radiation. In such cases, where there is a combination of short internal and long external branches, correctly estimating and rooting phylogenetic trees is known to be a difficult problem. In this simulation study, we tested the performances of different phylogenetic methods at estimating a tree that models a rapid radiation. We found that maximum likelihood, corrected and uncorrected neighbor-joining, and corrected and uncorrected parsimony, all suffer from biases toward specific tree topologies. In addition, we found that using a single-taxon outgroup to root a tree frequently disrupts an otherwise correct ingroup phylogeny. Moreover, for uncorrected parsimony, we found cases where several individual trees (in which the outgroup was placed incorrectly) were selected more frequently than the correct tree. Even for parameter settings where the correct tree was selected most frequently when using extremely long sequences, for sequences of up to 60,000 nucleotides the incorrectly rooted trees were each selected more frequently than the correct tree. For all the cases tested here, tree estimation using a two taxon outgroup was more accurate than when using a single-taxon outgroup. However, the ingroup was most accurately recovered when no outgroup was used.
Affiliation(s)
- Liat Shavit
- The Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.
|
86
|
Geuten K, Massingham T, Darius P, Smets E, Goldman N. Experimental Design Criteria in Phylogenetics: Where to Add Taxa. Syst Biol 2007; 56:609-22. [PMID: 17654365 DOI: 10.1080/10635150701499563] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Indexed: 10/23/2022]
Abstract
Accurate phylogenetic inference is a topic of intensive research and debate and has been studied in response to many different factors: for example, differences in the method of reconstruction, the shape of the underlying tree, the substitution model, and varying quantities and types of data. Investigating whether the conditions used might lead to inaccurate inference has been attempted through elaborate data exploration but less attention has been given to creating a unified methodology to enable experimental designs in phylogenetic analysis to be improved and so avoid suboptimal conditions. Experimental design has been part of the field of statistics since the seminal work of Fisher in the early 20th century and a large body of literature exists on how to design optimum experiments. Here we investigate the use of the Fisher information matrix to decide between candidate positions for adding a taxon to a fixed topology, and introduce a parameter transformation that permits comparison of these different designs. This extension to Goldman (1998. Proc. R. Soc. Lond. B. 265: 1779-1786) thus allows investigation of "where to add taxa" in a phylogeny. We compare three different measures of the total information for selecting the position to add a taxon to a tree. Our methods are illustrated by investigating the behavior of the three criteria when adding a branch to model trees, and by applying the different criteria to two biological examples: a simplified taxon-sampling problem in the balsaminoid Ericales and the phylogeny of seed plants.
Affiliation(s)
- Koen Geuten
- Laboratory of Plant Systematics, KU Leuven, Belgium.
|
87
|
Gatesy J, DeSalle R, Wahlberg N. How many genes should a systematist sample? Conflicting insights from a phylogenomic matrix characterized by replicated incongruence. Syst Biol 2007; 56:355-63. [PMID: 17464890 DOI: 10.1080/10635150701294733] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Indexed: 10/23/2022]
Affiliation(s)
- John Gatesy
- Department of Biology, University of California Riverside, Spieth Hall, Riverside, California 92521, USA.
|
88
|
Whitfield JB, Lockhart PJ. Deciphering ancient rapid radiations. Trends Ecol Evol 2007; 22:258-65. [PMID: 17300853 DOI: 10.1016/j.tree.2007.01.012] [Citation(s) in RCA: 259] [Impact Index Per Article: 15.2] [Received: 09/01/2006] [Revised: 01/04/2007] [Accepted: 01/29/2007] [Indexed: 10/23/2022]
Abstract
A deeper phylogenetic understanding of ancient patterns of diversification would contribute to solving many problems in evolutionary biology, yet many of these phylogenies remain poorly resolved. Ancient rapid radiations pose a major challenge for phylogenetic analysis for two main reasons. First, the pattern to be deciphered, the order of divergence among lineages, tends to be supported by small amounts of data. Second, the time since divergence is large and, thus, the potential for misinterpreting phylogenetic information is great. Here, we review the underlying causes of difficulty in determining the branching patterns of diversification in ancient rapid radiations, and review novel data exploration tools that can facilitate understanding of these radiations.
Affiliation(s)
- James B Whitfield
- Department of Entomology, 320 Morrill Hall, 505 S. Goodwin Ave., University of Illinois, Urbana, IL 61801, USA.
|
89
|
Li C, Ortí G, Zhang G, Lu G. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol 2007; 7:44. [PMID: 17374158 PMCID: PMC1838417 DOI: 10.1186/1471-2148-7-44] [Citation(s) in RCA: 290] [Impact Index Per Article: 17.1] [Received: 09/25/2006] [Accepted: 03/20/2007] [Indexed: 12/05/2022]
Abstract
Background Molecular systematics occupies one of the central stages in biology in the genomic era, ushered in by unprecedented progress in DNA technology. The inference of organismal phylogeny is now based on many independent genetic loci, a widely accepted approach to assemble the tree of life. Surprisingly, this approach is hindered by lack of appropriate nuclear gene markers for many taxonomic groups, especially at high taxonomic levels, partially due to the lack of tools for efficiently developing new phylogenetic markers. We report here a genome-comparison strategy for identifying nuclear gene markers for phylogenetic inference and apply it to the ray-finned fishes – the largest vertebrate clade in need of phylogenetic resolution. Results A total of 154 candidate molecular markers – relatively well conserved, putatively single-copy gene fragments with long, uninterrupted exons – were obtained by comparing whole genome sequences of two model organisms, Danio rerio and Takifugu rubripes. Experimental tests of 15 of these (randomly picked) markers on 36 taxa (representing two-thirds of the ray-finned fish orders) demonstrate the feasibility of amplifying by PCR and directly sequencing most of these candidates from whole genomic DNA in a vast diversity of fish species. Preliminary phylogenetic analyses of sequence data obtained for 14 taxa and 10 markers (total of 7,872 bp for each species) are encouraging, suggesting that the markers obtained will make significant contributions to future fish phylogenetic studies. Conclusion We present a practical approach that systematically compares whole genome sequences to identify single-copy nuclear gene markers for inferring phylogeny. Our method is an improvement over traditional approaches (e.g., manually picking genes for testing) because it uses genomic information and automates the process to identify large numbers of candidate markers. This approach is shown here to be successful for fishes, but also could be applied to other groups of organisms for which two or more complete genome sequences exist, which has important implications for assembling the tree of life.
Affiliation(s)
- Chenhong Li
- School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA
- Guillermo Ortí
- School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA
- Gong Zhang
- Department of Mathematics, University of Nebraska, Omaha, NE 68182, USA
- Guoqing Lu
- Department of Biology, University of Nebraska, Omaha, NE 68182, USA
|
90
|
Sanchez-Puerta MV, Bachvaroff TR, Delwiche CF. Sorting wheat from chaff in multi-gene analyses of chlorophyll c-containing plastids. Mol Phylogenet Evol 2007; 44:885-97. [PMID: 17449283 DOI: 10.1016/j.ympev.2007.03.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Received: 09/21/2006] [Revised: 02/24/2007] [Accepted: 03/05/2007] [Indexed: 10/23/2022]
Abstract
Photosynthetic eukaryotes contain primary, secondary or tertiary plastids, depending on the source of the organelle (a cyanobacterium or a photosynthetic eukaryote). Plastid phylogeny is relatively well investigated, but molecular phylogenies have conflicted as a function of gene choice, taxon-representations, and analytical method. To better understand the influences of these variables, we performed analyses of a multi-gene data set based on 62 plastid-associated genes of 15 taxa representing the major plastid lineages. In an attempt to distinguish phylogenetic signal from non-phylogenetic patterns, we analyzed the data using a wide range of phylogenetic methods and examined the effect of covarion evolution and compositional bias. The data suggest that the chlorophyll c-containing plastids are monophyletic and acquired their plastids from the red algae after the emergence of the Cyanidiales. The relationships among chl c-containing plastids are particularly hard to resolve. This is the largest data set used for this purpose; the analyses show that cryptophyte plastids are sister to other chl c-containing plastids, and haptophyte and peridinin-containing dinoflagellate plastids are closely related.
Affiliation(s)
- M Virginia Sanchez-Puerta
- Department of Cell Biology and Molecular Genetics, University of Maryland College Park, College Park, MD 20742-5815, USA.
|
91
|
Xin Z, Mandaokar A, Chen J, Last RL, Browse J. Arabidopsis ESK1 encodes a novel regulator of freezing tolerance. Plant J 2007; 49:786-99. [PMID: 17316173 DOI: 10.1111/j.1365-313x.2006.02994.x] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.3] [Indexed: 05/14/2023]
Abstract
The eskimo1 (esk1) mutation of Arabidopsis resulted in a 5.5 °C improvement in freezing tolerance in the absence of cold acclimation. Here we show that the increase in freezing tolerance is not associated with any increase in the ability to survive drought or salt stresses, which are similar to freezing in their induction of cellular dehydration. Genome-wide comparisons of gene expression between esk1-1 and wild type indicate that mutations at esk1 result in altered expression of transcription factors and signaling components and of a set of stress-responsive genes. Interestingly, the list of 312 genes regulated by ESK1 shows greater overlap with sets of genes regulated by salt, osmotic and abscisic acid treatments than with genes regulated by cold acclimation or by the transcription factors CBF3 and ICE1, which have been shown to control genetic pathways for freezing tolerance. Map-based cloning identified the esk1 locus as At3g55990. The wild-type ESK1 gene encodes a 57-kDa protein and is a member of a large gene family of DUF231 domain proteins whose members encode a total of 45 proteins of unknown function. Our results indicate that ESK1 is a novel negative regulator of cold acclimation. Mutations in the ESK1 gene provide strong freezing tolerance through genetic regulation that is apparently very different from previously described genetic mechanisms of cold acclimation.
Affiliation(s)
- Zhanguo Xin
- Plant Stress and Germplasm Development Unit, USDA-ARS, 3810 4th Street, Lubbock, TX 79415, USA.
|
92
|
Samson N, Bausher MG, Lee SB, Jansen RK, Daniell H. The complete nucleotide sequence of the coffee (Coffea arabica L.) chloroplast genome: organization and implications for biotechnology and phylogenetic relationships amongst angiosperms. Plant Biotechnol J 2007; 5:339-53. [PMID: 17309688 PMCID: PMC3473179 DOI: 10.1111/j.1467-7652.2007.00245.x] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Indexed: 05/07/2023]
Abstract
The chloroplast genome sequence of Coffea arabica L., the first sequenced member of the fourth largest family of angiosperms, Rubiaceae, is reported. The genome is 155,189 bp in length, including a pair of inverted repeats of 25,943 bp. Of the 130 genes present, 112 are distinct and 18 are duplicated in the inverted repeat. The coding region comprises 79 protein genes, 29 transfer RNA genes, four ribosomal RNA genes and 18 genes containing introns (three with three exons). Repeat analysis revealed five direct and three inverted repeats of 30 bp or longer with a sequence identity of 90% or more. Comparisons of the coffee chloroplast genome with sequenced genomes of the closely related family Solanaceae indicated that coffee has a portion of rps19 duplicated in the inverted repeat and an intact copy of infA. Furthermore, whole-genome comparisons identified large indels (> 500 bp) in several intergenic spacer regions and introns in the Solanaceae, including trnE (UUC)-trnT (GGU) spacer, ycf4-cemA spacer, trnI (GAU) intron and rrn5-trnR (ACG) spacer. Phylogenetic analyses based on the DNA sequences of 61 protein-coding genes for 35 taxa, performed using both maximum parsimony and maximum likelihood methods, strongly supported the monophyly of several major clades of angiosperms, including monocots, eudicots, rosids, asterids, eurosids II, and euasterids I and II. Coffea (Rubiaceae, Gentianales) is only the second order sampled from the euasterid I clade. The availability of the complete chloroplast genome of coffee provides regulatory and intergenic spacer sequences for utilization in chloroplast genetic engineering to improve this important crop.
Affiliation(s)
- Nalapalli Samson
- University of Central Florida, Department of Molecular Biology and Microbiology, Biomolecular Science, Building #20, Orlando, FL 32816-2364, USA
- Michael G. Bausher
- USDA-ARS, Horticultural Research Laboratory, Fort Pierce, FL 34945-3030, USA
- Seung-Bum Lee
- University of Central Florida, Department of Molecular Biology and Microbiology, Biomolecular Science, Building #20, Orlando, FL 32816-2364, USA
- Robert K. Jansen
- Section of Integrative Biology and Institute of Cellular and Molecular Biology, Patterson Laboratories 141, University of Texas, Austin, TX 78712, USA
- Henry Daniell
- University of Central Florida, Department of Molecular Biology and Microbiology, Biomolecular Science, Building #20, Orlando, FL 32816-2364, USA
|
93
|
Lartillot N, Brinkmann H, Philippe H. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 2007; 7 Suppl 1:S4. [PMID: 17288577 PMCID: PMC1796613 DOI: 10.1186/1471-2148-7-s1-s4] [Citation(s) in RCA: 416] [Impact Index Per Article: 24.5] [Indexed: 11/10/2022]
Abstract
BACKGROUND Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the Long Branch Attraction (LBA) artefact, can be misleading, in particular when the taxon sampling is poor, or the outgroup is distant. In an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model mis-specification problems, which suggests that better models of sequence evolution should be devised, that would be more robust to tree reconstruction artefacts, even under the most challenging conditions. METHODS We focus on a well characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree, in which two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria, or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models: a standard, site-homogeneous model, based on an empirical matrix of amino-acid replacement (WAG), and a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test, allowing one to measure how well a model acknowledges sequence saturation. RESULTS Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our statistical goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences. 
CONCLUSION The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need be accounted for in order to obtain more reliable phylogenetic trees.
Affiliation(s)
- Nicolas Lartillot
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
- Henner Brinkmann
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec Canada
- Hervé Philippe
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec Canada
|
94
|
Lemieux C, Otis C, Turmel M. A clade uniting the green algae Mesostigma viride and Chlorokybus atmophyticus represents the deepest branch of the Streptophyta in chloroplast genome-based phylogenies. BMC Biol 2007; 5:2. [PMID: 17222354 PMCID: PMC1781420 DOI: 10.1186/1741-7007-5-2] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.1] [Received: 10/16/2006] [Accepted: 01/12/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Viridiplantae comprise two major phyla: the Streptophyta, containing the charophycean green algae and all land plants, and the Chlorophyta, containing the remaining green algae. Despite recent progress in unravelling phylogenetic relationships among major green plant lineages, problematic nodes still remain in the green tree of life. One of the major issues concerns the scaly biflagellate Mesostigma viride, which is either regarded as representing the earliest divergence of the Streptophyta or a separate lineage that diverged before the Chlorophyta and Streptophyta. Phylogenies based on chloroplast and mitochondrial genomes support the latter view. Because some green plant lineages are not represented in these phylogenies, sparse taxon sampling has been suspected to yield misleading topologies. Here, we describe the complete chloroplast DNA (cpDNA) sequence of the early-diverging charophycean alga Chlorokybus atmophyticus and present chloroplast genome-based phylogenies with an expanded taxon sampling. RESULTS The 152,254 bp Chlorokybus cpDNA closely resembles its Mesostigma homologue at the gene content and gene order levels. Using various methods of phylogenetic inference, we analyzed amino acid and nucleotide data sets that were derived from 45 protein-coding genes common to the cpDNAs of 37 green algal/land plant taxa and eight non-green algae. Unexpectedly, all best trees recovered a robust clade uniting Chlorokybus and Mesostigma. In protein trees, this clade was sister to all streptophytes and chlorophytes and this placement received moderate support. In contrast, gene trees provided unequivocal support to the notion that the Mesostigma + Chlorokybus clade represents the earliest-diverging branch of the Streptophyta. 
Independent analyses of structural data (gene content and/or gene order) and of subsets of amino acid data progressively enriched in slow-evolving sites led us to conclude that the latter topology reflects the true organismal relationships. CONCLUSION In disclosing a sister relationship between the Mesostigmatales and Chlorokybales, our study resolves the long-standing debate about the nature of the unicellular flagellated ancestors of land plants and alters significantly our concepts regarding the evolution of streptophyte algae. Moreover, in predicting a richer chloroplast gene repertoire than previously inferred for the common ancestor of all streptophytes, our study has contributed to a better understanding of chloroplast genome evolution in the Viridiplantae.
Affiliation(s)
- Claude Lemieux
- Département de biochimie et de microbiologie, Université Laval, Québec, QC, G1K 7P4, Canada
- Christian Otis
- Département de biochimie et de microbiologie, Université Laval, Québec, QC, G1K 7P4, Canada
- Monique Turmel
- Département de biochimie et de microbiologie, Université Laval, Québec, QC, G1K 7P4, Canada
|
95
|
Jouannic S, Collin M, Vidal B, Verdeil JL, Tregear JW. A class I KNOX gene from the palm species Elaeis guineensis (Arecaceae) is associated with meristem function and a distinct mode of leaf dissection. New Phytol 2007; 174:551-568. [PMID: 17447911 DOI: 10.1111/j.1469-8137.2007.02020.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 05/15/2023]
Abstract
Class I Knotted-like homeobox (KNOX) transcription factors are important regulators of shoot apical meristem function and leaf morphology by their contribution to dissected leaf development. Palms are of particular interest as they produce dissected leaves generated by a distinct mechanism compared with eudicots. The question addressed here was whether class I KNOX genes might be involved in meristem function and leaf dissection in palms. Here, we characterized the EgKNOX1 gene from oil palm (Elaeis guineensis, Arecaceae) and compared it with available sequences from other plant species using phylogenetic analysis. Gene expression pattern was investigated using reverse transcription-polymerase chain reaction (RT-PCR) and in situ hybridization. Functional analysis was carried out by ectopic expression in Arabidopsis and rice. EgKNOX1 was orthologous to STM from Arabidopsis and to OSH1 from rice. It was expressed in the central zone of both vegetative and reproductive meristems. During leaf development, its expression was associated with plications from which the leaflets originate. Different modes of leaf dissection are seen to involve a similar class of genes to control meristematic activities, which govern the production of dissected morphologies.
Affiliation(s)
- Stefan Jouannic
- IRD/CIRAD Palm Group, UMR 1098, Centre IRD Montpellier, 911 avenue Agropolis, 34394 Montpellier cedex 5, France
- Myriam Collin
- IRD/CIRAD Palm Group, UMR 1098, Centre IRD Montpellier, 911 avenue Agropolis, 34394 Montpellier cedex 5, France
- Benjamin Vidal
- IRD/CIRAD Palm Group, UMR 1098, Centre IRD Montpellier, 911 avenue Agropolis, 34394 Montpellier cedex 5, France
- Jean-Luc Verdeil
- Plateau d'Histocytologie et d'Imagerie Cellulaire Végétale, IFR 127, TA/40/02, CIRAD, Avenue Agropolis, F-34398 Montpellier cedex 5, France
- James W Tregear
- IRD/CIRAD Palm Group, UMR 1098, Centre IRD Montpellier, 911 avenue Agropolis, 34394 Montpellier cedex 5, France
|
96
|
Ravi V, Khurana JP, Tyagi AK, Khurana P. Rosales sister to Fabales: towards resolving the rosid puzzle. Mol Phylogenet Evol 2006; 44:488-93. [PMID: 17196401 DOI: 10.1016/j.ympev.2006.11.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 08/08/2006] [Revised: 11/02/2006] [Accepted: 11/13/2006] [Indexed: 11/26/2022]
Affiliation(s)
- V Ravi
- Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi, India
|
97
|
Wang HC, Spencer M, Susko E, Roger AJ. Testing for covarion-like evolution in protein sequences. Mol Biol Evol 2006; 24:294-305. [PMID: 17056642 DOI: 10.1093/molbev/msl155] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022]
Abstract
The covarion hypothesis of molecular evolution proposes that selective pressures on an amino acid or nucleotide site change through time, thus causing changes of evolutionary rate along the edges of a phylogenetic tree. Several kinds of Markov models for the covarion process have been proposed. One model, proposed by Huelsenbeck (2002), has 2 substitution rate classes: the substitution process at a site can switch between a single variable rate, drawn from a discrete gamma distribution, and a zero invariable rate. A second model, suggested by Galtier (2001), assumes rate switches among an arbitrary number of rate classes but switching to and from the invariable rate class is not allowed. The latter model allows for some sites that do not participate in the rate-switching process. Here we propose a general covarion model that combines features of both models, allowing evolutionary rates not only to switch between variable and invariable classes but also to switch among different rates when they are in a variable state. We have implemented all 3 covarion models in a maximum likelihood framework for amino acid sequences and tested them on 23 protein data sets. We found significant likelihood increases for all data sets for the 3 models, compared with a model that does not allow site-specific rate switches along the tree. Furthermore, we found that the general model fit the data better than the simpler covarion models in the majority of the cases, highlighting the complexity in modeling the covarion process. The general covarion model can be used for comparing tree topologies, molecular dating studies, and the investigation of protein adaptation.
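As a rough illustration of the switching idea described in this abstract, the sketch below builds the rate matrix of a simplified on/off covarion process in Python. It is not the authors' implementation: it assumes a single "on" rate class (no discrete gamma categories), and the names `covarion_generator`, `s_on`, and `s_off` are hypothetical.

```python
def covarion_generator(R, s_on, s_off):
    """Build the generator of a simplified on/off covarion process.

    R     : n x n substitution rate matrix used while a site is "on"
            (only off-diagonal entries are read).
    s_on  : assumed rate of switching from "off" to "on".
    s_off : assumed rate of switching from "on" to "off".

    States 0..n-1 are (on, character); states n..2n-1 are (off, character).
    """
    n = len(R)
    size = 2 * n
    Q = [[0.0] * size for _ in range(size)]
    for a in range(n):
        for b in range(n):
            if a != b:
                Q[a][b] = R[a][b]  # substitutions happen only while "on"
        Q[a][n + a] = s_off        # on -> off, character unchanged
        Q[n + a][a] = s_on         # off -> on, character unchanged
    for i in range(size):
        # Diagonal entries make every row sum to zero.
        Q[i][i] = -sum(Q[i][j] for j in range(size) if j != i)
    return Q


# Example: Jukes-Cantor-like rates for a 4-letter alphabet.
jc = [[0.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
Q = covarion_generator(jc, s_on=0.2, s_off=0.5)
```

In this simplified form, a site in an "off" state can only leave it by switching back on, which is what makes the effective rate at a site vary along the tree.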
Affiliation(s)
- Huai-Chun Wang
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
|
98
|
Cai Z, Penaflor C, Kuehl JV, Leebens-Mack J, Carlson JE, dePamphilis CW, Boore JL, Jansen RK. Complete plastid genome sequences of Drimys, Liriodendron, and Piper: implications for the phylogenetic relationships of magnoliids. BMC Evol Biol 2006; 6:77. [PMID: 17020608 PMCID: PMC1626487 DOI: 10.1186/1471-2148-6-77] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.8] [Received: 06/13/2006] [Accepted: 10/04/2006] [Indexed: 11/20/2022]
Abstract
Background The magnoliids with four orders, 19 families, and 8,500 species represent one of the largest clades of early diverging angiosperms. Although several recent angiosperm phylogenetic analyses supported the monophyly of magnoliids and suggested relationships among the orders, the limited number of genes examined resulted in only weak support, and these issues remain controversial. Furthermore, considerable incongruence resulted in phylogenetic reconstructions supporting three different sets of relationships among magnoliids and the two large angiosperm clades, monocots and eudicots. We sequenced the plastid genomes of three magnoliids, Drimys (Canellales), Liriodendron (Magnoliales), and Piper (Piperales), and used these data in combination with 32 other angiosperm plastid genomes to assess phylogenetic relationships among magnoliids and to examine patterns of variation of GC content. Results The Drimys, Liriodendron, and Piper plastid genomes are very similar in size at 160,604, 159,886 bp, and 160,624 bp, respectively. Gene content and order are nearly identical to many other unrearranged angiosperm plastid genomes, including Calycanthus, the other published magnoliid genome. Overall GC content ranges from 34–39%, and coding regions have a substantially higher GC content than non-coding regions. Among protein-coding genes, GC content varies by codon position with 1st codon > 2nd codon > 3rd codon, and it varies by functional group with photosynthetic genes having the highest percentage and NADH genes the lowest. Phylogenetic analyses using parsimony and likelihood methods and sequences of 61 protein-coding genes provided strong support for the monophyly of magnoliids and two strongly supported groups were identified, the Canellales/Piperales and the Laurales/Magnoliales. Strong support is reported for monocots and eudicots as sister clades with magnoliids diverging before the monocot-eudicot split. 
The trees also provided moderate or strong support for the position of Amborella as sister to a clade including all other angiosperms. Conclusion Evolutionary comparisons of three new magnoliid plastid genome sequences, combined with other published angiosperm genomes, confirm that GC content is unevenly distributed across the genome by location, codon position, and functional group. Furthermore, phylogenetic analyses provide the strongest support so far for the hypothesis that the magnoliids are sister to a large clade that includes both monocots and eudicots.
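The comparison of GC content across codon positions mentioned in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: it assumes an in-frame coding sequence given as a plain string, and the function name `gc_by_codon_position` is hypothetical.

```python
def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence; a trailing partial
    codon and any non-ACGT symbols are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = [b for b in seq[pos:usable:3] if b in "ACGT"]
        gc = sum(1 for b in bases if b in "GC")
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


# Three codons (GCA GCT GCA): G at position 1, C at position 2, A/T at position 3.
first, second, third = gc_by_codon_position("GCAGCTGCA")
```

Applied per gene and then averaged by functional group, the same calculation would give the kind of position-by-group breakdown the abstract reports.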
Affiliation(s)
- Zhengqiu Cai
- Section of Integrative Biology and Institute of Cellular and Molecular Biology, Patterson Laboratories 141, University of Texas, Austin, TX 78712, USA
- Cynthia Penaflor
- Biology Department, 373 WIDB, Brigham Young University, Provo, UT 84602, USA
- Jennifer V Kuehl
- DOE Joint Genome Institute and Lawrence Berkeley National Laboratory, Walnut Creek, CA 94598, USA
- John E Carlson
- School of Forest Resources and Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Claude W dePamphilis
- Department of Biology, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Jeffrey L Boore
- DOE Joint Genome Institute and Lawrence Berkeley National Laboratory, Walnut Creek, CA 94598, USA
- Robert K Jansen
- Section of Integrative Biology and Institute of Cellular and Molecular Biology, Patterson Laboratories 141, University of Texas, Austin, TX 78712, USA
|
99
|
Bausher MG, Singh ND, Lee SB, Jansen RK, Daniell H. The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var 'Ridge Pineapple': organization and phylogenetic relationships to other angiosperms. BMC PLANT BIOLOGY 2006; 6:21. [PMID: 17010212 PMCID: PMC1599732 DOI: 10.1186/1471-2229-6-21] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.8] [Received: 04/09/2006] [Accepted: 09/30/2006] [Indexed: 05/12/2023]
Abstract
BACKGROUND The production of Citrus, the largest fruit crop of international economic value, has recently been imperiled due to the introduction of the bacterial disease Citrus canker. No significant improvements have been made to combat this disease by plant breeding and nuclear transgenic approaches. Chloroplast genetic engineering has a number of advantages over nuclear transformation; it not only increases transgene expression but also facilitates transgene containment, which is one of the major impediments for development of transgenic trees. We have sequenced the Citrus chloroplast genome to facilitate genetic improvement of this crop and to assess phylogenetic relationships among major lineages of angiosperms. RESULTS The complete chloroplast genome sequence of Citrus sinensis is 160,129 bp in length, and contains 133 genes (89 protein-coding, 4 rRNAs and 30 distinct tRNAs). Genome organization is very similar to the inferred ancestral angiosperm chloroplast genome. However, in Citrus the infA gene is absent. The inverted repeat region has expanded to duplicate rps19 and the first 84 amino acids of rpl22. The rpl22 gene in the IRb region has a nonsense mutation resulting in 9 stop codons. This was confirmed by PCR amplification and sequencing using primers that flank the IR/LSC boundaries. Repeat analysis identified 29 direct and inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Comparison of protein-coding sequences with expressed sequence tags revealed six putative RNA edits, five of which resulted in non-synonymous modifications in petL, psbH, ycf2 and ndhA. Phylogenetic analyses using maximum parsimony (MP) and maximum likelihood (ML) methods of a dataset composed of 61 protein-coding genes for 30 taxa provide strong support for the monophyly of several major clades of angiosperms, including monocots, eudicots, rosids and asterids.
The MP and ML trees are incongruent in three areas: the position of Amborella and Nymphaeales, relationship of the magnoliid genus Calycanthus, and the monophyly of the eurosid I clade. Both MP and ML trees provide strong support for the monophyly of eurosids II and for the placement of Citrus (Sapindales) sister to a clade including the Malvales/Brassicales. CONCLUSION This is the first complete chloroplast genome sequence for a member of the Rutaceae and Sapindales. Expansion of the inverted repeat region to include rps19 and part of rpl22 and presence of two truncated copies of rpl22 is unusual among sequenced chloroplast genomes. Availability of a complete Citrus chloroplast genome sequence provides valuable information on intergenic spacer regions and endogenous regulatory sequences for chloroplast genetic engineering. Phylogenetic analyses resolve relationships among several major clades of angiosperms and provide strong support for the monophyly of the eurosid II clade and the position of the Sapindales sister to the Brassicales/Malvales.
Affiliation(s)
- Michael G Bausher
- USDA-ARS, Horticultural Research Laboratory, Fort Pierce, FL 34945–3030, USA
- Nameirakpam D Singh
- Dept. of Molecular Biology & Microbiology, University of Central Florida, Biomolecular Science, Building #20, Orlando, FL 32816–2364, USA
- Seung-Bum Lee
- Dept. of Molecular Biology & Microbiology, University of Central Florida, Biomolecular Science, Building #20, Orlando, FL 32816–2364, USA
- Robert K Jansen
- Section of Integrative Biology and Institute of Cellular and Molecular Biology, Patterson Laboratories 141, University of Texas, Austin, TX 78712, USA
- Henry Daniell
- Dept. of Molecular Biology & Microbiology, University of Central Florida, Biomolecular Science, Building #20, Orlando, FL 32816–2364, USA
|
100
|
Cannon CH, Kua CS, Lobenhofer EK, Hurban P. Capturing genomic signatures of DNA sequence variation using a standard anonymous microarray platform. Nucleic Acids Res 2006; 34:e121. [PMID: 17000641 PMCID: PMC1636412 DOI: 10.1093/nar/gkl478] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
Comparative genomics, using the model organism approach, has provided powerful insights into the structure and evolution of whole genomes. Unfortunately, only a small fraction of Earth's biodiversity will have its genome sequenced in the foreseeable future. Most wild organisms have radically different life histories and evolutionary genomics than current model systems. A novel technique is needed to expand comparative genomics to a wider range of organisms. Here, we describe a novel approach using an anonymous DNA microarray platform that gathers genomic samples of sequence variation from any organism. Oligonucleotide probe sequences placed on a custom 44 K array were 25 bp long and designed using a simple set of criteria to maximize their complexity and dispersion in sequence probability space. Using whole genomic samples from three known genomes (mouse, rat and human) and one unknown (Gonystylus bancanus), we demonstrate and validate its power, reliability, transitivity and sensitivity. Using two separate statistical analyses, a large number of genomic ‘indicator’ probes were discovered. The construction of a genomic signature database based upon this technique would allow virtual comparisons, and simple queries could generate optimal subsets of markers to be used in large-scale assays with simple downstream techniques. Biologists from a wide range of fields, studying almost any organism, could efficiently perform genomic comparisons at potentially any phylogenetic level after performing a small number of standardized DNA microarray hybridizations. Possibilities for refining and expanding the approach are discussed.
Affiliation(s)
- C. H. Cannon
- To whom correspondence should be addressed. Tel: +1 806 742 3993; Fax: +1 806 742 2963
- C. S. Kua
- 27 Jln. Dato Haji Harun, Taman Tayton View, Kuala Lumpur, Malaysia
- E. K. Lobenhofer
- Paradigm Array Labs, a Service Unit of Icoria Inc., Research Triangle Park, NC 27709, USA
- P. Hurban
- Paradigm Array Labs, a Service Unit of Icoria Inc., Research Triangle Park, NC 27709, USA
|