51
de Miguel M, Bartholomé J, Ehrenmann F, Murat F, Moriguchi Y, Uchiyama K, Ueno S, Tsumura Y, Lagraulet H, de Maria N, Cabezas JA, Cervera MT, Gion JM, Salse J, Plomion C. Evidence of intense chromosomal shuffling during conifer evolution. Genome Biol Evol 2015; 7:2799-2809. [PMID: 26400405 PMCID: PMC4684699 DOI: 10.1093/gbe/evv185] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/25/2023]
Abstract
Although recent advances have been gained on genome evolution in angiosperm lineages, virtually nothing is known about karyotype evolution in the other group of seed plants, the gymnosperms. Here, we used high-density gene-based linkage mapping to compare the karyotype structure of two families of conifers (the most abundant group of gymnosperms) separated around 290 Ma: Pinaceae and Cupressaceae. We propose for the first time a model based on the fusion of 20 ancestral chromosomal blocks that may have shaped the modern karyotypes of Pinaceae (with n = 12) and Cupressaceae (with n = 11). The considerable difference in modern genome organization between these two lineages contrasts strongly with the remarkable level of synteny already reported within the Pinaceae. It also suggests a convergent evolutionary mechanism of chromosomal block shuffling that has shaped the genomes of the spermatophytes.
Affiliation(s)
- Marina de Miguel
  - INRA, UMR 1202 BIOGECO, 69 Route d'Arcachon, F-33610 Cestas, France; Université de Bordeaux, UMR 1202 BIOGECO, F-33170 Talence, France
- Jérôme Bartholomé
  - INRA, UMR 1202 BIOGECO, 69 Route d'Arcachon, F-33610 Cestas, France; Université de Bordeaux, UMR 1202 BIOGECO, F-33170 Talence, France
- François Ehrenmann
  - INRA, UMR 1202 BIOGECO, 69 Route d'Arcachon, F-33610 Cestas, France; Université de Bordeaux, UMR 1202 BIOGECO, F-33170 Talence, France
- Florent Murat
  - INRA/UBP UMR 1095 GDEC 'Génétique, Diversité et Ecophysiologie des Céréales', 5 Chemin de Beaulieu, 63100 Clermont Ferrand, France
- Yoshinari Moriguchi
  - Niigata University, Graduate School of Science and Technology, 8050, Igarashi 2-Nocho, Nishi-ku, Niigata 950-2181, Japan
- Kentaro Uchiyama
  - Forestry and Forest Products Research Institute, Department of Forest Genetics, Tsukuba, Ibaraki 305-8687, Japan
- Saneyoshi Ueno
  - Forestry and Forest Products Research Institute, Department of Forest Genetics, Tsukuba, Ibaraki 305-8687, Japan
- Yoshihiko Tsumura
  - University of Tsukuba, Faculty of Life & Environmental Sciences, 1-1-1, Tennodai, Tsukuba, Ibaraki 305-8572, Japan
- Hélène Lagraulet
  - INRA, UMR 1202 BIOGECO, 69 Route d'Arcachon, F-33610 Cestas, France; Université de Bordeaux, UMR 1202 BIOGECO, F-33170 Talence, France
- Nuria de Maria
  - INIA-CIFOR, departamento de Ecologia y Genetica Forestal, 28040, Madrid, Spain; INIA-UPM, Unidad mixta de Genomica y Ecofisiologia Forestal, Madrid, Spain
- José-Antonio Cabezas
  - INIA-CIFOR, departamento de Ecologia y Genetica Forestal, 28040, Madrid, Spain; INIA-UPM, Unidad mixta de Genomica y Ecofisiologia Forestal, Madrid, Spain
- Maria-Teresa Cervera
  - INIA-CIFOR, departamento de Ecologia y Genetica Forestal, 28040, Madrid, Spain; INIA-UPM, Unidad mixta de Genomica y Ecofisiologia Forestal, Madrid, Spain
- Jean Marc Gion
  - INRA, UMR 1202 BIOGECO, 69 Route d'Arcachon, F-33610 Cestas, France; Université de Bordeaux, UMR 1202 BIOGECO, F-33170 Talence, France; CIRAD, UMR AGAP, F-33612 Cestas, France
- Jérôme Salse
  - INRA/UBP UMR 1095 GDEC 'Génétique, Diversité et Ecophysiologie des Céréales', 5 Chemin de Beaulieu, 63100 Clermont Ferrand, France
- Christophe Plomion
  - INRA, UMR 1202 BIOGECO, 69 Route d'Arcachon, F-33610 Cestas, France; Université de Bordeaux, UMR 1202 BIOGECO, F-33170 Talence, France
52
Ma L, Hatlen A, Kelly LJ, Becher H, Wang W, Kovarik A, Leitch IJ, Leitch AR. Angiosperms Are Unique among Land Plant Lineages in the Occurrence of Key Genes in the RNA-Directed DNA Methylation (RdDM) Pathway. Genome Biol Evol 2015; 7:2648-62. [PMID: 26338185 PMCID: PMC4607528 DOI: 10.1093/gbe/evv171] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 01/03/2023]
Abstract
The RNA-directed DNA methylation (RdDM) pathway can be divided into three phases: 1) small interfering RNA biogenesis, 2) de novo methylation, and 3) chromatin modification. To determine the degree of conservation of this pathway we searched for key genes among land plants. We used OrthoMCL and the OrthoMCL Viridiplantae database to analyze proteomes of species in bryophytes, lycophytes, monilophytes, gymnosperms, and angiosperms. We also analyzed small RNA size categories and, in two gymnosperms, cytosine methylation in ribosomal DNA. Six proteins were restricted to angiosperms, these being NRPD4/NRPE4, RDM1, DMS3 (defective in meristem silencing 3), SHH1 (SAWADEE homeodomain homolog 1), KTF1, and SUVR2, although we failed to find the latter three proteins in Fritillaria persica, a species with a giant genome. Small RNAs of 24 nt in length were abundant only in angiosperms. Phylogenetic analyses of Dicer-like (DCL) proteins showed that DCL2 was restricted to seed plants, although it was absent in Gnetum gnemon and Welwitschia mirabilis. The data suggest that phases (1) and (2) of the RdDM pathway, described for model angiosperms, evolved with angiosperms. The absence of some features of RdDM in F. persica may be associated with its large genome. Phase (3) is probably the most conserved part of the pathway across land plants. DCL2, involved in virus defense and interaction with the canonical RdDM pathway to facilitate methylation of CHH, is absent outside seed plants. Its absence in G. gnemon and W. mirabilis, coupled with distinctive patterns of CHH methylation, suggests a secondary loss of DCL2 following the divergence of Gnetales.
Affiliation(s)
- Lu Ma
  - School of Biological and Chemical Sciences, Queen Mary University of London, United Kingdom
- Andrea Hatlen
  - School of Biological and Chemical Sciences, Queen Mary University of London, United Kingdom
- Laura J Kelly
  - School of Biological and Chemical Sciences, Queen Mary University of London, United Kingdom
- Hannes Becher
  - School of Biological and Chemical Sciences, Queen Mary University of London, United Kingdom
- Wencai Wang
  - School of Biological and Chemical Sciences, Queen Mary University of London, United Kingdom
- Ales Kovarik
  - Department of Molecular Epigenetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno, Czech Republic
- Ilia J Leitch
  - Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey, United Kingdom
- Andrew R Leitch
  - School of Biological and Chemical Sciences, Queen Mary University of London, United Kingdom
53
Huang MD, Huang AHC. Bioinformatics Reveal Five Lineages of Oleosins and the Mechanism of Lineage Evolution Related to Structure/Function from Green Algae to Seed Plants. PLANT PHYSIOLOGY 2015; 169:453-70. [PMID: 26232488 PMCID: PMC4577406 DOI: 10.1104/pp.15.00634] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 04/29/2015] [Accepted: 07/28/2015] [Indexed: 05/20/2023]
Abstract
Plant cells contain subcellular lipid droplets with a triacylglycerol matrix enclosed by a layer of phospholipids and the small structural protein oleosin. Oleosins possess a conserved central hydrophobic hairpin of approximately 72 residues penetrating into the lipid droplet matrix and amphipathic amino- and carboxyl (C)-terminal peptides lying on the phospholipid surface. Bioinformatics of 1,000 oleosins of green algae and all plants emphasizing biological implications reveal five oleosin lineages: primitive (in green algae, mosses, and ferns), universal (U; all land plants), and three in specific organs or phylogenetic groups, termed seed low-molecular-weight (SL; seed plants), seed high-molecular-weight (SH; angiosperms), and tapetum (T; Brassicaceae) oleosins. Transition from one lineage to the next is depicted from lineage intermediates at junctions of phylogeny and organ distributions. Within a species, each lineage, except the T oleosin lineage, has one to four genes per haploid genome, only approximately two of which are active. Primitive oleosins already possess all the general characteristics of oleosins. U oleosins have C-terminal sequences as highly conserved as the hairpin sequences; thus, U oleosins including their C-terminal peptide exert indispensable, unknown functions. SL and SH oleosin transcripts in seeds are in an approximately 1:1 ratio, which suggests the occurrence of SL-SH oleosin dimers/multimers. T oleosins in Brassicaceae are encoded by rapidly evolved multitandem genes for alkane storage and transfer. Overall, oleosins have evolved to retain conserved hairpin structures but diversified for unique structures and functions in specific cells and plant families. Also, our studies reveal oleosin in avocado (Persea americana) mesocarp and no acyltransferase/lipase motifs in most oleosins.
Affiliation(s)
- Ming-Der Huang
  - Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan 11529 (M.-D.H.); and Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, California 92521 (A.H.C.H.)
- Anthony H C Huang
  - Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan 11529 (M.-D.H.); and Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, California 92521 (A.H.C.H.)
54
Cao H. Genome-Wide Analysis of Oleosin Gene Family in 22 Tree Species: An Accelerator for Metabolic Engineering of BioFuel Crops and Agrigenomics Industrial Applications? OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2015; 19:521-41. [PMID: 26258573 DOI: 10.1089/omi.2015.0073] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
Trees contribute to enormous plant oil reserves because many trees contain 50%-80% of oil (triacylglycerols, TAGs) in the fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. There is a lack of comprehensive sequence analysis and structural information of OLEs among diverse trees. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures in tree OLEs and OLE subfamilies. Data mining identified 65 OLEs with perfect conservation of the "proline knot" motif (PX5SPX3P) from 19 trees. These OLEs contained >40% hydrophobic amino acid residues. They displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helixes connected with short coils without any β-strand and that they exhibited distinct 3D structures and ligand binding sites. These analyses provide fundamental information in the similarity and specificity of diverse OLE isoforms within the same subfamily and among the different species, which should facilitate studying the structure-function relationship and identify critical amino acid residues in OLEs for metabolic engineering of tree TAGs.
Affiliation(s)
- Heping Cao
  - U.S. Department of Agriculture, Agricultural Research Service, Southern Regional Research Center, New Orleans, Louisiana
55
Tian Y, Zeng Y, Zhang J, Yang C, Yan L, Wang X, Shi C, Xie J, Dai T, Peng L, Zeng Huan Y, Xu A, Huang Y, Zhang J, Ma X, Dong Y, Hao S, Sheng J. High quality reference genome of drumstick tree (Moringa oleifera Lam.), a potential perennial crop. SCIENCE CHINA-LIFE SCIENCES 2015; 58:627-38. [PMID: 26032590 DOI: 10.1007/s11427-015-4872-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 01/07/2015] [Accepted: 03/10/2015] [Indexed: 12/18/2022]
Abstract
The drumstick tree (Moringa oleifera Lam.) is a perennial crop that has gained popularity in certain developing countries for its high-nutrition content and adaptability to arid and semi-arid environments. Here we report a high-quality draft genome sequence of M. oleifera. This assembly represents 91.78% of the estimated genome size and contains 19,465 protein-coding genes. Comparative genomic analysis between M. oleifera and related woody plant genomes helps clarify the general evolution of this species, while the identification of several species-specific gene families and positively selected genes in M. oleifera may help identify genes related to M. oleifera's high protein content, fast-growth, heat and stress tolerance. This reference genome greatly extends the basic research on M. oleifera, and may further promote applying genomics to enhanced breeding and improvement of M. oleifera.
Affiliation(s)
- Yang Tian
  - College of Life Sciences, Jilin University, Changchun, 130012, China
56
Zuccolo A, Scofield DG, De Paoli E, Morgante M. The Ty1-copia LTR retroelement family PARTC is highly conserved in conifers over 200 MY of evolution. Gene 2015; 568:89-99. [PMID: 25982862 DOI: 10.1016/j.gene.2015.05.028] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 01/08/2015] [Revised: 04/06/2015] [Accepted: 05/11/2015] [Indexed: 11/26/2022]
Abstract
Long Terminal Repeat retroelements (LTR-RTs) are a major component of many plant genomes. Although well studied and described in angiosperms, their features and dynamics are poorly understood in gymnosperms. Representative complete copies of a Ty1-copia element isolated in Picea abies and named PARTC were identified in six other conifer species (Picea glauca, Pinus sylvestris, Pinus taeda, Abies sibirica, Taxus baccata and Juniperus communis) covering more than 200 million years of evolution. Here we characterized the structure of this element, assessed its abundance across conifers, studied the modes and timing of its amplification, and evaluated the degree of conservation of its extant copies at nucleotide level over distant species. We demonstrated that the element is ancient, abundant and widespread, and that its paralogous copies are present in the genera Picea, Pinus and Abies as an LTR-RT family. The amplification leading to the extant copies of PARTC occurred over long evolutionary times spanning tens of MY and mostly took place after the speciation of the conifers analyzed. The level of conservation of PARTC is striking and may be explained by low substitution rates and limited removal mechanisms for LTR-RTs. These PARTC features and dynamics are representative of a more general scenario for LTR-RTs in gymnosperms quite different from that characterizing the vast majority of LTR-RT elements in angiosperms.
Affiliation(s)
- Andrea Zuccolo
  - Institute of Life Sciences, Scuola Superiore Sant'Anna, 56127 Pisa, Italy; Istituto di Genomica Applicata, Via J. Linussio 51, 33100 Udine, Italy
- Douglas G Scofield
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-75236 Uppsala, Sweden
- Emanuele De Paoli
  - Università degli Studi di Udine, Via delle Scienze 208, 33100 Udine, Italy
- Michele Morgante
  - Istituto di Genomica Applicata, Via J. Linussio 51, 33100 Udine, Italy; Università degli Studi di Udine, Via delle Scienze 208, 33100 Udine, Italy
57
Wachowiak W, Trivedi U, Perry A, Cavers S. Comparative transcriptomics of a complex of four European pine species. BMC Genomics 2015; 16:234. [PMID: 25887584 PMCID: PMC4458023 DOI: 10.1186/s12864-015-1401-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/21/2014] [Accepted: 02/24/2015] [Indexed: 11/25/2022]
Abstract
Background: Pinus sylvestris, P. mugo, P. uliginosa and P. uncinata are closely related but phenotypically and ecologically very distinct European pine species providing an excellent study system for analysis of the genetic basis of adaptive variation and speciation. For comparative genomic analysis of the species, transcriptome sequence was generated for 17 samples collected across the European distribution range using Illumina paired-end sequencing technology.
Results: De novo transcriptome assembly of a reference sample of P. sylvestris contained 40968 unigenes, of which fewer than 0.5% were identified as putative retrotransposon sequences. Based on gene annotation approaches, 19659 contigs were identified and assigned to unique genes covering a broad range of gene ontology categories. About 80% of the reads from each sample were successfully mapped to the reference transcriptome of P. sylvestris. Single nucleotide polymorphisms were identified in 22041-24096 of the unigenes providing a set of ~220-262 k SNPs identified for each species. Very similar levels of nucleotide polymorphism were observed across species (π=0.0044-0.0053) and highest pairwise nucleotide divergence (0.006) was found between P. mugo and P. sylvestris at a common set of unigenes.
Conclusions: The study provides whole transcriptome sequence and a large set of SNPs to advance population and association genetic studies in pines. Our study demonstrates that transcriptome sequencing can be a very useful approach for development of novel genomic resources in species with large and complex genomes.
Affiliation(s)
- Witold Wachowiak
  - Centre for Ecology and Hydrology Edinburgh, Bush Estate, Penicuik, Midlothian, EH26 0QB, UK; Institute of Dendrology, Polish Academy of Sciences, Parkowa 5, 62-035, Kórnik, Poland
- Urmi Trivedi
  - Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, EH9 3JT, UK
- Annika Perry
  - Centre for Ecology and Hydrology Edinburgh, Bush Estate, Penicuik, Midlothian, EH26 0QB, UK
- Stephen Cavers
  - Centre for Ecology and Hydrology Edinburgh, Bush Estate, Penicuik, Midlothian, EH26 0QB, UK
58
Jiao Y, Paterson AH. Polyploidy-associated genome modifications during land plant evolution. Philos Trans R Soc Lond B Biol Sci 2015; 369:rstb.2013.0355. [PMID: 24958928 DOI: 10.1098/rstb.2013.0355] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Indexed: 12/24/2022]
Abstract
The occurrence of polyploidy in land plant evolution has led to an acceleration of genome modifications relative to other crown eukaryotes and is correlated with key innovations in plant evolution. Extensive genome resources provide for relating genomic changes to the origins of novel morphological and physiological features of plants. Ancestral gene contents for key nodes of the plant family tree are inferred. Pervasive polyploidy in angiosperms appears likely to be the major factor generating novel angiosperm genes and expanding some gene families. However, most gene families lose most duplicated copies in a quasi-neutral process, and a few families are actively selected for single-copy status. One of the great challenges of evolutionary genomics is to link genome modifications to speciation, diversification and the morphological and/or physiological innovations that collectively compose biodiversity. Rapid accumulation of genomic data and its ongoing investigation may greatly improve the resolution at which evolutionary approaches can contribute to the identification of specific genes responsible for particular innovations. The resulting, more 'particulate' understanding of plant evolution, may elevate to a new level fundamental knowledge of botanical diversity, including economically important traits in the crop plants that sustain humanity.
Affiliation(s)
- Yuannian Jiao
  - Plant Genome Mapping Laboratory, University of Georgia, 111 Riverbend Road, Athens, GA 30606, USA
- Andrew H Paterson
  - Plant Genome Mapping Laboratory, University of Georgia, 111 Riverbend Road, Athens, GA 30606, USA
60
Chapman JA, Mascher M, Buluç A, Barry K, Georganas E, Session A, Strnadova V, Jenkins J, Sehgal S, Oliker L, Schmutz J, Yelick KA, Scholz U, Waugh R, Poland JA, Muehlbauer GJ, Stein N, Rokhsar DS. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. Genome Biol 2015; 16:26. [PMID: 25637298 PMCID: PMC4373400 DOI: 10.1186/s13059-015-0582-8] [Citation(s) in RCA: 164] [Impact Index Per Article: 18.2] [Received: 09/12/2014] [Accepted: 01/06/2015] [Indexed: 11/10/2022]
Abstract
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to or even exceed those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species where it is possible to construct a mapping population.
Affiliation(s)
- Jarrod A Chapman
  - Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany
- Aydın Buluç
  - Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Kerrie Barry
  - Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
- Evangelos Georganas
  - Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA
- Adam Session
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Veronika Strnadova
  - Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA
- Jerry Jenkins
  - Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA
- Sunish Sehgal
  - Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA; Present address: Department of Plant Science, South Dakota State University, Brookings, SD, 57007, USA
- Leonid Oliker
  - Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jeremy Schmutz
  - Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA
- Katherine A Yelick
  - Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany
- Robbie Waugh
  - Division of Plant Sciences, University of Dundee & The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Jesse A Poland
  - Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA
- Gary J Muehlbauer
  - Departments of Agronomy and Plant Genetics, and Plant Biology, University of Minnesota, St Paul, MN, 55108, USA
- Nils Stein
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany
- Daniel S Rokhsar
  - Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
61
De La Torre AR, Birol I, Bousquet J, Ingvarsson PK, Jansson S, Jones SJM, Keeling CI, MacKay J, Nilsson O, Ritland K, Street N, Yanchuk A, Zerbe P, Bohlmann J. Insights into conifer giga-genomes. PLANT PHYSIOLOGY 2014; 166:1724-32. [PMID: 25349325 PMCID: PMC4256843 DOI: 10.1104/pp.114.248708] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Indexed: 05/04/2023]
Abstract
Insights from sequenced genomes of major land plant lineages have advanced research in almost every aspect of plant biology. Until recently, however, assembled genome sequences of gymnosperms have been missing from this picture. Conifers of the pine family (Pinaceae) are a group of gymnosperms that dominate large parts of the world's forests. Despite their ecological and economic importance, conifers seemed long out of reach for complete genome sequencing, due in part to their enormous genome size (20-30 Gb) and the highly repetitive nature of their genomes. Technological advances in genome sequencing and assembly enabled the recent publication of three conifer genomes: white spruce (Picea glauca), Norway spruce (Picea abies), and loblolly pine (Pinus taeda). These genome sequences revealed distinctive features compared with other plant genomes and may represent a window into the past of seed plant genomes. This Update highlights recent advances, remaining challenges, and opportunities in light of the publication of the first conifer and gymnosperm genomes.
Collapse
Affiliation(s)
- Amanda R De La Torre
- Inanc Birol
- Jean Bousquet
- Pär K Ingvarsson
- Stefan Jansson
- Steven J M Jones
- Christopher I Keeling
- John MacKay
- Ove Nilsson
- Kermit Ritland
- Nathaniel Street
- Alvin Yanchuk
- Philipp Zerbe
- Jörg Bohlmann
- Department of Ecology and Environmental Sciences (A.R.D.L.T., P.K.I.) and Umeå Plant Science Center, Department of Plant Physiology (P.K.I., S.J., O.N., N.S.), Umeå University, SE-901 87 Umeå, Sweden; Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada V5Z 4S6 (I.B., S.J.M.J.); Canada Research Chair in Forest and Environmental Genomics (J.Bou.) and Center for Forest Research and Institute for Systems and Integrative Biology (J.Bou., J.M.), Université Laval, Quebec, Quebec, Canada G1V 0A6; Michael Smith Laboratories (C.I.K., P.Z., J.Boh.) and Department of Forest and Conservation Sciences (K.R., J.Boh.), University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4; and British Columbia Ministry of Forests, Lands, and Natural Resource Operations, Victoria, British Columbia, Canada V8W 9C2 (A.Y.)
|
62
|
Abstract
Conifers are the predominant gymnosperms. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
|
63
|
Metcalfe CJ, Casane D. Accommodating the load: The transposable element content of very large genomes. Mob Genet Elements 2014; 3:e24775. [PMID: 24616835 PMCID: PMC3943481 DOI: 10.4161/mge.24775] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 02/22/2013] [Revised: 04/20/2013] [Accepted: 04/22/2013] [Indexed: 01/31/2023] Open
Abstract
Very large genomes, that is, those above 20 Gb, are rare but widely distributed throughout the eukaryotes. They are found within the diatoms, dinoflagellates, metazoans and green plants, but so far have not been found in the excavates. There is a known positive correlation between genome size and the proportion of the genome composed of transposable elements (TEs). Very large genomes may therefore be expected to be almost entirely composed of TEs. Of the large genomes examined, in the angiosperms, gymnosperms and the dinoflagellates only a small portion of the genome was identified as TEs, most of these genomes were unidentified and may be novel or diverse TEs. In the salamanders and lungfish, 25 to 47% of the genome were identifiable retrotransposons, that is, TEs that copy themselves before insertion. However, the predominant class of TEs found in the lungfish was not the same as that found in the salamanders. The little data we have at the moment suggests therefore that the diversity and abundance of TEs is variable between taxa with large genomes, similar to patterns found in taxa with smaller genomes. Based on results from the human genome, we suggest that the ‘missing’ portion of the lungfish and salamander genomes are old, highly divergent, and therefore inactive copies of TEs. The data available indicate that, unlike plants with large genomes, neither the lungfish nor the salamanders show an increased risk of extinction. Based on a slow rate of DNA loss in salamanders it has been suggested that the large salamander genome is the result of run-away genome expansion involving genome size increases via TE proliferation associated with reduced recombination rate. We know of no studies on DNA loss or recombination rates in lungfish genomes, however a similar scenario could describe the process of genome expansion in the lungfish. 
A series of waves of TE transposition and sequence decay would describe the pattern of TE content seen in both the lungfish and the salamanders. The lungfish and salamanders, therefore, may accommodate their large load of TEs because these TEs have accumulated gradually over a long period of time and have been subject to inactivation and decay.
Affiliation(s)
- Cushla J Metcalfe
- Instituto de Biociências; Universidade de São Paulo; Cidade Universitária; São Paulo, Brazil
- Didier Casane
- Laboratoire Evolution Génomes et Spéciation; UPR9034 CNRS; Gif-sur-Yvette, France; Université Paris Diderot; Sorbonne Paris Cité, France
|
64
|
Karam MJ, Lefèvre F, Dagher-Kharrat MB, Pinosio S, Vendramin G. Genomic exploration and molecular marker development in a large and complex conifer genome using RADseq and mRNAseq. Mol Ecol Resour 2014; 15:601-12. [DOI: 10.1111/1755-0998.12329] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 04/16/2014] [Revised: 08/30/2014] [Accepted: 09/05/2014] [Indexed: 01/05/2023]
Affiliation(s)
- M.-J. Karam
- INRA; UR 629 Ecologie des Forêts Méditerranéennes; URFM; Avignon France
- F. Lefèvre
- INRA; UR 629 Ecologie des Forêts Méditerranéennes; URFM; Avignon France
- M. Bou Dagher-Kharrat
- Laboratoire Caractérisation Génomique des Plantes; Département Sciences de la Vie et de la Terre; Faculté des Sciences; Campus Sciences et Technologies; Université Saint-Joseph; Mar Roukos Mkalles Lebanon
- S. Pinosio
- Istituto di Genomica Applicata (IGA); Udine Italy
- Institute of Biosciences and Bioresources; National Research Council; Florence Italy
- G.G. Vendramin
- Institute of Biosciences and Bioresources; National Research Council; Florence Italy
|
65
|
Lee SI, Kim NS. Transposable elements and genome size variations in plants. Genomics Inform 2014; 12:87-97. [PMID: 25317107 PMCID: PMC4196380 DOI: 10.5808/gi.2014.12.3.87] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Received: 07/18/2014] [Revised: 08/18/2014] [Accepted: 08/22/2014] [Indexed: 02/01/2023] Open
Abstract
Although the number of protein-coding genes is not highly variable between plant taxa, the DNA content in their genomes is highly variable, by as much as 2,056-fold from a 1C amount of 0.0648 pg to 132.5 pg. The mean 1C-value in plants is 2.4 pg, and genome size expansion/contraction is lineage-specific in plant taxonomy. Transposable element fractions in plant genomes are also variable, as low as ~3% in small genomes and as high as ~85% in large genomes, indicating that genome size is a linear function of transposable element content. Of the 2 classes of transposable elements, the dynamics of class 1 long terminal repeat (LTR) retrotransposons is a major contributor to the 1C value differences among plants. The activity of LTR retrotransposons is under the control of epigenetic suppressing mechanisms. Also, genome-purging mechanisms have been adopted to counter-balance the genome size amplification. With a wealth of information on whole-genome sequences in plant genomes, it was revealed that several genome-purging mechanisms have been employed, depending on plant taxa. Among angiosperms, two genera, Lilium and Fritillaria, are known to have large genomes. Concerted genome size evolution occurred twice in the family Liliaceae during the divergence of its current genera. In addition to the LTR retrotransposons, non-LTR retrotransposons and satellite DNAs contributed to the huge genomes in the two genera by possible failure of genome counter-balancing mechanisms.
Affiliation(s)
- Sung-Il Lee
- Department of Molecular Bioscience, Kangwon National University, Chuncheon 200-701, Korea
- Nam-Soo Kim
- Department of Molecular Bioscience, Kangwon National University, Chuncheon 200-701, Korea
|
66
|
Evolution and biogeography of gymnosperms. Mol Phylogenet Evol 2014; 75:24-40. [DOI: 10.1016/j.ympev.2014.02.005] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/30/2013] [Revised: 02/06/2014] [Accepted: 02/10/2014] [Indexed: 11/20/2022]
|
67
|
Oliver KR, McComb JA, Greene WK. Transposable elements: powerful contributors to angiosperm evolution and diversity. Genome Biol Evol 2014; 5:1886-901. [PMID: 24065734 PMCID: PMC3814199 DOI: 10.1093/gbe/evt141] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Indexed: 12/12/2022] Open
Abstract
Transposable elements (TEs) are a dominant feature of most flowering plant genomes. Together with other accepted facilitators of evolution, accumulating data indicate that TEs can explain much about their rapid evolution and diversification. Genome size in angiosperms is highly correlated with TE content and the overwhelming bulk (>80%) of large genomes can be composed of TEs. Among retro-TEs, long terminal repeats (LTRs) are abundant, whereas DNA-TEs, which are often less abundant than retro-TEs, are more active. Much adaptive or evolutionary potential in angiosperms is due to the activity of TEs (active TE-Thrust), resulting in an extraordinary array of genetic changes, including gene modifications, duplications, altered expression patterns, and exaptation to create novel genes, with occasional gene disruption. TEs implicated in the earliest origins of the angiosperms include the exapted Mustang, Sleeper, and Fhy3/Far1 gene families. Passive TE-Thrust can create a high degree of adaptive or evolutionary potential by engendering ectopic recombination events resulting in deletions, duplications, and karyotypic changes. TE activity can also alter epigenetic patterning, including that governing endosperm development, thus promoting reproductive isolation. Continuing evolution of long-lived resprouter angiosperms, together with genetic variation in their multiple meristems, indicates that TEs can facilitate somatic evolution in addition to germ line evolution. Critical to their success, angiosperms have a high frequency of polyploidy and hybridization, with resultant increased TE activity and introgression, and beneficial gene duplication. Together with traditional explanations, the enhanced genomic plasticity facilitated by TE-Thrust, suggests a more complete and satisfactory explanation for Darwin's "abominable mystery": the spectacular success of the angiosperms.
Affiliation(s)
- Keith R Oliver
- School of Veterinary and Life Sciences, Murdoch University, Perth, Western Australia, Australia
|
68
|
Stival Sena J, Giguère I, Boyle B, Rigault P, Birol I, Zuccolo A, Ritland K, Ritland C, Bohlmann J, Jones S, Bousquet J, Mackay J. Evolution of gene structure in the conifer Picea glauca: a comparative analysis of the impact of intron size. BMC Plant Biol 2014; 14:95. [PMID: 24734980 PMCID: PMC4108047 DOI: 10.1186/1471-2229-14-95] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 09/11/2013] [Accepted: 04/09/2014] [Indexed: 05/04/2023]
Abstract
BACKGROUND: A positive relationship between genome size and intron length is observed across eukaryotes, including angiosperms, indicating a co-evolution of genome size and gene structure. Conifers have very large genomes and longer introns on average than most plants, but the impacts of their large genomes and longer introns on gene structure have not been described. RESULTS: Gene structure was analyzed for 35 genes of Picea glauca obtained from BAC sequencing and genome assembly, including comparisons with A. thaliana, P. trichocarpa and Z. mays. We aimed to develop an understanding of the impact of long introns on the structure of individual genes. The number and length of exons were well conserved among the species compared, but on average, P. glauca introns were longer and genes had four times more intronic sequence than Arabidopsis, and 2 times more than poplar and maize. However, pairwise comparisons of individual genes gave variable results and not all contrasts were statistically significant. Genes generally accumulated one or a few longer introns in species with larger genomes, but the position of long introns was variable between plant lineages. In P. glauca, highly expressed genes generally had more intronic sequence than tissue-preferential genes. Comparisons with the Pinus taeda BACs and genome scaffolds showed a high conservation for position of long introns and for sequence of short introns. A survey of 1836 P. glauca genes obtained by sequence capture, mostly containing introns <1 Kbp, showed that repeated sequences were 10× more abundant in introns than in exons. CONCLUSION: Conifers have large amounts of intronic sequence per gene for seed plants due to the presence of a few long introns, and repetitive element sequences are ubiquitous in their introns. Results indicate a complex landscape of intron sizes and distribution across taxa and between genes with different expression profiles.
Affiliation(s)
- Juliana Stival Sena
- Center for Forest Research and Institute for Systems and Integrative Biology, 1030 rue de la Médecine, Université Laval, Québec, QC G1V 0A6, Canada
- Isabelle Giguère
- Center for Forest Research and Institute for Systems and Integrative Biology, 1030 rue de la Médecine, Université Laval, Québec, QC G1V 0A6, Canada
- Brian Boyle
- Center for Forest Research and Institute for Systems and Integrative Biology, 1030 rue de la Médecine, Université Laval, Québec, QC G1V 0A6, Canada
- Inanc Birol
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Andrea Zuccolo
- Applied Genomics Institute, Udine 33100, Italy
- Institute of Life Sciences, Scuola Superiore Sant’Anna, Pisa 56127, Italy
- Kermit Ritland
- Department of Forest Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Carol Ritland
- Department of Forest Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Joerg Bohlmann
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Steven Jones
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Jean Bousquet
- Center for Forest Research and Institute for Systems and Integrative Biology, 1030 rue de la Médecine, Université Laval, Québec, QC G1V 0A6, Canada
- Canada Research Chair in Forest Genomics, Université Laval, Québec, QC G1V 0A6, Canada
- John Mackay
- Center for Forest Research and Institute for Systems and Integrative Biology, 1030 rue de la Médecine, Université Laval, Québec, QC G1V 0A6, Canada
|
69
|
Michael TP. Plant genome size variation: bloating and purging DNA. Brief Funct Genomics 2014; 13:308-17. [DOI: 10.1093/bfgp/elu005] [Citation(s) in RCA: 102] [Impact Index Per Article: 10.2] [Indexed: 11/14/2022] Open
|
70
|
Wegrzyn JL, Liechty JD, Stevens KA, Wu LS, Loopstra CA, Vasquez-Gross HA, Dougherty WM, Lin BY, Zieve JJ, Martínez-García PJ, Holt C, Yandell M, Zimin AV, Yorke JA, Crepeau MW, Puiu D, Salzberg SL, de Jong PJ, Mockaitis K, Main D, Langley CH, Neale DB. Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation. Genetics 2014; 196:891-909. [PMID: 24653211 PMCID: PMC3948814 DOI: 10.1534/genetics.113.159996] [Citation(s) in RCA: 129] [Impact Index Per Article: 12.9] [Received: 11/22/2013] [Accepted: 12/13/2013] [Indexed: 01/08/2023] Open
Abstract
The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20-40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%.
Affiliation(s)
- Jill L. Wegrzyn
- Department of Plant Sciences, University of California, Davis, California 95616
- John D. Liechty
- Department of Plant Sciences, University of California, Davis, California 95616
- Kristian A. Stevens
- Department of Evolution and Ecology, University of California, Davis, California 95616
- Le-Shin Wu
- National Center for Genome Analysis Support, Indiana University, Bloomington, Indiana 47405
- Carol A. Loopstra
- Department of Ecosystem Science and Management, Texas A&M University, College Station, Texas 77843
- William M. Dougherty
- Department of Evolution and Ecology, University of California, Davis, California 95616
- Brian Y. Lin
- Department of Plant Sciences, University of California, Davis, California 95616
- Jacob J. Zieve
- Department of Plant Sciences, University of California, Davis, California 95616
- Carson Holt
- Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112
- Mark Yandell
- Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112
- Aleksey V. Zimin
- Institute for Physical Sciences and Technology, University of Maryland, College Park, Maryland 20742
- James A. Yorke
- Institute for Physical Sciences and Technology, University of Maryland, College Park, Maryland 20742
- Departments of Mathematics and Physics, University of Maryland, College Park, Maryland 20742
- Marc W. Crepeau
- Department of Evolution and Ecology, University of California, Davis, California 95616
- Daniela Puiu
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, The Johns Hopkins University, Baltimore, Maryland 21205
- Steven L. Salzberg
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, The Johns Hopkins University, Baltimore, Maryland 21205
- Pieter J. de Jong
- Children’s Hospital Oakland Research Institute, Oakland, California 94609
- Doreen Main
- Department of Horticulture, Washington State University, Pullman, Washington 99163
- Charles H. Langley
- Department of Evolution and Ecology, University of California, Davis, California 95616
- David B. Neale
- Department of Plant Sciences, University of California, Davis, California 95616
|
71
|
Goyal N, Ginwal HS. WGDB: Wood Gene Database with search interface. Bioinformation 2014; 10:39-42. [PMID: 24516325 PMCID: PMC3916818 DOI: 10.6026/97320630010039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2014] [Accepted: 01/17/2014] [Indexed: 11/23/2022] Open
Abstract
Wood quality can be defined in terms of a particular end use and involves several traits. Over the last fifteen years, researchers have assessed wood quality traits in forest trees. In loblolly pine, wood quality has been categorized into cell wall biochemical traits and fibre properties, including microfibril angle, density and stiffness [1]. A user-friendly, open-access database named the Wood Gene Database (WGDB) has been developed to describe wood genes along with protein information and published research articles. It contains 720 wood genes from Pinus and Deodar and from the fast-growing trees Poplar and Eucalyptus. WGDB is designed to encompass the majority of publicly accessible genes coding for cellulose, hemicellulose and lignin in tree species, which are responsive to wood formation and quality. It is an interactive platform for collecting, managing and searching specific wood genes; it also enables data mining of genomic information, specifically in Arabidopsis thaliana, Populus trichocarpa, Eucalyptus grandis, Pinus taeda, Pinus radiata, Cedrus deodara and Cedrus atlantica. For user convenience, the database is cross-linked with the public databases NCBI, EMBL and Dendrome and with the Google search engine, and it provides the bioinformatics tools BLAST and COBALT. AVAILABILITY: The database is freely available at www.wgdb.in.
Affiliation(s)
- Neha Goyal
- Division of Genetics and Tree Propagation, Forest Research Institute, Dehradun, U.K, INDIA
- H S Ginwal
- Division of Genetics and Tree Propagation, Forest Research Institute, Dehradun, U.K, INDIA
72
Cao H, Zhang L, Tan X, Long H, Shockey JM. Identification, classification and differential expression of oleosin genes in tung tree (Vernicia fordii). PLoS One 2014; 9:e88409. [PMID: 24516650 PMCID: PMC3916434 DOI: 10.1371/journal.pone.0088409] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/13/2013] [Accepted: 01/06/2014] [Indexed: 11/19/2022]
Abstract
Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the "proline knot" motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that 1) All five OLE genes were expressed in developing tung seeds, leaves and flowers; 2) OLE mRNA levels were much higher in seeds than leaves or flowers; 3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; 4) OLE mRNA levels rapidly increased during seed development; and 5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1-3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants.
Affiliation(s)
- Heping Cao
- U.S. Department of Agriculture, Agricultural Research Service, Southern Regional Research Center, Commodity Utilization Research Unit, New Orleans, Louisiana, United States of America
- * E-mail:
- Lin Zhang
- Key Laboratory of Cultivation and Protection for Non-Wood Forest Trees, Ministry of Education, Central South University of Forestry and Technology, Changsha, Hunan Province, People's Republic of China
- Xiaofeng Tan
- Key Laboratory of Cultivation and Protection for Non-Wood Forest Trees, Ministry of Education, Central South University of Forestry and Technology, Changsha, Hunan Province, People's Republic of China
- Hongxu Long
- Key Laboratory of Cultivation and Protection for Non-Wood Forest Trees, Ministry of Education, Central South University of Forestry and Technology, Changsha, Hunan Province, People's Republic of China
- Jay M. Shockey
- U.S. Department of Agriculture, Agricultural Research Service, Southern Regional Research Center, Commodity Utilization Research Unit, New Orleans, Louisiana, United States of America
73
A high-density gene map of loblolly pine (Pinus taeda L.) based on exome sequence capture genotyping. G3-GENES GENOMES GENETICS 2014; 4:29-37. [PMID: 24192835 PMCID: PMC3887537 DOI: 10.1534/g3.113.008714] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Indexed: 01/19/2023]
Abstract
Loblolly pine (Pinus taeda L.) is an economically and ecologically important conifer for which a suite of genomic resources is being generated. Despite recent attempts to sequence the large genome of conifers, their assembly and the positioning of genes remain largely incomplete. The interspecific synteny in pines suggests that a gene-based map would be useful to support genome assemblies and analysis of conifers. To establish a reference gene-based genetic map, we performed exome sequencing of 14729 genes on a mapping population of 72 haploid samples, generating a resource of 7434 sequence variants segregating for 3787 genes. Most markers are single-nucleotide polymorphisms, although short insertions/deletions and multiple nucleotide polymorphisms also were used. Marker segregation in the population was used to generate a high-density, gene-based genetic map. A total of 2841 genes were mapped to pine’s 12 linkage groups with an average of one marker every 0.58 cM. Capture data were used to detect gene presence/absence variations and position 65 genes on the map. We compared the marker order of genes previously mapped in loblolly pine and found high agreement. We estimated that 4123 genes had enough sequencing depth for reliable detection of markers, suggesting a high marker conversion rate of 92% (3787/4123). This is possible because a significant portion of the gene is captured and sequenced, increasing the chances of identifying a polymorphic site for characterization and mapping. This sub-centiMorgan genetic map provides a valuable resource for gene positioning on chromosomes and a guide for the assembly of a reference pine genome.
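The 92% marker conversion rate quoted in this abstract can be checked directly from the counts it reports. The short sketch below redoes that arithmetic; the variable names are illustrative only, and the variants-per-gene figure is a derived value that the abstract itself does not state.

```python
# Counts quoted in the abstract above.
genes_with_variants = 3787   # genes with at least one segregating variant
genes_with_depth = 4123      # genes with enough depth for reliable marker detection
sequence_variants = 7434     # segregating sequence variants in total

# Marker conversion rate as defined in the abstract: 3787 / 4123.
conversion_rate = genes_with_variants / genes_with_depth

# Derived here for illustration (not reported in the abstract).
variants_per_gene = sequence_variants / genes_with_variants

print(f"marker conversion rate: {conversion_rate:.1%}")
print(f"variants per polymorphic gene: {variants_per_gene:.2f}")
```

The ratio rounds to the 92% given in the abstract.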
74
Wegrzyn JL, Lin BY, Zieve JJ, Dougherty WM, Martínez-García PJ, Koriabine M, Holtz-Morris A, deJong P, Crepeau M, Langley CH, Puiu D, Salzberg SL, Neale DB, Stevens KA. Insights into the loblolly pine genome: characterization of BAC and fosmid sequences. PLoS One 2013; 8:e72439. [PMID: 24023741 PMCID: PMC3762812 DOI: 10.1371/journal.pone.0072439] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 05/13/2013] [Accepted: 07/10/2013] [Indexed: 12/22/2022]
Abstract
Despite their prevalence and importance, the genome sequences of loblolly pine, Norway spruce, and white spruce, three ecologically and economically important conifer species, are just becoming available to the research community. Following the completion of these large assemblies, annotation efforts will be undertaken to characterize the reference sequences. Accurate annotation of these ancient genomes would be aided by a comprehensive repeat library; however, few studies have generated enough sequence to fully evaluate and catalog their non-genic content. In this paper, two sets of loblolly pine genomic sequence, 103 previously assembled BACs and 90,954 newly sequenced and assembled fosmid scaffolds, were analyzed. Together, this sequence represents 280 Mbp (roughly 1% of the loblolly pine genome) and one of the most comprehensive studies of repetitive elements and genes in a gymnosperm species. A combination of homology and de novo methodologies were applied to identify both conserved and novel repeats. Similarity analysis estimated a repetitive content of 27% that included both full and partial elements. When combined with the de novo investigation, the estimate increased to almost 86%. Over 60% of the repetitive sequence consists of full or partial LTR (long terminal repeat) retrotransposons. Through de novo approaches, 6,270 novel, full-length transposable element families and 9,415 sub-families were identified. Among those 6,270 families, 82% were annotated as single-copy. Several of the novel, high-copy families are described here, with the largest, PtPiedmont, comprising 133 full-length copies. In addition to repeats, analysis of the coding region reported 23 full-length eukaryotic orthologous proteins (KOGS) and another 29 novel or orthologous genes. These discoveries, along with other genomic resources, will be used to annotate conifer genomes and address long-standing questions about gymnosperm evolution.
Affiliation(s)
- Jill L. Wegrzyn
- Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- * E-mail: (JLW); (KAS)
- Brian Y. Lin
- Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- Jacob J. Zieve
- Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- William M. Dougherty
- Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- Pedro J. Martínez-García
- Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- Maxim Koriabine
- Children's Hospital Oakland Research Institute, Oakland, California, United States of America
- Ann Holtz-Morris
- Children's Hospital Oakland Research Institute, Oakland, California, United States of America
- Pieter deJong
- Children's Hospital Oakland Research Institute, Oakland, California, United States of America
- Marc Crepeau
- Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- Charles H. Langley
- Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- Daniela Puiu
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- Steven L. Salzberg
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- David B. Neale
- Department of Plant Sciences, University of California Davis, Davis, California, United States of America
- Kristian A. Stevens
- Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- * E-mail: (JLW); (KAS)
75
Leushkin EV, Sutormin RA, Nabieva ER, Penin AA, Kondrashov AS, Logacheva MD. The miniature genome of a carnivorous plant Genlisea aurea contains a low number of genes and short non-coding sequences. BMC Genomics 2013; 14:476. [PMID: 23855885 PMCID: PMC3728226 DOI: 10.1186/1471-2164-14-476] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 01/26/2013] [Accepted: 07/09/2013] [Indexed: 01/17/2023]
Abstract
BACKGROUND Genlisea aurea (Lentibulariaceae) is a carnivorous plant with unusually small genome size - 63.6 Mb - one of the smallest known among higher plants. Data on the genome sizes and the phylogeny of Genlisea suggest that this is a derived state within the genus. Thus, G. aurea is an excellent model organism for studying evolutionary mechanisms of genome contraction. RESULTS Here we report sequencing and de novo draft assembly of G. aurea genome. The assembly consists of 10,687 contigs of the total length of 43.4 Mb and includes 17,755 complete and partial protein-coding genes. Its comparison with the genome of Mimulus guttatus, another representative of higher core Lamiales clade, reveals striking differences in gene content and length of non-coding regions. CONCLUSIONS Genome contraction was a complex process, which involved gene loss and reduction of lengths of introns and intergenic regions, but not intron loss. The gene loss is more frequent for the genes that belong to multigenic families indicating that genetic redundancy is an important prerequisite for genome size reduction.
Affiliation(s)
- Evgeny V Leushkin
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskye Gory 1-73, Moscow 119992, Russia
- Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow 127994, Russia
- Roman A Sutormin
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskye Gory 1-73, Moscow 119992, Russia
- Elena R Nabieva
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskye Gory 1-73, Moscow 119992, Russia
- Aleksey A Penin
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskye Gory 1-73, Moscow 119992, Russia
- Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow 127994, Russia
- Department of Genetics, Lomonosov Moscow State University, Moscow 119992, Russia
- Alexey S Kondrashov
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskye Gory 1-73, Moscow 119992, Russia
- Department of Ecology and Evolutionary Biology and Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109, USA
- Maria D Logacheva
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskye Gory 1-73, Moscow 119992, Russia
- A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
76
Neale DB, Langley CH, Salzberg SL, Wegrzyn JL. Open access to tree genomes: the path to a better forest. Genome Biol 2013; 14:120. [PMID: 23796049 PMCID: PMC3706761 DOI: 10.1186/gb-2013-14-6-120] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 12/01/2022]
Abstract
An open-access culture and a well-developed comparative-genomics infrastructure must be developed in forest trees to derive the full potential of genome sequencing in this diverse group of plants that are the dominant species in much of the earth's terrestrial ecosystems.
77
Källman T, Chen J, Gyllenstrand N, Lagercrantz U. A significant fraction of 21-nucleotide small RNA originates from phased degradation of resistance genes in several perennial species. PLANT PHYSIOLOGY 2013; 162:741-54. [PMID: 23580593 PMCID: PMC3668067 DOI: 10.1104/pp.113.214643] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 01/18/2013] [Accepted: 03/29/2013] [Indexed: 05/18/2023]
Abstract
Small RNAs (sRNAs), including microRNA (miRNA) and short-interfering RNA (siRNA), are important in the regulation of diverse biological processes. Comparative studies of sRNAs from plants have mainly focused on miRNA, even though they constitute a mere fraction of the total sRNA diversity. In this study, we report results from an in-depth analysis of the sRNA population from the conifer spruce (Picea abies) and compared the results with those of a range of plant species. The vast majority of sRNA sequences in spruce can be assigned to 21-nucleotide-long siRNA sequences, of which a large fraction originate from the degradation of transcribed sequences related to nucleotide-binding site-leucine-rich repeat-type resistance genes. Over 90% of all genes predicted to contain either a Toll/interleukin-1 receptor or nucleotide-binding site domain showed evidence of siRNA degradation. The data further suggest that this phased degradation of resistance-related genes is initiated from miRNA-guided cleavage, often by an abundant 22-nucleotide miRNA. Comparative analysis over a range of plant species revealed a huge variation in the abundance of this phenomenon. The process seemed to be virtually absent in several species, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and nonvascular plants, while particularly high frequencies were observed in spruce, grape (Vitis vinifera), and poplar (Populus trichocarpa). This divergent pattern might reflect a mechanism to limit runaway transcription of these genes in species with rapidly expanding nucleotide-binding site-leucine-rich repeat gene families. Alternatively, it might reflect variation in a counter-counter defense mechanism between plant species.
78
The Norway spruce genome sequence and conifer genome evolution. Nature 2013; 497:579-84. [DOI: 10.1038/nature12211] [Citation(s) in RCA: 1065] [Impact Index Per Article: 96.8] [Received: 02/10/2013] [Accepted: 04/22/2013] [Indexed: 12/18/2022]
79
Matvienko M, Kozik A, Froenicke L, Lavelle D, Martineau B, Perroud B, Michelmore R. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride. PLoS One 2013; 8:e55913. [PMID: 23409088 PMCID: PMC3568094 DOI: 10.1371/journal.pone.0055913] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 08/28/2012] [Accepted: 01/04/2013] [Indexed: 12/22/2022]
Abstract
Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.
Affiliation(s)
- Marta Matvienko
- Genome Center, University of California Davis, Davis, California, United States of America
- Alexander Kozik
- Genome Center, University of California Davis, Davis, California, United States of America
- Lutz Froenicke
- Genome Center, University of California Davis, Davis, California, United States of America
- Dean Lavelle
- Genome Center, University of California Davis, Davis, California, United States of America
- Belinda Martineau
- Genome Center, University of California Davis, Davis, California, United States of America
- Bertrand Perroud
- Genome Center, University of California Davis, Davis, California, United States of America
- Richard Michelmore
- Genome Center, University of California Davis, Davis, California, United States of America
- Departments of Plant Sciences, Molecular and Cellular Biology, and Medical Microbiology and Immunology, University of California Davis, Davis, California, United States of America
80
Mackay J, Dean JFD, Plomion C, Peterson DG, Cánovas FM, Pavy N, Ingvarsson PK, Savolainen O, Guevara MÁ, Fluch S, Vinceti B, Abarca D, Díaz-Sala C, Cervera MT. Towards decoding the conifer giga-genome. PLANT MOLECULAR BIOLOGY 2012; 80:555-69. [PMID: 22960864 DOI: 10.1007/s11103-012-9961-7] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Received: 04/13/2012] [Accepted: 08/24/2012] [Indexed: 05/21/2023]
Abstract
Several new initiatives have been launched recently to sequence conifer genomes including pines, spruces and Douglas-fir. Owing to the very large genome sizes ranging from 18 to 35 gigabases, sequencing even a single conifer genome had been considered unattainable until the recent throughput increases and cost reductions afforded by next generation sequencers. The purpose of this review is to describe the context for these new initiatives. A knowledge foundation has been acquired in several conifers of commercial and ecological interest through large-scale cDNA analyses, construction of genetic maps and gene mapping studies aiming to link phenotype and genotype. Exploratory sequencing in pines and spruces has pointed out some of the unique properties of these giga-genomes and suggested strategies that may be needed to extract value from their sequencing. The hope is that recent and pending developments in sequencing technology will contribute to rapidly filling the knowledge vacuum surrounding their structure, contents and evolution. Researchers are also making plans to use comparative analyses that will help to turn the data into a valuable resource for enhancing and protecting the world's conifer forests.
Affiliation(s)
- John Mackay
- Center for Forest Research, Institute for Integrative and Systems Biology, Université Laval, Québec, Québec G1V 0A6, Canada
81
Pavy N, Pelgas B, Laroche J, Rigault P, Isabel N, Bousquet J. A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers. BMC Biol 2012; 10:84. [PMID: 23102090 PMCID: PMC3519789 DOI: 10.1186/1741-7007-10-84] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Accepted: 10/26/2012] [Indexed: 01/15/2023]
Abstract
BACKGROUND Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembling. RESULTS To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochors. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago. CONCLUSIONS Taken together, these results indicate that much genomic evolution has occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than that seen for the same period of time in flowering plants. This trend is largely congruent with the contrasted rates of diversification and morphological evolution observed between these two groups of seed plants.
Affiliation(s)
- Nathalie Pavy
- Canada Research Chair in Forest and Environmental Genomics, Centre for Forest Research and Institute for Systems and Integrative Biology, Université Laval, Québec, Québec G1V 0A6, Canada.
82
Raherison E, Rigault P, Caron S, Poulin PL, Boyle B, Verta JP, Giguère I, Bomal C, Bohlmann J, MacKay J. Transcriptome profiling in conifers and the PiceaGenExpress database show patterns of diversification within gene families and interspecific conservation in vascular gene expression. BMC Genomics 2012; 13:434. [PMID: 22931377 PMCID: PMC3534630 DOI: 10.1186/1471-2164-13-434] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Received: 02/03/2012] [Accepted: 07/11/2012] [Indexed: 12/22/2022]
Abstract
Background Conifers have very large genomes (13 to 30 Gigabases) that are mostly uncharacterized although extensive cDNA resources have recently become available. This report presents a global overview of transcriptome variation in a conifer tree and documents conservation and diversity of gene expression patterns among major vegetative tissues. Results An oligonucleotide microarray was developed from Picea glauca and P. sitchensis cDNA datasets. It represents 23,853 unique genes and was shown to be suitable for transcriptome profiling in several species. A comparison of secondary xylem and phelloderm tissues showed that preferential expression in these vascular tissues was highly conserved among Picea spp. RNA-Sequencing strongly confirmed tissue preferential expression and provided a robust validation of the microarray design. A small database of transcription profiles called PiceaGenExpress was developed from over 150 hybridizations spanning eight major tissue types. In total, transcripts were detected for 92% of the genes on the microarray, in at least one tissue. Non-annotated genes were predominantly expressed at low levels in fewer tissues than genes of known or predicted function. Diversity of expression within gene families may be rapidly assessed from PiceaGenExpress. In conifer trees, dehydrins and late embryogenesis abundant (LEA) osmotic regulation proteins occur in large gene families compared to angiosperms. Strong contrasts and low diversity were observed in the dehydrin family, while diverse patterns suggested a greater degree of diversification among LEAs. Conclusion Together, the oligonucleotide microarray and the PiceaGenExpress database represent the first resource of this kind for gymnosperm plants. The spruce transcriptome analysis reported here is expected to accelerate genetic studies in the large and important group comprised of conifer trees.
Affiliation(s)
- Elie Raherison
- Center for Forest Research and Institute for Integrative and Systems Biology, Université Laval, Québec, QC, Canada, G1V 0A6
83
Rocheta M, Carvalho L, Viegas W, Morais-Cecílio L. Corky, a gypsy-like retrotransposon is differentially transcribed in Quercus suber tissues. BMC Res Notes 2012; 5:432. [PMID: 22888907 PMCID: PMC3465219 DOI: 10.1186/1756-0500-5-432] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/18/2012] [Accepted: 08/02/2012] [Indexed: 12/01/2022]
Abstract
Background Transposable elements (TEs) make up a large part of eukaryotic genomes. Due to their repetitive nature and to the fact that they harbour regulatory signals, TEs can be responsible for chromosomal rearrangements, movement of gene sequences and evolution of gene regulation and function. Retrotransposon ubiquity raises the question of their function in genomes, and most are transcriptionally inactive due to rearrangements that compromise their activity. However, the activity of TEs is currently considered to have been one of the major processes in genome evolution. Findings We report on the characterization of a transcriptionally active gypsy-like retrotransposon (named Corky) from Quercus suber, in a comparative and quantitative study of expression levels in different tissues and distinct developmental stages through RT-qPCR. We observed Corky’s differential transcription levels in all the tissues analysed. Conclusions These results document that Corky’s transcription levels are not constant. Instead, they depend upon the developmental stage, the tissue analysed and events that may occur during an individual’s life span. This modulation, brought about by different developmental and environmental influences, suggests an involvement of Corky in stress response and during development.
Affiliation(s)
- Margarida Rocheta
- Centro de Botânica Aplicada à Agricultura, Departamento de Recursos Naturais, Ambiente e Território, Instituto Superior de Agronomia, Universidade Técnica de Lisboa, Portugal.
84
Leitch AR, Leitch IJ. Ecological and genetic factors linked to contrasting genome dynamics in seed plants. THE NEW PHYTOLOGIST 2012; 194:629-646. [PMID: 22432525 DOI: 10.1111/j.1469-8137.2012.04105.x] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.7] [Indexed: 05/23/2023]
Abstract
The large-scale replacement of gymnosperms by angiosperms in many ecological niches over time and the huge disparity in species numbers have led scientists to explore factors (e.g. polyploidy, developmental systems, floral evolution) that may have contributed to the astonishing rise of angiosperm diversity. Here, we explore genomic and ecological factors influencing seed plant genomes. This is timely given the recent surge in genomic data. We compare and contrast the genomic structure and evolution of angiosperms and gymnosperms and find that angiosperm genomes are more dynamic and diverse, particularly amongst the herbaceous species. Gymnosperms typically have reduced frequencies of a number of processes (e.g. polyploidy) that have shaped the genomes of other vascular plants and have alternative mechanisms to suppress genome dynamism (e.g. epigenetics and activity of transposable elements). Furthermore, the presence of several characters in angiosperms (e.g. herbaceous habit, short minimum generation time) has enabled them to exploit new niches and to be viable with small population sizes, where the power of genetic drift can outweigh that of selection. Together these processes have led to increased rates of genetic divergence and faster fixation times of variation in many angiosperms compared with gymnosperms.
Affiliation(s)
- A R Leitch
- School of Biological and Chemical Sciences, Queen Mary University of London, E1 4NS, UK
- I J Leitch
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, UK
85
Kelly LJ, Leitch IJ. Exploring giant plant genomes with next-generation sequencing technology. Chromosome Res 2012; 19:939-53. [PMID: 21987187 DOI: 10.1007/s10577-011-9246-z] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 12/29/2022]
Abstract
Genome size in plants is characterised by its extraordinary range. Although it appears that the majority of plants have small genomes, in several lineages genome size has reached giant proportions. The recent advent of next-generation sequencing (NGS) methods has for the first time made detailed analysis of even the largest of plant genomes a possibility. In this review, we highlight investigations that have utilised NGS for the study of plants with large genomes, as well as describing ongoing work that aims to harness the power of these technologies to gain insights into their evolution. In addition, we emphasise some areas of research where the use of NGS has the potential to generate significant advances in our current understanding of how plant genomes evolve. Finally, we discuss some of the future developments in sequencing technology that may further improve our ability to explore the content and evolutionary dynamics of the very largest genomes.
Affiliation(s)
- Laura J Kelly
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, UK.
|
86
|
Buschiazzo E, Ritland C, Bohlmann J, Ritland K. Slow but not low: genomic comparisons reveal slower evolutionary rate and higher dN/dS in conifers compared to angiosperms. BMC Evol Biol 2012; 12:8. [PMID: 22264329 PMCID: PMC3328258 DOI: 10.1186/1471-2148-12-8] [Citation(s) in RCA: 101] [Impact Index Per Article: 8.4] [Received: 07/28/2011] [Accepted: 01/20/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Comparative genomics can inform us about the processes of mutation and selection across diverse taxa. Among seed plants, gymnosperms have been lacking in genomic comparisons. Recent EST and full-length cDNA collections for two conifers, Sitka spruce (Picea sitchensis) and loblolly pine (Pinus taeda), together with full genome sequences for two angiosperms, Arabidopsis thaliana and poplar (Populus trichocarpa), offer an opportunity to infer the evolutionary processes underlying thousands of orthologous protein-coding genes in gymnosperms compared with an angiosperm orthologue set. RESULTS: Based upon pairwise comparisons of 3,723 spruce and pine orthologues, we found an average synonymous genetic distance (dS) of 0.191, and an average dN/dS ratio of 0.314. Using a fossil-established divergence time of 140 million years between spruce and pine, we extrapolated a nucleotide substitution rate of 0.68 × 10⁻⁹ synonymous substitutions per site per year. When compared to angiosperms, this indicates a dramatically slower rate of nucleotide substitution in conifers: on average 15-fold. Coincidentally, we found a three-fold higher dN/dS for the spruce-pine lineage compared to the poplar-Arabidopsis lineage. This joint occurrence of a slower evolutionary rate in conifers with higher dN/dS, and possibly positive selection, showcases the uniqueness of conifer genome evolution. CONCLUSIONS: Our results are in line with documented reduced nucleotide diversity, conservative genome evolution and low rates of diversification in conifers on the one hand and numerous examples of local adaptation in conifers on the other hand. We propose that reduced levels of nucleotide mutation in large and long-lived conifer trees, coupled with large effective population size, were the main factors leading to slow substitution rates but retention of beneficial mutations.
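The extrapolated substitution rate in the abstract above can be reproduced with a short calculation. This is a minimal sketch, assuming the common pairwise formula rate = dS / (2T), where the factor of two accounts for the two diverging lineages. The abstract does not state which formula the authors used, and the function name `synonymous_rate` is hypothetical.

```python
def synonymous_rate(ds: float, divergence_years: float) -> float:
    """Per-lineage synonymous substitution rate from a pairwise distance.

    Assumption: the pairwise distance dS accumulated along both lineages
    since the split, so it is divided by twice the divergence time.
    """
    return ds / (2.0 * divergence_years)


# Values quoted in the abstract: dS = 0.191, divergence time = 140 million years.
rate = synonymous_rate(ds=0.191, divergence_years=140e6)
print(f"{rate:.2e} synonymous substitutions per site per year")
```

Under that assumption the result is about 6.8 × 10⁻¹⁰, which agrees with the reported 0.68 × 10⁻⁹ synonymous substitutions per site per year.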
Affiliation(s)
- Emmanuel Buschiazzo
- Department of Forest Sciences, University of British Columbia, 2424 Main Mall, Vancouver, BC V6T 1Z4, Canada.
|
87
|
Exploring Diversification and Genome Size Evolution in Extant Gymnosperms through Phylogenetic Synthesis. ACTA ACUST UNITED AC 2012. [DOI: 10.1155/2012/292857] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 11/17/2022]
Abstract
Gymnosperms, comprising cycads, Ginkgo, Gnetales, and conifers, represent one of the major groups of extant seed plants. Yet compared to angiosperms, little is known about the patterns of diversification and genome evolution in gymnosperms. We assembled a phylogenetic supermatrix containing over 4.5 million nucleotides from 739 gymnosperm taxa. Although 93.6% of the cells in the supermatrix are empty, the data reveal many strongly supported nodes that are generally consistent with previous phylogenetic analyses, including weak support for Gnetales sister to Pinaceae. A lineage through time plot suggests elevated rates of diversification within the last 100 million years, and there is evidence of shifts in diversification rates in several clades within cycads and conifers. A likelihood-based analysis of the evolution of genome size in 165 gymnosperms finds evidence for heterogeneous rates of genome size evolution due to an elevated rate in Pinus.
|
88
|
Extended linkage disequilibrium in noncoding regions in a conifer, Cryptomeria japonica. Genetics 2011; 190:1145-8. [PMID: 22209904 DOI: 10.1534/genetics.111.136697] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Abstract
We measured linkage disequilibrium in mostly noncoding regions of Cryptomeria japonica, a conifer belonging to Cupressaceae. Linkage disequilibrium was extensive and did not decay even at a distance of 100 kb. The average estimate of the population recombination rate per base pair was 1.55 × 10⁻⁵ and was <1/70 of that in the coding regions. We discuss the impact of low recombination rates in a large part of the genome on association studies.
|
89
|
Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 2011; 12:491. [PMID: 22192575 PMCID: PMC3280279 DOI: 10.1186/1471-2105-12-491] [Citation(s) in RCA: 1295] [Impact Index Per Article: 99.6] [Received: 08/18/2011] [Accepted: 12/22/2011] [Indexed: 12/30/2022]
Abstract
Background: Second-generation sequencing technologies are precipitating major shifts with regard to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation because, unlike first-generation projects, there are no pre-existing 'gold-standard' gene models with which to train gene finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. Results: We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality, and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. Conclusions: MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.
Affiliation(s)
- Carson Holt
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
|
90
|
Liu W, Thummasuwan S, Sehgal SK, Chouvarine P, Peterson DG. Characterization of the genome of bald cypress. BMC Genomics 2011; 12:553. [PMID: 22077969 PMCID: PMC3228858 DOI: 10.1186/1471-2164-12-553] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 08/06/2011] [Accepted: 11/11/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Bald cypress (Taxodium distichum var. distichum) is a coniferous tree of tremendous ecological and economic importance. It is a member of the family Cupressaceae which also includes cypresses, redwoods, sequoias, thujas, and junipers. While the bald cypress genome is more than three times the size of the human genome, its 1C DNA content is amongst the smallest of any conifer. To learn more about the genome of bald cypress and gain insight into the evolution of Cupressaceae genomes, we performed a Cot analysis and used Cot filtration to study Taxodium DNA. Additionally, we constructed a 6.7 genome-equivalent BAC library that we screened with known Taxodium genes and select repeats. RESULTS: The bald cypress genome is composed of 90% repetitive DNA with most sequences being found in low to mid copy numbers. The most abundant repeats are found in fewer than 25,000 copies per genome. Approximately 7.4% of the genome is single/low-copy DNA (i.e., sequences found in 1 to 5 copies). Sequencing of highly repetitive Cot clones indicates that most Taxodium repeats are highly diverged from previously characterized plant repeat sequences. The bald cypress BAC library consists of 606,336 clones (average insert size of 113 kb) and collectively provides 6.7-fold genome equivalent coverage of the bald cypress genome. Macroarray screening with known genes produced, on average, about 1.5 positive clones per probe per genome-equivalent. Library screening with Cot-1 DNA revealed that approximately 83% of BAC clones contain repetitive sequences iterated 10³ to 10⁴ times per genome. CONCLUSIONS: The BAC library for bald cypress is the first to be generated for a conifer species outside of the family Pinaceae. The Taxodium BAC library was shown to be useful in gene isolation and genome characterization and should be an important tool in gymnosperm comparative genomics, physical mapping, genome sequencing, and gene/polymorphism discovery.
The single/low-copy (SL) component of bald cypress is 4.6 times the size of the Arabidopsis genome. As suggested for other gymnosperms, the large amount of SL DNA in Taxodium is likely the result of divergence among ancient repeat copies and gene/pseudogene duplication.
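The library figures quoted in this abstract can be cross-checked against each other. The sketch below assumes the usual definition of library coverage (number of clones × mean insert size ÷ genome size) and ignores empty or organellar clones. The function name `implied_genome_size_bp` is hypothetical, and the result is an implied value, not one reported by the authors.

```python
def implied_genome_size_bp(n_clones: int, mean_insert_bp: float, coverage: float) -> float:
    """Genome size implied by a clone library's stated coverage.

    Assumption: coverage = n_clones * mean_insert_bp / genome_size,
    rearranged here to solve for genome_size.
    """
    return n_clones * mean_insert_bp / coverage


# Values quoted in the abstract: 606,336 clones, 113 kb mean insert, 6.7-fold coverage.
size = implied_genome_size_bp(n_clones=606_336, mean_insert_bp=113_000, coverage=6.7)
print(f"Implied genome size: {size / 1e9:.1f} Gb")
```

Under these assumptions the library implies a genome of roughly 10 Gb, consistent with the statement that the bald cypress genome is more than three times the size of the human genome.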
Affiliation(s)
- Wenxuan Liu
- Mississippi Genome Exploration Laboratory and Department of Plant & Soil Sciences, Mississippi State University, Mississippi State, MS 39762, USA
|
91
|
Abstract
Over the past two decades, research in forest tree genomics has lagged behind that of model and agricultural systems. However, genomic research in forest trees is poised to enter into an important and productive phase owing to the advent of next-generation sequencing technologies, the enormous genetic diversity in forest trees and the need to mitigate the effects of climate change. Research on long-lived woody perennials is extending our molecular knowledge of complex life histories and adaptations to the environment - enriching a field that has traditionally drawn biological inference from a few short-lived herbaceous species.
Affiliation(s)
- David B Neale
- Department of Plant Sciences, University of California, Davis, California 95616, USA.
|
92
|
Hao D, Yang L, Xiao P. The first insight into the Taxus genome via fosmid library construction and end sequencing. Mol Genet Genomics 2011; 285:197-205. [PMID: 21207064 DOI: 10.1007/s00438-010-0598-4] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 10/09/2010] [Accepted: 12/13/2010] [Indexed: 11/26/2022]
Abstract
Taxus mairei is a critically endangered gymnosperm that is cultivated commercially in China as an important medicinal resource, but its genome has not yet been studied. In this study, we constructed a T. mairei fosmid library and analyzed the fosmid end sequences to provide a preliminary assessment of the genome. The library consists of one million clones with an average insert size of about 39 kb, amounting to 3.9 genome equivalents. Fosmid stability assays indicate that T. mairei DNA was stable during propagation in the fosmid system. End sequencing of both 5' and 3' ends of 968 individual clones generated 1,923 sequences after trimming, with an average sequence length of 839 bp. BLASTN searches of the nr and EST databases of GenBank and BLASTX searches of the nr database resulted in 560 (29.1%) significant hits (E < 1e-5). Repetitive sequence analysis revealed that 20.8% of end sequences are repetitive elements, which were composed of retroelements, DNA transposons, satellites, simple repeats, and low-complexity sequences. The distribution pattern of the various repeat types was more similar to that of the gymnosperms Pinus and Picea than to those of monocots and dicots. The satellites of T. mairei were significantly longer than those of Pinus taeda and Picea glauca. The tetra-nucleotide repeats of T. mairei were much longer than those of Picea glauca and Pinus taeda. The fosmid library and the fosmid end sequences are the first such resources for Taxus; they will serve as a useful resource for large-scale genome sequencing, physical mapping, SSR marker development and positional cloning, and provide a better understanding of the Taxus genome.
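The end-sequencing percentages in this abstract follow from simple ratios. The sketch below recomputes them from the quoted counts. It assumes that two reads were attempted per clone (one per end), which the abstract implies but does not state as a total. The helper `percent` is a hypothetical name.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


clones_sequenced = 968
reads_attempted = 2 * clones_sequenced  # assumption: one read from each end of every clone
reads_kept = 1923                       # sequences remaining after trimming
significant_hits = 560                  # BLAST hits reported in the abstract

print(f"Reads retained after trimming: {percent(reads_kept, reads_attempted):.1f}%")
print(f"Reads with significant hits:   {percent(significant_hits, reads_kept):.1f}%")
```

The second figure reproduces the 29.1% hit rate given in the abstract; the retention rate is derived here and is not a number the authors report.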
Affiliation(s)
- DaCheng Hao
- Biotechnology Institute, Dalian Jiaotong University, Dalian 116028, China.
|