101
Russell JR, Bayer M, Booth C, Cardle L, Hackett CA, Hedley PE, Jorgensen L, Morris JA, Brennan RM. Identification, utilisation and mapping of novel transcriptome-based markers from blackcurrant (Ribes nigrum). BMC Plant Biology 2011; 11:147. [PMID: 22035129 PMCID: PMC3217869 DOI: 10.1186/1471-2229-11-147] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 07/01/2011] [Accepted: 10/28/2011] [Indexed: 05/27/2023]
Abstract
BACKGROUND Deep-level second generation sequencing (2GS) technologies are now being applied to non-model species as a viable and favourable alternative to Sanger sequencing. Large-scale SNP discovery was undertaken in blackcurrant (Ribes nigrum L.) using transcriptome-based 2GS 454 sequencing on the parental genotypes of a reference mapping population, to generate large numbers of novel markers for the construction of a high-density linkage map. RESULTS Over 700,000 reads were produced, from which a total of 7,000 SNPs were found. A subset of polymorphic SNPs was selected to develop a 384-SNP OPA assay using the Illumina BeadXpress platform. Additionally, the data enabled identification of 3,000 novel EST-SSRs. The selected SNPs and SSRs were validated across diverse Ribes germplasm, including mapping populations and other selected Ribes species. SNP-based maps were developed from two blackcurrant mapping populations, incorporating 48% and 27% of assayed SNPs respectively. A relatively high proportion of visually monomorphic SNPs were investigated further by quantitative trait mapping of theta score outputs from BeadStudio analysis, and this enabled additional SNPs to be placed on the two maps. CONCLUSIONS The use of 2GS technology for the development of markers is superior to previously described methods, in both numbers of markers and biological informativeness of those markers. Whilst the numbers of reads and assembled contigs were comparable to similar sized studies of other non-model species, here a high proportion of novel genes were discovered across a wide range of putative function and localisation. The potential utility of markers developed using the 2GS approach in downstream breeding applications is discussed.
Affiliation(s)
- Joanne R Russell
- Cell & Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Micha Bayer
- Cell & Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Clare Booth
- Cell & Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Linda Cardle
- Cell & Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Christine A Hackett
- Biomathematics and Statistics Scotland, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Pete E Hedley
- Cell & Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Linzi Jorgensen
- Cell & Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Jenny A Morris
- Cell & Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Rex M Brennan
- Cell & Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
102
Łabaj PP, Leparc GG, Linggi BE, Markillie LM, Wiley HS, Kreil DP. Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics 2011; 27:i383-91. [PMID: 21685096 PMCID: PMC3117338 DOI: 10.1093/bioinformatics/btr247] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Indexed: 12/03/2022]
Abstract
Motivation: Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. Results: We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, <30% of all transcripts could be quantified reliably with a relative error <20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision. Contact: rnaseq10@boku.ac.at Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Paweł P Łabaj
- Boku University Vienna, 1190 Muthgasse 18, Vienna, Austria
103
Lowe CD, Mello LV, Samatar N, Martin LE, Montagnes DJS, Watts PC. The transcriptome of the novel dinoflagellate Oxyrrhis marina (Alveolata: Dinophyceae): response to salinity examined by 454 sequencing. BMC Genomics 2011; 12:519. [PMID: 22014029 PMCID: PMC3209475 DOI: 10.1186/1471-2164-12-519] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 06/06/2011] [Accepted: 10/20/2011] [Indexed: 11/30/2022]
Abstract
Background The heterotrophic dinoflagellate Oxyrrhis marina is increasingly studied in experimental, ecological and evolutionary contexts. Its basal phylogenetic position within the dinoflagellates makes O. marina useful for understanding the origin of numerous unusual features of the dinoflagellate lineage; its broad distribution has lent O. marina to the study of protist biogeography; and nutritive flexibility and eurytopy have made it a common lab rat for the investigation of physiological responses of marine heterotrophic flagellates. Nevertheless, genome-scale resources for O. marina are scarce. Here we present a 454-based transcriptome survey for this organism. In addition, we assess sequence read abundance, as a proxy for gene expression, in response to salinity, an environmental factor potentially important in determining O. marina spatial distributions. Results Sequencing generated ~57 Mbp of data which assembled into 7,398 contigs. Approximately 24% of contigs were nominally identified by BLAST. A further clustering of contigs (at ≥ 90% identity) revealed 164 transcript variant clusters, the largest of which (Phosphoribosylaminoimidazole-succinocarboxamide synthase) was composed of 28 variants displaying predominately synonymous variation. In a genomic context, a sample of 5 different genes were demonstrated to occur as tandem repeats, separated by short (~200-340 bp) inter-genic regions. For HSP90 several intergenic variants were detected suggesting a potentially complex genomic arrangement. In response to salinity, analysis of 454 read abundance highlighted 9 and 20 genes over or under expressed at 50 PSU, respectively. However, 454 read abundance and subsequent qPCR validation did not correlate well, suggesting that measures of gene expression via ad hoc analysis of sequence read abundance require careful interpretation.
Conclusion Here we indicate that tandem gene arrangements and the occurrence of multiple transcribed gene variants are common and indicate potentially complex genomic arrangements in O. marina. Comparison of the reported data set with existing O. marina and other dinoflagellates ESTs indicates little sequence overlap likely as a result of the relatively limited extent of genome scale sequence data currently available for the dinoflagellates. This is one of the first 454-based transcriptome surveys of an ancestral dinoflagellate taxon and will undoubtedly prove useful for future comparative studies aimed at reconstructing the origin of novel features of the dinoflagellates.
Affiliation(s)
- Chris D Lowe
- Department of Evolution, Ecology, and Behaviour, Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK.
104
Sloan DB, Keller SR, Berardi AE, Sanderson BJ, Karpovich JF, Taylor DR. De novo transcriptome assembly and polymorphism detection in the flowering plant Silene vulgaris (Caryophyllaceae). Mol Ecol Resour 2011; 12:333-43. [PMID: 21999839 DOI: 10.1111/j.1755-0998.2011.03079.x] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 11/30/2022]
Abstract
Members of the angiosperm genus Silene are widely used in studies of ecology and evolution, but available genomic and population genetic resources within Silene remain limited. Deep transcriptome (i.e. expressed sequence tag or EST) sequencing has proven to be a rapid and cost-effective means to characterize gene content and identify polymorphic markers in non-model organisms. In this study, we report the results of 454 GS-FLX Titanium sequencing of a polyA-selected and normalized cDNA library from Silene vulgaris. The library was generated from a single pool of transcripts, combining RNA from leaf, root and floral tissue from three genetically divergent European subpopulations of S. vulgaris. A single full-plate 454 run produced 959,520 reads totalling 363.6 Mb of sequence data with an average read length of 379.0 bp after quality trimming and removal of custom library adaptors. We assembled 832,251 (86.7%) of these reads into 40,964 contigs, which have a total length of 25.4 Mb and can be organized into 18,178 graph-based clusters or 'isogroups'. Assembled sequences were annotated based on homology to genes in multiple public databases. Analysis of sequence variants identified 13,432 putative single-nucleotide polymorphisms (SNPs) and 1320 simple sequence repeats (SSRs) that are candidates for microsatellite analysis. Estimates of nucleotide diversity from 1577 contigs were used to generate genome-wide distributions that revealed several outliers with high diversity. All of these resources are publicly available through NCBI and/or our website (http://silenegenomics.biology.virginia.edu) and should provide valuable genomic and population genetic tools for the Silene research community.
Affiliation(s)
- Daniel B Sloan
- Department of Biology, University of Virginia, Charlottesville, VA 22903, USA.
105
Cantu D, Pearce SP, Distelfeld A, Christiansen MW, Uauy C, Akhunov E, Fahima T, Dubcovsky J. Effect of the down-regulation of the high Grain Protein Content (GPC) genes on the wheat transcriptome during monocarpic senescence. BMC Genomics 2011; 12:492. [PMID: 21981858 PMCID: PMC3209470 DOI: 10.1186/1471-2164-12-492] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Received: 05/24/2011] [Accepted: 10/07/2011] [Indexed: 11/21/2022]
Abstract
BACKGROUND Increasing the nutrient concentration of wheat grains is important to ameliorate nutritional deficiencies in many parts of the world. Proteins and nutrients in the wheat grain are largely derived from the remobilization of degraded leaf molecules during monocarpic senescence. The down-regulation of the NAC transcription factor Grain Protein Content (GPC) in transgenic wheat plants delays senescence (>3 weeks) and reduces the concentration of protein, Zn and Fe in the grain (>30%), linking senescence and nutrient remobilization. Based on the early and rapid up-regulation of GPC in wheat flag leaves after anthesis, we hypothesized that this transcription factor is an early regulator of monocarpic senescence. To test this hypothesis, we used high-throughput mRNA-seq technologies to characterize the effect of the GPC down-regulation on the wheat flag-leaf transcriptome 12 days after anthesis. At this early stage of senescence GPC transcript levels are significantly lower in transgenic GPC-RNAi plants than in the wild type, but there are still no visible phenotypic differences between genotypes. RESULTS We generated 1.4 million 454 reads from early senescing flag leaves (average ~350 nt) and assembled 1.2 million into 30,497 contigs that were used as a reference to map 145 million Illumina reads from three wild type and four GPC-RNAi plants. Following normalization and statistical testing, we identified a set of 691 genes differentially regulated by GPC (431 ≥ 2-fold change). Transcript level ratios between transgenic and wild type plants showed a high correlation (R = 0.83) between qRT-PCR and Illumina results, providing independent validation of the mRNA-seq approach. A set of differentially expressed genes were analyzed across an early senescence time-course. CONCLUSIONS Monocarpic senescence is an active process characterized by large-scale changes in gene expression which begins considerably before the appearance of visual symptoms of senescence. The mRNA-seq approach used here was able to detect small differences in transcript levels during the early stages of senescence. This resulted in an extensive list of GPC-regulated genes, which includes transporters, hormone regulated genes, and transcription factors. These GPC-regulated genes, particularly those up-regulated during senescence, provide valuable entry points to dissect the early stages of monocarpic senescence and nutrient remobilization in wheat.
Affiliation(s)
- Dario Cantu
- Department of Plant Sciences, University of California Davis, USA
- Stephen P Pearce
- Department of Plant Sciences, University of California Davis, USA
- Assaf Distelfeld
- Department of Plant Sciences, University of California Davis, USA
- Faculty of Life Sciences, Plant Sciences, Tel Aviv University, Israel
- Michael W Christiansen
- Department of Plant Sciences, University of California Davis, USA
- Aarhus University, Faculty of Agricultural Sciences, Department of Genetics and Biotechnology, Slagelse, Denmark
- Cristobal Uauy
- Department of Crop Genetics, John Innes Centre, Norwich, UK
- Eduard Akhunov
- Department of Plant Pathology, Kansas State University, USA
- Tzion Fahima
- Department of Evolutionary and Environmental Biology, University of Haifa, Israel
- Jorge Dubcovsky
- Department of Plant Sciences, University of California Davis, USA
106
Su CL, Chao YT, Alex Chang YC, Chen WC, Chen CY, Lee AY, Hwa KT, Shih MC. De novo assembly of expressed transcripts and global analysis of the Phalaenopsis aphrodite transcriptome. Plant & Cell Physiology 2011; 52:1501-14. [PMID: 21771864 DOI: 10.1093/pcp/pcr097] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Indexed: 05/21/2023]
Abstract
Being one of the largest families in the angiosperms, Orchidaceae display a great biodiversity resulting from adaptation to diverse habitats. Genomic information on orchids is rather limited, despite their unique and interesting biological features, thus impeding advanced molecular research. Here we report a strategy to integrate sequence outputs of the moth orchid, Phalaenopsis aphrodite, from two high-throughput sequencing platform technologies, Roche 454 and Illumina/Solexa, in order to maximize assembly efficiency. Tissues collected for cDNA library preparation included a wide range of vegetative and reproductive tissues. We also designed an effective workflow for annotation and functional analysis. After assembly and trimming processes, 233,823 unique sequences were obtained. Among them, 42,590 contigs averaging 875 bp in length were annotated to protein-coding genes, of which 7,263 coding genes were found to be nearly full length. The sequence accuracy of the assembled contigs was validated to be as high as 99.9%. Genes with tissue-specific expression were also categorized by profiling analysis with RNA-Seq. Gene products targeted to specific subcellular localizations were identified by their annotations. We concluded that, with proper assembly to combine outputs of next-generation sequencing platforms, transcriptome information can be enriched in gene discovery, functional annotation and expression profiling of a non-model organism.
Affiliation(s)
- Chun-lin Su
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei, 11529, Taiwan
107
Ekblom R, Galindo J. Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity (Edinb) 2011; 107:1-15. [PMID: 21139633 PMCID: PMC3186121 DOI: 10.1038/hdy.2010.152] [Citation(s) in RCA: 633] [Impact Index Per Article: 48.7] [Received: 03/04/2010] [Revised: 09/10/2010] [Accepted: 11/02/2010] [Indexed: 11/09/2022]
Abstract
As most biologists are probably aware, technological advances in molecular biology during the last few years have opened up possibilities to rapidly generate large-scale sequencing data from non-model organisms at a reasonable cost. In an era when virtually any study organism can 'go genomic', it is worthwhile to review how this may impact molecular ecology. The first studies to put next generation sequencing (NGS) to the test in ecologically well-characterized species without previous genome information were published in 2007 and the beginning of 2008. Since then several studies have followed in their footsteps, and a large number are undoubtedly under way. This review focuses on how NGS has been, and can be, applied to ecological, population genetic and conservation genetic studies of non-model species, for which there are no (or very limited) genomic resources. Our aim is to draw attention to the various possibilities that are opening up using the new technologies, but we also highlight some of the pitfalls and drawbacks with these methods. We will try to provide a snapshot of the current state of the art for this rapidly advancing and expanding field of research and give some likely directions for future developments.
Affiliation(s)
- R Ekblom
- Department of Animal and Plant Sciences, University of Sheffield, UK.
108
Cook N, Aziz N, Hedley PE, Morris J, Milne L, Karley AJ, Hubbard SF, Russell JR. Transcriptome sequencing of an ecologically important graminivorous sawfly: a resource for marker development. Conserv Genet Resour 2011. [DOI: 10.1007/s12686-011-9459-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/18/2022]
109
Blanca JM, Pascual L, Ziarsolo P, Nuez F, Cañizares J. ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using next generation sequence. BMC Genomics 2011; 12:285. [PMID: 21635747 PMCID: PMC3124440 DOI: 10.1186/1471-2164-12-285] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 12/10/2010] [Accepted: 06/02/2011] [Indexed: 12/24/2022]
Abstract
BACKGROUND The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. RESULTS The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. CONCLUSIONS ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin.
Affiliation(s)
- Jose M Blanca
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana (COMAV), Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
- Laura Pascual
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana (COMAV), Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
- Peio Ziarsolo
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana (COMAV), Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
- Fernando Nuez
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana (COMAV), Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
- Joaquin Cañizares
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana (COMAV), Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
110
Kaur S, Cogan NOI, Pembleton LW, Shinozuka M, Savin KW, Materne M, Forster JW. Transcriptome sequencing of lentil based on second-generation technology permits large-scale unigene assembly and SSR marker discovery. BMC Genomics 2011; 12:265. [PMID: 21609489 PMCID: PMC3113791 DOI: 10.1186/1471-2164-12-265] [Citation(s) in RCA: 150] [Impact Index Per Article: 11.5] [Received: 02/05/2011] [Accepted: 05/25/2011] [Indexed: 12/05/2022]
Abstract
Background Lentil (Lens culinaris Medik.) is a cool-season grain legume which provides a rich source of protein for human consumption. In terms of genomic resources, lentil is relatively underdeveloped, in comparison to other Fabaceae species, with limited available data. There is hence a significant need to enhance such resources in order to identify novel genes and alleles for molecular breeding to increase crop productivity and quality. Results Tissue-specific cDNA samples from six distinct lentil genotypes were sequenced using Roche 454 GS-FLX Titanium technology, generating c. 1.38 × 10^6 expressed sequence tags (ESTs). De novo assembly generated a total of 15,354 contigs and 68,715 singletons. The complete unigene set was sequence-analysed against genome drafts of the model legume species Medicago truncatula and Arabidopsis thaliana to identify 12,639 and 7,476 unique matches, respectively. When compared to the genome of Glycine max, a total of 20,419 unique hits were observed corresponding to c. 31% of the known gene space. A total of 25,592 lentil unigenes were subsequently annotated from GenBank. Simple sequence repeat (SSR)-containing ESTs were identified from consensus sequences and a total of 2,393 primer pairs were designed. A subset of 192 EST-SSR markers was screened for validation across a panel of 12 cultivated lentil genotypes and one wild relative species. A total of 166 primer pairs obtained successful amplification, of which 47.5% detected genetic polymorphism. Conclusions A substantial collection of ESTs has been developed from sequence analysis of lentil genotypes using second-generation technology, permitting unigene definition across a broad range of functional categories. As well as providing resources for functional genomics studies, the unigene set has permitted significant enhancement of the number of publicly-available molecular genetic markers as tools for improvement of this species.
Affiliation(s)
- Sukhjiwan Kaur
- Department of Primary Industries, Biosciences Research Division, Victorian AgriBiosciences Centre, La Trobe University Research and Development Park, Bundoora, Australia
111
Franssen SU, Shrestha RP, Bräutigam A, Bornberg-Bauer E, Weber APM. Comprehensive transcriptome analysis of the highly complex Pisum sativum genome using next generation sequencing. BMC Genomics 2011; 12:227. [PMID: 21569327 PMCID: PMC3224338 DOI: 10.1186/1471-2164-12-227] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.7] [Received: 03/16/2010] [Accepted: 05/11/2011] [Indexed: 12/17/2022]
Abstract
BACKGROUND The garden pea, Pisum sativum, is among the best-investigated legume plants and of significant agro-commercial relevance. Pisum sativum has a large and complex genome and accordingly few comprehensive genomic resources exist. RESULTS We analyzed the pea transcriptome at the highest possible amount of accuracy by current technology. We used next generation sequencing with the Roche/454 platform and evaluated and compared a variety of approaches, including diverse tissue libraries, normalization, alternative sequencing technologies, saturation estimation and diverse assembly strategies. We generated libraries from flowers, leaves, cotyledons, epi- and hypocotyl, and etiolated and light treated etiolated seedlings, comprising a total of 450 megabases. Libraries were assembled into 324,428 unigenes in a first pass assembly. A second pass assembly reduced the amount to 81,449 unigenes but caused a significant number of chimeras. Analyses of the assemblies identified the assembly step as a major possibility for improvement. By recording frequencies of Arabidopsis orthologs hit by randomly drawn reads and fitting parameters of the saturation curve we concluded that sequencing was exhaustive. For leaf libraries we found normalization allows partial recovery of expression strength aside the desired effect of increased coverage. Based on theoretical and biological considerations we concluded that the sequence reads in the database tagged the vast majority of transcripts in the aerial tissues. A pathway representation analysis showed the merits of sampling multiple aerial tissues to increase the number of tagged genes. All results have been made available as a fully annotated database in fasta format. CONCLUSIONS We conclude that the approach taken resulted in a high-quality dataset which serves well as a first comprehensive reference set for the model legume pea. We suggest future deep sequencing transcriptome projects of species lacking a genomics backbone will need to concentrate mainly on resolving the issues of redundancy and paralogy during transcriptome assembly.
Affiliation(s)
- Susanne U Franssen
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Hüfferstrasse 1, 48149 Münster, Germany
- Roshan P Shrestha
- Department of Plant Biology, Michigan State University, 48823 East Lansing, MI, USA
- Andrea Bräutigam
- Department of Plant Biology, Michigan State University, 48823 East Lansing, MI, USA
- Institute of Plant Biochemistry, Heinrich Heine University, Universitätsstrasse 1, 40225 Düsseldorf, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Hüfferstrasse 1, 48149 Münster, Germany
- Andreas PM Weber
- Department of Plant Biology, Michigan State University, 48823 East Lansing, MI, USA
- Institute of Plant Biochemistry, Heinrich Heine University, Universitätsstrasse 1, 40225 Düsseldorf, Germany
112
Yang SS, Tu ZJ, Cheung F, Xu WW, Lamb JFS, Jung HJG, Vance CP, Gronwald JW. Using RNA-Seq for gene identification, polymorphism detection and transcript profiling in two alfalfa genotypes with divergent cell wall composition in stems. BMC Genomics 2011; 12:199. [PMID: 21504589 DOI: 10.1186/1471-2164-12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/05/2010] [Accepted: 04/19/2011] [Indexed: 05/23/2023]
Abstract
BACKGROUND Alfalfa, [Medicago sativa (L.) sativa], a widely-grown perennial forage has potential for development as a cellulosic ethanol feedstock. However, the genomics of alfalfa, a non-model species, is still in its infancy. The recent advent of RNA-Seq, a massively parallel sequencing method for transcriptome analysis, provides an opportunity to expand the identification of alfalfa genes and polymorphisms, and conduct in-depth transcript profiling. RESULTS Cell walls in stems of alfalfa genotype 708 have higher cellulose and lower lignin concentrations compared to cell walls in stems of genotype 773. Using the Illumina GA-II platform, a total of 198,861,304 expression sequence tags (ESTs, 76 bp in length) were generated from cDNA libraries derived from elongating stem (ES) and post-elongation stem (PES) internodes of 708 and 773. In addition, 341,984 ESTs were generated from ES and PES internodes of genotype 773 using the GS FLX Titanium platform. The first alfalfa (Medicago sativa) gene index (MSGI 1.0) was assembled using the Sanger ESTs available from GenBank, the GS FLX Titanium EST sequences, and the de novo assembled Illumina sequences. MSGI 1.0 contains 124,025 unique sequences including 22,729 tentative consensus sequences (TCs), 22,315 singletons and 78,981 pseudo-singletons. We identified a total of 1,294 simple sequence repeats (SSR) among the sequences in MSGI 1.0. In addition, a total of 10,826 single nucleotide polymorphisms (SNPs) were predicted between the two genotypes. Out of 55 SNPs randomly selected for experimental validation, 47 (85%) were polymorphic between the two genotypes. We also identified numerous allelic variations within each genotype. Digital gene expression analysis identified numerous candidate genes that may play a role in stem development as well as candidate genes that may contribute to the differences in cell wall composition in stems of the two genotypes. 
CONCLUSIONS Our results demonstrate that RNA-Seq can be successfully used for gene identification, polymorphism detection and transcript profiling in alfalfa, a non-model, allogamous, autotetraploid species. The alfalfa gene index assembled in this study, and the SNPs, SSRs and candidate genes identified can be used to improve alfalfa as a forage crop and cellulosic feedstock.
Affiliation(s)
- S Samuel Yang
- USDA-Agricultural Research Service, Plant Science Research Unit, St. Paul, MN 55108, USA.
|
113
|
Yang SS, Tu ZJ, Cheung F, Xu WW, Lamb JFS, Jung HJG, Vance CP, Gronwald JW. Using RNA-Seq for gene identification, polymorphism detection and transcript profiling in two alfalfa genotypes with divergent cell wall composition in stems. BMC Genomics 2011; 12:199. [PMID: 21504589 PMCID: PMC3112146 DOI: 10.1186/1471-2164-12-199] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Received: 11/05/2010] [Accepted: 04/19/2011] [Indexed: 02/08/2023] Open
|
114
|
Angeloni F, Wagemaker CAM, Jetten MSM, Op den Camp HJM, Janssen-Megens EM, Francoijs KJ, Stunnenberg HG, Ouborg NJ. De novo transcriptome characterization and development of genomic tools for Scabiosa columbaria L. using next-generation sequencing techniques. Mol Ecol Resour 2011; 11:662-74. [PMID: 21676196 DOI: 10.1111/j.1755-0998.2011.02990.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/26/2022]
Abstract
Next-generation sequencing (NGS) technologies are increasingly applied in many organisms, including nonmodel organisms that are important for ecological and conservation purposes. Illumina and 454 sequencing are among the most used NGS technologies and have been shown to produce optimal results at reasonable costs when used together. Here, we describe the combined application of these two NGS technologies to characterize the transcriptome of a plant species of ecological and conservation relevance for which no genomic resource is available, Scabiosa columbaria. We obtained 528,557 reads from a 454 GS-FLX run and a total of 28,993,627 reads from two lanes of an Illumina GAII single run. After read trimming, the de novo assembly of both types of reads produced 109,630 contigs. Both the contigs and the remaining singletons longer than 75 bp were searched with BLAST against the Uniprot/Swissprot database, resulting in 29,676 and 10,515 significant hits, respectively. Based on sequence similarity with known gene products, these sequences represent at least 12,516 unique genes, most of which are well covered by contig sequences. In addition, we identified 4,320 microsatellite loci, of which 856 had flanking sequences suitable for PCR primer design. We also identified 75,054 putative SNPs. This annotated sequence collection and the associated molecular markers represent a major genomic resource for S. columbaria, which should contribute to future research in conservation and population biology. Our results demonstrate the utility of NGS technologies as a starting point for the development of genomic tools in nonmodel but ecologically important species.
Affiliation(s)
- F Angeloni
- Department of Molecular Ecology, Radboud University Nijmegen, Institute for Water and Wetland Research, Heyendaalseweg 135, 6525 AJ Nijmegen, the Netherlands
|
115
|
Everett MV, Grau ED, Seeb JE. Short reads and nonmodel species: exploring the complexities of next-generation sequence assembly and SNP discovery in the absence of a reference genome. Mol Ecol Resour 2011; 11 Suppl 1:93-108. [DOI: 10.1111/j.1755-0998.2010.02969.x] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Indexed: 12/01/2022]
|
116
|
Der JP, Barker MS, Wickett NJ, dePamphilis CW, Wolf PG. De novo characterization of the gametophyte transcriptome in bracken fern, Pteridium aquilinum. BMC Genomics 2011; 12:99. [PMID: 21303537 PMCID: PMC3042945 DOI: 10.1186/1471-2164-12-99] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Received: 05/24/2010] [Accepted: 02/08/2011] [Indexed: 11/23/2022] Open
Abstract
Background Because of their phylogenetic position and unique characteristics of their biology and life cycle, ferns represent an important lineage for studying the evolution of land plants. Large and complex genomes in ferns combined with the absence of economically important species have been a barrier to the development of genomic resources. However, high throughput sequencing technologies are now being widely applied to non-model species. We leveraged the Roche 454 GS-FLX Titanium pyrosequencing platform in sequencing the gametophyte transcriptome of bracken fern (Pteridium aquilinum) to develop genomic resources for evolutionary studies. Results 681,722 quality and adapter trimmed reads totaling 254 Mbp were assembled de novo into 56,256 unique sequences (i.e. unigenes) with a mean length of 547.2 bp and a total assembly size of 30.8 Mbp with an average read-depth coverage of 7.0×. We estimate that 87% of the complete transcriptome has been sequenced and that all transcripts have been tagged. 61.8% of the unigenes had blastx hits in the NCBI nr protein database, representing 22,596 unique best hits. The longest open reading frame in 52.2% of the unigenes had positive domain matches in InterProScan searches. We assigned 46.2% of the unigenes with a GO functional annotation and 16.0% with an enzyme code annotation. Enzyme codes were used to retrieve and color KEGG pathway maps. A comparative genomics approach revealed a substantial proportion of genes expressed in bracken gametophytes to be shared across the genomes of Arabidopsis, Selaginella and Physcomitrella, and identified a substantial number of potentially novel fern genes. By comparing the list of Arabidopsis genes identified by blast with a list of gametophyte-specific Arabidopsis genes taken from the literature, we identified a set of potentially conserved gametophyte specific genes. 
We screened unigenes for repetitive sequences to identify 548 potentially-amplifiable simple sequence repeat loci and 689 expressed transposable elements. Conclusions This study is the first comprehensive transcriptome analysis for a fern and represents an important scientific resource for comparative evolutionary and functional genomics studies in land plants. We demonstrate the utility of high-throughput sequencing of a normalized cDNA library for de novo transcriptome characterization and gene discovery in a non-model plant.
Affiliation(s)
- Joshua P Der
- Department of Biology and Center for Integrated Biosystems, Utah State University, Logan, UT 84322-5305, USA.
|
117
|
Techniques of cell type-specific transcriptome analysis and applications in researches of sexual plant reproduction. ACTA ACUST UNITED AC 2011. [DOI: 10.1007/s11515-011-1090-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/25/2023]
|
118
|
Logacheva MD, Kasianov AS, Vinogradov DV, Samigullin TH, Gelfand MS, Makeev VJ, Penin AA. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum). BMC Genomics 2011; 12:30. [PMID: 21232141 PMCID: PMC3027159 DOI: 10.1186/1471-2164-12-30] [Citation(s) in RCA: 124] [Impact Index Per Article: 9.5] [Received: 10/18/2010] [Accepted: 01/13/2011] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND Transcriptome sequencing data have become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales, a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects. RESULTS Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. CONCLUSIONS 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated.
Affiliation(s)
- Maria D Logacheva
- Department of Evolutionary Biochemistry, A.N. Belozersky Institute of Physico-Chemical Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
- Evolutionary Genomics Laboratory, Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
- Artem S Kasianov
- V.A. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Dmitriy V Vinogradov
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
- Tagir H Samigullin
- Department of Evolutionary Biochemistry, A.N. Belozersky Institute of Physico-Chemical Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
- Mikhail S Gelfand
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Vsevolod J Makeev
- V.A. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- N.I Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- State Scientific Institute of Genetics and Selection of Industrial Microorganisms, GosNIIgenetika, Moscow, Russia
- Aleksey A Penin
- Evolutionary Genomics Laboratory, Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
- Department of Genetics, Biological faculty, M.V. Lomonosov Moscow State University, Moscow, Russia
|
119
|
|
120
|
Bräutigam A, Kajala K, Wullenweber J, Sommer M, Gagneul D, Weber KL, Carr KM, Gowik U, Maß J, Lercher MJ, Westhoff P, Hibberd JM, Weber AP. An mRNA blueprint for C4 photosynthesis derived from comparative transcriptomics of closely related C3 and C4 species. PLANT PHYSIOLOGY 2011; 155:142-56. [PMID: 20543093 PMCID: PMC3075794 DOI: 10.1104/pp.110.159442] [Citation(s) in RCA: 181] [Impact Index Per Article: 13.9] [Received: 05/18/2010] [Accepted: 06/09/2010] [Indexed: 05/18/2023]
Abstract
C4 photosynthesis involves alterations to the biochemistry, cell biology, and development of leaves. Together, these modifications increase the efficiency of photosynthesis, and despite the apparent complexity of the pathway, it has evolved at least 45 times independently within the angiosperms. To provide insight into the extent to which gene expression is altered between C3 and C4 leaves, and to identify candidates associated with the C4 pathway, we used massively parallel mRNA sequencing of closely related C3 (Cleome spinosa) and C4 (Cleome gynandra) species. Gene annotation was facilitated by the phylogenetic proximity of Cleome and Arabidopsis (Arabidopsis thaliana). Up to 603 transcripts differ in abundance between these C3 and C4 leaves. These include 17 transcription factors, putative transport proteins, as well as genes that in Arabidopsis are implicated in chloroplast movement and expansion, plasmodesmatal connectivity, and cell wall modification. These are all characteristics known to alter in a C4 leaf but that previously had remained undefined at the molecular level. We also document large shifts in overall transcription profiles for selected functional classes. Our approach defines the extent to which transcript abundance in these C3 and C4 leaves differs, provides a blueprint for the NAD-malic enzyme C4 pathway operating in a dicotyledon, and furthermore identifies potential regulators. We anticipate that comparative transcriptomics of closely related species will provide deep insight into the evolution of other complex traits.
|
121
|
Wang Z, Fang B, Chen J, Zhang X, Luo Z, Huang L, Chen X, Li Y. De novo assembly and characterization of root transcriptome using Illumina paired-end sequencing and development of cSSR markers in sweet potato (Ipomoea batatas). BMC Genomics 2010; 11:726. [PMID: 21182800 PMCID: PMC3016421 DOI: 10.1186/1471-2164-11-726] [Citation(s) in RCA: 333] [Impact Index Per Article: 23.8] [Received: 08/12/2010] [Accepted: 12/24/2010] [Indexed: 12/31/2022] Open
Abstract
Background The tuberous root of sweetpotato is an important agricultural and biological organ. Public databases do not contain sufficient transcriptomic and genomic data for understanding the molecular mechanisms underlying tuberous root formation and development. Thus, high-throughput transcriptome sequencing is needed to generate large numbers of transcript sequences from sweetpotato root for gene discovery and molecular marker development. Results In this study, more than 59 million sequencing reads were generated using Illumina paired-end sequencing technology. De novo assembly yielded 56,516 unigenes with an average length of 581 bp. Based on sequence similarity search with known proteins, a total of 35,051 (62.02%) genes were identified. Out of these annotated unigenes, 5,046 and 11,983 unigenes were assigned to gene ontology and clusters of orthologous group, respectively. Searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) indicated that 17,598 (31.14%) unigenes were mapped to 124 KEGG pathways, and 11,056 were assigned to metabolic pathways, which were well represented by carbohydrate metabolism and biosynthesis of secondary metabolites. In addition, 4,114 cDNA SSRs (cSSRs) were identified as potential molecular markers in our unigenes. One hundred pairs of PCR primers were designed and used for validation of the amplification and assessment of the polymorphism in genomic DNA pools. Initial screening tests showed that 92 primer pairs amplified successfully. Conclusion This study generated a substantial fraction of sweetpotato transcript sequences, which can be used to discover novel genes associated with tuberous root formation and development and will also make it possible to construct high density microarrays for further characterization of gene expression profiles during these processes.
The thousands of cSSR markers identified in the present study enrich the available molecular markers and will facilitate marker-assisted selection in sweetpotato breeding. Overall, these sequences and markers will provide valuable resources for the sweetpotato community. These results also suggest that transcriptome analysis based on Illumina paired-end sequencing is a powerful tool for gene discovery and molecular marker development for non-model species, especially those with large and complex genomes.
Affiliation(s)
- Zhangying Wang
- Crops Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640 PR China
|
122
|
Desgagné-Penix I, Khan MF, Schriemer DC, Cram D, Nowak J, Facchini PJ. Integration of deep transcriptome and proteome analyses reveals the components of alkaloid metabolism in opium poppy cell cultures. BMC PLANT BIOLOGY 2010; 10:252. [PMID: 21083930 PMCID: PMC3095332 DOI: 10.1186/1471-2229-10-252] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Received: 06/02/2010] [Accepted: 11/18/2010] [Indexed: 05/18/2023]
Abstract
BACKGROUND Papaver somniferum (opium poppy) is the source for several pharmaceutical benzylisoquinoline alkaloids including morphine, codeine and sanguinarine. In response to treatment with a fungal elicitor, the biosynthesis and accumulation of sanguinarine is induced along with other plant defense responses in opium poppy cell cultures. The transcriptional induction of alkaloid metabolism in cultured cells provides an opportunity to identify components of this process via the integration of deep transcriptome and proteome databases generated using next-generation technologies. RESULTS A cDNA library was prepared for opium poppy cell cultures treated with a fungal elicitor for 10 h. Using 454 GS-FLX Titanium pyrosequencing, 427,369 expressed sequence tags (ESTs) with an average length of 462 bp were generated. Assembly of these sequences yielded 93,723 unigenes, of which 23,753 were assigned Gene Ontology annotations. Transcripts encoding all known sanguinarine biosynthetic enzymes were identified in the EST database, 5 of which were represented among the 50 most abundant transcripts. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) of total protein extracts from cell cultures treated with a fungal elicitor for 50 h facilitated the identification of 1,004 proteins. Proteins were fractionated by one-dimensional SDS-PAGE and digested with trypsin prior to LC-MS/MS analysis. Query of an opium poppy-specific EST database substantially enhanced peptide identification. Eight out of 10 known sanguinarine biosynthetic enzymes and many relevant primary metabolic enzymes were represented in the peptide database. CONCLUSIONS The integration of deep transcriptome and proteome analyses provides an effective platform to catalogue the components of secondary metabolism, and to identify genes encoding uncharacterized enzymes.
The establishment of corresponding transcript and protein databases generated by next-generation technologies in a system with a well-defined metabolite profile facilitates an improved linkage between genes, enzymes, and pathway components. The proteome database represents the most relevant alkaloid-producing enzymes, compared with the much deeper and more complete transcriptome library. The transcript database contained full-length mRNAs encoding most alkaloid biosynthetic enzymes, which is a key requirement for the functional characterization of novel gene candidates.
Affiliation(s)
- Isabel Desgagné-Penix
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Morgan F Khan
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- National Research Council-Plant Biotechnology Institute, Saskatoon, Saskatchewan, S7N 0W9, Canada
- David C Schriemer
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- National Research Council-Plant Biotechnology Institute, Saskatoon, Saskatchewan, S7N 0W9, Canada
- Dustin Cram
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- National Research Council-Plant Biotechnology Institute, Saskatoon, Saskatchewan, S7N 0W9, Canada
- Jacek Nowak
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- National Research Council-Plant Biotechnology Institute, Saskatoon, Saskatchewan, S7N 0W9, Canada
- Peter J Facchini
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
|
123
|
Bräutigam A, Gowik U. What can next generation sequencing do for you? Next generation sequencing as a valuable tool in plant research. PLANT BIOLOGY (STUTTGART, GERMANY) 2010; 12:831-41. [PMID: 21040298 DOI: 10.1111/j.1438-8677.2010.00373.x] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Indexed: 05/09/2023]
Abstract
Next generation sequencing (NGS) technologies have opened fascinating opportunities for the analysis of plants with and without a sequenced genome on a genomic scale. During the last few years, NGS methods have become widely available and cost effective. They can be applied to a wide variety of biological questions, from the sequencing of complete eukaryotic genomes and transcriptomes, to the genome-scale analysis of DNA-protein interactions. In this review, we focus on the use of NGS for plant transcriptomics, including gene discovery, transcript quantification and marker discovery for non-model plants, as well as transcript annotation and quantification, small RNA discovery and antisense transcription analysis for model plants. We discuss the experimental design for analysis of plants with and without a sequenced genome, including considerations on sampling, RNA preparation, sequencing platforms and bioinformatics tools for data analysis. NGS technologies offer exciting new opportunities for the plant sciences, especially for work on plants without a sequenced genome, since large sequence resources can be generated at moderate cost.
Affiliation(s)
- A Bräutigam
- Institute of Plant Biochemistry, Heinrich-Heine University, Düsseldorf, Germany.
|
124
|
Abstract
Ecological speciation is the process by which barriers to gene flow between populations evolve due to adaptive divergence via natural selection. A relatively unexplored area in ecological speciation is the role of gene expression. Gene expression may be associated with ecologically important phenotypes not evident from morphology and play a role during colonization of new environments. Here we review two potential roles of gene expression in ecological speciation: (1) its indirect role in facilitating population persistence and (2) its direct role in contributing to genetically based reproductive isolation. We find indirect evidence that gene expression facilitates population persistence, but direct tests are lacking. We also find clear examples of gene expression having effects on phenotypic traits and adaptive genetic divergence, but links to the evolution of reproductive isolation itself remain indirect. Gene expression during adaptive divergence seems to often involve complex genetic architectures controlled by gene networks, regulatory regions, and “eQTL hotspots.” Nonetheless, we review how approaches for isolating the functional mutations contributing to adaptive divergence are proving to be successful. The study of gene expression has promise for increasing our understanding of ecological speciation, particularly when integrative approaches are applied.
Affiliation(s)
- Scott A Pavey
- Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
|
125
|
Kumar S, Blaxter ML. Comparing de novo assemblers for 454 transcriptome data. BMC Genomics 2010; 11:571. [PMID: 20950480 PMCID: PMC3091720 DOI: 10.1186/1471-2164-11-571] [Citation(s) in RCA: 214] [Impact Index Per Article: 15.3] [Received: 08/11/2010] [Accepted: 10/16/2010] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. RESULTS Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. CONCLUSIONS Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. 
Combining differently optimal assemblies from different programs, however, gave a more credible final product, and this strategy is recommended.
Affiliation(s)
- Sujai Kumar
- Institute of Evolutionary Biology, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, UK
- Mark L Blaxter
- Institute of Evolutionary Biology, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, UK
|
126
|
Zahn LM, Ma X, Altman NS, Zhang Q, Wall PK, Tian D, Gibas CJ, Gharaibeh R, Leebens-Mack JH, dePamphilis CW, Ma H. Comparative transcriptomics among floral organs of the basal eudicot Eschscholzia californica as reference for floral evolutionary developmental studies. Genome Biol 2010; 11:R101. [PMID: 20950453 PMCID: PMC3218657 DOI: 10.1186/gb-2010-11-10-r101] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 06/11/2010] [Revised: 08/03/2010] [Accepted: 10/15/2010] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Molecular genetic studies of floral development have concentrated on several core eudicots and grasses (monocots), which have canalized floral forms. Basal eudicots possess a wider range of floral morphologies than the core eudicots and grasses and can serve as an evolutionary link between core eudicots and monocots, and provide a reference for studies of other basal angiosperms. Recent advances in genomics have enabled researchers to profile gene activities during floral development, primarily in the eudicot Arabidopsis thaliana and the monocots rice and maize. However, our understanding of floral developmental processes among the basal eudicots remains limited. RESULTS Using a recently generated expressed sequence tag (EST) set, we have designed an oligonucleotide microarray for the basal eudicot Eschscholzia californica (California poppy). We performed microarray experiments with an interwoven-loop design in order to characterize the E. californica floral transcriptome and to identify differentially expressed genes in flower buds with pre-meiotic and meiotic cells, four floral organs at preanthesis stages (sepals, petals, stamens and carpels), developing fruits, and leaves. CONCLUSIONS Our results provide a foundation for comparative gene expression studies between eudicots and basal angiosperms. We identified whorl-specific gene expression patterns in E. californica and examined the floral expression of several gene families. Interestingly, most E. californica homologs of Arabidopsis genes important for flower development, except for genes encoding MADS-box transcription factors, show different expression patterns between the two species. Our comparative transcriptomics study highlights the unique evolutionary position of E. californica compared with basal angiosperms and core eudicots.
Affiliation(s)
- Laura M Zahn
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: American Association for the Advancement of Science, 1200 New York Avenue NW, Washington DC 20005, USA
- Xuan Ma
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- The Intercollege Graduate Program in Cell and Developmental Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Naomi S Altman
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Qing Zhang
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: 2367 Setter Run Lane, State College, PA 16802, USA
- P Kerr Wall
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: BASF Plant Science, 26 Davis Drive, Research Triangle Park, NC 27709, USA
- Donglan Tian
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: Department of Entomology, The Pennsylvania State University, University Park, PA 16802, USA
- Cynthia J Gibas
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
- Raad Gharaibeh
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
- James H Leebens-Mack
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Current address: Department of Plant Biology, University of Georgia, 120 Carlton Street, Athens, GA 30602, USA
- Claude W dePamphilis
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Hong Ma
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- The Intercollege Graduate Program in Cell and Developmental Biology, The Pennsylvania State University, University Park, PA 16802, USA
- State Key Laboratory of Genetic Engineering and School of Life Sciences, Fudan University, 220 Handan Road, Shanghai 200433, China
- Institutes of Biomedical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai 200032, China
|
127
|
Riggins CW, Peng Y, Stewart CN, Tranel PJ. Characterization of de novo transcriptome for waterhemp (Amaranthus tuberculatus) using GS-FLX 454 pyrosequencing and its application for studies of herbicide target-site genes. PEST MANAGEMENT SCIENCE 2010; 66:1042-52. [PMID: 20680963 DOI: 10.1002/ps.2006] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 05/20/2023]
Abstract
BACKGROUND Waterhemp is a model for weed genomics research in part because it possesses many interesting biological characteristics, rapidly evolves resistance to herbicides and has a solid foundation of previous genetics work. To develop further the genomics resources for waterhemp, the transcriptome was sequenced using Roche GS-FLX 454 pyrosequencing technology. RESULTS Pyrosequencing produced 483 225 raw reads, which, after quality control and assembly, yielded 44 469 unigenes (contigs + singletons). A total of 49% of these unigenes displayed highly significant similarities to Arabidopsis proteins and were subsequently grouped into gene ontology categories. Blast searches against public and custom databases helped in identifying and obtaining preliminary sequence data for all of the major target-site genes for which waterhemp has documented resistance. Moreover, sequence data for two other herbicide targets [4-hydroxyphenylpyruvate dioxygenase (HPPD) and glutamine synthetase], where resistance has not yet been reported in any plant, were also investigated in waterhemp and six related weedy Amaranthus species. CONCLUSION These results demonstrate the enormous value of 454 sequencing for gene discovery and polymorphism detection in a major weed species and its relatives. Furthermore, the merging of the 454 transcriptome data with results from a previous whole genome 454 sequencing experiment has made it possible to establish a valuable genomic resource for weed science research.
Affiliation(s)
- Chance W Riggins
- Department of Crop Sciences, University of Illinois, Urbana, IL, USA
|
128
|
Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Hum Mol Genet 2010; 19:R227-40. [DOI: 10.1093/hmg/ddq416] [Citation(s) in RCA: 636] [Impact Index Per Article: 45.4] [Indexed: 12/12/2022] Open
|
129
|
Severin AJ, Woody JL, Bolon YT, Joseph B, Diers BW, Farmer AD, Muehlbauer GJ, Nelson RT, Grant D, Specht JE, Graham MA, Cannon SB, May GD, Vance CP, Shoemaker RC. RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome. BMC PLANT BIOLOGY 2010; 10:160. [PMID: 20687943 PMCID: PMC3017786 DOI: 10.1186/1471-2229-10-160] [Citation(s) in RCA: 438] [Impact Index Per Article: 31.3] [Received: 02/25/2010] [Accepted: 08/05/2010] [Indexed: 05/18/2023]
Abstract
BACKGROUND Next generation sequencing is transforming our understanding of transcriptomes. It can determine the expression level of transcripts with a dynamic range of over six orders of magnitude from multiple tissues, developmental stages or conditions. Patterns of gene expression provide insight into functions of genes with unknown annotation. RESULTS The RNA Seq-Atlas presented here provides a record of high-resolution gene expression in a set of fourteen diverse tissues. Hierarchical clustering of transcriptional profiles for these tissues suggests three clades with similar profiles: aerial, underground and seed tissues. We also investigate the relationship between gene structure and gene expression and find a correlation between gene length and expression. Additionally, we find dramatic tissue-specific gene expression of both the most highly-expressed genes and the genes specific to legumes in seed development and nodule tissues. Analysis of the gene expression profiles of over 2,000 genes with preferential gene expression in seed suggests there are more than 177 genes with functional roles that are involved in the economically important seed filling process. Finally, the Seq-atlas also provides a means of evaluating existing gene model annotations for the Glycine max genome. CONCLUSIONS This RNA-Seq atlas extends the analyses of previous gene expression atlases performed using Affymetrix GeneChip technology and provides an example of new methods to accommodate the increase in transcriptome data obtained from next generation sequencing. Data contained within this RNA-Seq atlas of Glycine max can be explored at http://www.soybase.org/soyseq.
Affiliation(s)
- Andrew J Severin
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Jenna L Woody
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Yung-Tsi Bolon
- United States Department of Agriculture-Agricultural Research Service, Plant Research Unit, St. Paul, MN 55108, USA
- Bindu Joseph
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Brian W Diers
- Department of Crop Sciences, University of Illinois, 1101 West Peabody Dr., Urbana, IL 61801, USA
- Andrew D Farmer
- National Center for Genome Resources, Santa Fe, NM 87505, USA
- Gary J Muehlbauer
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Rex T Nelson
- United States Department of Agriculture-Agricultural Research Service, Corn Insects and Crop Genetics Resources Unit, Ames, IA 50011, USA
- David Grant
- United States Department of Agriculture-Agricultural Research Service, Corn Insects and Crop Genetics Resources Unit, Ames, IA 50011, USA
- James E Specht
- Department of Agronomy, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Michelle A Graham
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- United States Department of Agriculture-Agricultural Research Service, Corn Insects and Crop Genetics Resources Unit, Ames, IA 50011, USA
- Steven B Cannon
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- United States Department of Agriculture-Agricultural Research Service, Corn Insects and Crop Genetics Resources Unit, Ames, IA 50011, USA
- Gregory D May
- National Center for Genome Resources, Santa Fe, NM 87505, USA
- Carroll P Vance
- United States Department of Agriculture-Agricultural Research Service, Plant Research Unit, St. Paul, MN 55108, USA
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Randy C Shoemaker
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- United States Department of Agriculture-Agricultural Research Service, Corn Insects and Crop Genetics Resources Unit, Ames, IA 50011, USA
|
130
|
Babik W, Stuglik M, Qi W, Kuenzli M, Kuduk K, Koteja P, Radwan J. Heart transcriptome of the bank vole (Myodes glareolus): towards understanding the evolutionary variation in metabolic rate. BMC Genomics 2010; 11:390. [PMID: 20565972 PMCID: PMC2996923 DOI: 10.1186/1471-2164-11-390] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 02/25/2010] [Accepted: 06/21/2010] [Indexed: 03/13/2023] Open
Abstract
BACKGROUND Understanding the genetic basis of adaptive changes has been a major goal of evolutionary biology. In complex organisms without sequenced genomes, de novo transcriptome assembly using a longer read sequencing technology followed by expression profiling using short reads is likely to provide comprehensive identification of adaptive variation at the expression level and sequence polymorphisms in coding regions. We performed sequencing and de novo assembly of the bank vole heart transcriptome in lines selected for high metabolism and unselected controls. RESULTS A single 454 Titanium run produced over a million reads, which were assembled into 63,581 contigs. Searches against the SwissProt protein database and the ENSEMBL collection of mouse transcripts detected similarity to 11,181 and 14,051 genes, respectively. As judged by the representation of genes from the heart-related Gene Ontology categories and UniGenes detected in the mouse heart, our detection of the genes expressed in the heart was nearly complete (> 95% and almost 90% respectively). On average, 38.7% of the transcript length was covered by our sequences, with notably higher (45.0%) coverage of coding regions than of untranslated regions (24.5% of 5' and 32.7% of 3'UTRs). Lower sequence conservation between mouse and bank vole in untranslated regions was found to be partially responsible for poorer UTR representation. Our data might suggest a widespread transcription from noncoding genomic regions, a finding not reported in previous studies regarding transcriptomes in non-model organisms. We also identified over 19 thousand putative single nucleotide polymorphisms (SNPs). A much higher fraction of the SNPs than expected by chance exhibited variant frequency differences between selection regimes. CONCLUSION Longer reads and higher sequence yield per run provided by the 454 Titanium technology in comparison to earlier generations of pyrosequencing proved beneficial for the quality of assembly. An almost full representation of genes known to be expressed in the mouse heart was identified. Usage of the extensive genomic resources available for the house mouse, a moderately (20-40 million years) divergent relative of the voles, enabled a comprehensive assessment of the transcript completeness. Transcript sequences generated in the present study allowed the identification of candidate SNPs associated with divergence of selection lines and constitute a valuable permanent resource forming a foundation for RNAseq experiments aiming at detection of adaptive changes both at the level of gene expression and sequence variants, that would facilitate studies of the genetic basis of evolutionary divergence.
Affiliation(s)
- Wiesław Babik
- Institute of Environmental Sciences, Jagiellonian University, 30-387 Krakow, Poland.
|
131
|
Datta S, Datta S, Kim S, Chakraborty S, Gill RS. Statistical Analyses of Next Generation Sequence Data: A Partial Overview. JOURNAL OF PROTEOMICS & BIOINFORMATICS 2010; 3:183-190. [PMID: 21113236 PMCID: PMC2989618 DOI: 10.4172/jpb.1000138] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 12/15/2022]
Abstract
Next generation sequencing has revolutionized the status of biological research. For a long time, the gold standard of DNA sequencing was considered to be the Sanger method. However, the commercial launch of next generation sequencing in 2005 made it possible to generate massively parallel and high resolution DNA sequence data. Its usefulness in various genomic applications such as genome-wide detection of SNPs, DNA methylation profiling, mRNA expression profiling, whole-genome re-sequencing and so on is now well recognized. There are several platforms for generating next generation sequencing (NGS) data, which we briefly discuss in this mini overview. With new technologies come new challenges for data analysts. This mini review attempts to present a collection of selected topics in the current development of statistical methods dealing with these novel data types. We believe that knowing the advances and bottlenecks of this technology will help researchers to benchmark the analytical tools dealing with these data and will pave the path for its proper application in clinical diagnostics.
Affiliation(s)
- Susmita Datta
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
- Somnath Datta
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
- Seongho Kim
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
- Sutirtha Chakraborty
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA
- Ryan S. Gill
- Department of Mathematics, University of Louisville, Louisville, KY 40202, USA
|
132
|
Schatz MC, Delcher AL, Salzberg SL. Assembly of large genomes using second-generation sequencing. Genome Res 2010; 20:1165-73. [PMID: 20508146 DOI: 10.1101/gr.101360.109] [Citation(s) in RCA: 280] [Impact Index Per Article: 20.0] [Indexed: 02/05/2023]
Abstract
Second-generation sequencing technology can now be used to sequence an entire human genome in a matter of days and at low cost. Sequence read lengths, initially very short, have rapidly increased since the technology first appeared, and we now are seeing a growing number of efforts to sequence large genomes de novo from these short reads. In this Perspective, we describe the issues associated with short-read assembly, the different types of data produced by second-gen sequencers, and the latest assembly algorithms designed for these data. We also review the genomes that have been assembled recently from short reads and make recommendations for sequencing strategies that will yield a high-quality assembly.
Affiliation(s)
- Michael C Schatz
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
|
133
|
|
134
|
Zhang C, Xing D. Single-Molecule DNA Amplification and Analysis Using Microfluidics. Chem Rev 2010; 110:4910-47. [DOI: 10.1021/cr900081z] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.2] [Indexed: 02/07/2023]
Affiliation(s)
- Chunsun Zhang
- MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Da Xing
- MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
|
135
|
Rutledge RG, Stewart D. Assessing the performance capabilities of LRE-based assays for absolute quantitative real-time PCR. PLoS One 2010; 5:e9731. [PMID: 20305810 PMCID: PMC2840021 DOI: 10.1371/journal.pone.0009731] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 01/26/2010] [Accepted: 02/25/2010] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Linear regression of efficiency or LRE introduced a new paradigm for conducting absolute quantification, which does not require standard curves, can generate absolute accuracies of ±25% and has single molecule sensitivity. Derived from adapting the classic Boltzmann sigmoidal function to PCR, target quantity is calculated directly from the fluorescence readings within the central region of an amplification profile, generating 4-8 determinations from each amplification reaction. FINDINGS Based on generating a linear representation of PCR amplification, the highly visual nature of LRE analysis is illustrated by varying reaction volume and amplification efficiency, which also demonstrates how LRE can be used to model PCR. Examining the dynamic range of LRE further demonstrates that quantitative accuracy can be maintained down to a single target molecule, and that target quantification below ten molecules conforms to that predicted by Poisson distribution. Essential to the universality of optical calibration, the fluorescence intensity generated by SYBR Green I (FU/bp) is shown to be independent of GC content and amplicon size, further verifying that absolute scale can be established using a single quantitative standard. Two high-performance lambda amplicons are also introduced that in addition to producing highly precise optical calibrations, can be used as benchmarks for performance testing. The utility of limiting dilution assay for conducting platform-independent absolute quantification is also discussed, along with the utility of defining assay performance in terms of absolute accuracy. CONCLUSIONS Founded on the ability to exploit lambda gDNA as a universal quantitative standard, LRE provides the ability to conduct absolute quantification using few resources beyond those needed for sample preparation and amplification. Combined with the quantitative and quality control capabilities of LRE, this kinetic-based approach has the potential to fundamentally transform how real-time qPCR is conducted.
Affiliation(s)
- Robert G Rutledge
- Canadian Forest Service, Natural Resources Canada, Quebec, Quebec, Canada.
|
136
|
Druka A, Potokina E, Luo Z, Jiang N, Chen X, Kearsey M, Waugh R. Expression quantitative trait loci analysis in plants. PLANT BIOTECHNOLOGY JOURNAL 2010; 8:10-27. [PMID: 20055957 DOI: 10.1111/j.1467-7652.2009.00460.x] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 05/20/2023]
Abstract
An expression Quantitative Trait Locus or eQTL is a chromosomal region that accounts for a proportion of the variation in abundance of an mRNA transcript observed between individuals in a genetic mapping population. A single gene can have one or multiple eQTLs. Large scale mRNA profiling technologies advanced genome-wide eQTL mapping in a diverse range of organisms, allowing thousands of eQTLs to be detected in a single experiment. When combined with classical or trait QTLs, correlation analyses can directly suggest candidates for genes underlying these traits. Furthermore, eQTL mapping data enable genetic regulatory networks to be modelled and potentially provide a better understanding of the underlying phenotypic variation. The mRNA profiling data sets can also be used to infer the chromosomal positions of thousands of genes, an outcome that is particularly valuable for species with unsequenced genomes where the chromosomal location of the majority of genes remains unknown. In this review we focus on eQTL studies in plants, addressing conceptual and technical aspects that include experimental design, genetic polymorphism prediction and candidate gene identification.
Affiliation(s)
- Arnis Druka
- Genetics, Scottish Crop Research Institute, Invergowrie, Dundee, UK
|