1901
|
Kantardjieff A, Nissom PM, Chuah SH, Yusufi F, Jacob NM, Mulukutla BC, Yap M, Hu WS. Developing genomic platforms for Chinese hamster ovary cells. Biotechnol Adv 2009; 27:1028-1035. [DOI: 10.1016/j.biotechadv.2009.05.023] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 01/23/2023]
|
1902
|
Affiliation(s)
- Donna K Slonim
- Department of Computer Science, Tufts University, Medford, Massachusetts, USA.
|
1903
|
Marguerat S, Bähler J. RNA-seq: from technology to biology. Cell Mol Life Sci 2009. [PMID: 19859660 DOI: 10.1007/s00018-009-0180-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023]
Abstract
Next-generation sequencing technologies are now being exploited not only to analyse static genomes, but also dynamic transcriptomes in an approach termed RNA-seq. Although these powerful and rapidly evolving technologies have only been available for a couple of years, they are already making substantial contributions to our understanding of genome expression and regulation. Here, we briefly describe technical issues accompanying RNA-seq data generation and analysis, highlighting differences to array-based approaches. We then review recent biological insight gained from applying RNA-seq and related approaches to deeply sample transcriptomes in different cell types or physiological conditions. These approaches are providing fascinating information about transcriptional and post-transcriptional gene regulation, and they are also giving unique insight into the richness of transcript structures and processing on a global scale and at unprecedented resolution.
Affiliation(s)
- Samuel Marguerat
- Department of Genetics, Evolution and Environment, UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
|
1904
|
Wang L, Feng Z, Wang X, Wang X, Zhang X. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 2009; 26:136-8. [PMID: 19855105 DOI: 10.1093/bioinformatics/btp612] [Citation(s) in RCA: 2867] [Impact Index Per Article: 191.1] [Indexed: 11/13/2022]
Affiliation(s)
- Likun Wang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China
|
1905
|
Abstract
Variation in gene expression constitutes an important source of biological variability within and between populations that is likely to contribute significantly to phenotypic diversity. Recent conceptual, technical, and methodological advances have enabled the genome-scale dissection of transcriptional variation. Here, we outline common approaches for detecting gene expression quantitative trait loci, and summarize the insights gleaned from these studies regarding the genetic architecture of transcriptional variation and the nature of regulatory alleles. Particular emphasis is placed on human studies, and we discuss experimental designs that ensure that increasingly large and complex studies continue to advance our understanding of gene expression variation. We conclude by discussing the evolution of gene expression levels, and we explore prospects for leveraging new technological developments to investigate inherited variation in gene expression in even greater depth.
Affiliation(s)
- Daniel A Skelly
- Department of Genome Sciences, University of Washington, Seattle, Washington, 98195, USA.
|
1906
|
Kosakovsky Pond S, Wadhawan S, Chiaromonte F, Ananda G, Chung WY, Taylor J, Nekrutenko A. Windshield splatter analysis with the Galaxy metagenomic pipeline. Genome Res 2009; 19:2144-53. [PMID: 19819906 DOI: 10.1101/gr.094508.109] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Indexed: 11/24/2022]
Abstract
How many species inhabit our immediate surroundings? A straightforward collection technique suitable for answering this question is known to anyone who has ever driven a car at highway speeds. The windshield of a moving vehicle is subjected to numerous insect strikes and can be used as a collection device for representative sampling. Unfortunately the analysis of biological material collected in that manner, as with most metagenomic studies, proves to be rather demanding due to the large number of required tools and considerable computational infrastructure. In this study, we use organic matter collected by a moving vehicle to design and test a comprehensive pipeline for phylogenetic profiling of metagenomic samples that includes all steps from processing and quality control of data generated by next-generation sequencing technologies to statistical analyses and data visualization. To the best of our knowledge, this is also the first publication that features a live online supplement providing access to exact analyses and workflows used in the article.
|
1907
|
Gilad Y, Pritchard JK, Thornton K. Characterizing natural variation using next-generation sequencing technologies. Trends Genet 2009; 25:463-71. [PMID: 19801172 DOI: 10.1016/j.tig.2009.09.003] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Received: 08/16/2009] [Revised: 09/08/2009] [Accepted: 09/09/2009] [Indexed: 01/22/2023]
Abstract
Progress in evolutionary genomics is tightly coupled with the development of new technologies to collect high-throughput data. The availability of next-generation sequencing technologies has the potential to revolutionize genomic research and enable us to focus on a large number of outstanding questions that previously could not be addressed effectively. Indeed, we are now able to study genetic variation on a genome-wide scale, characterize gene regulatory processes at unprecedented resolution, and soon, we expect that individual laboratories might be able to rapidly sequence new genomes. However, at present, the analysis of next-generation sequencing data is challenging, in particular because most sequencing platforms provide short reads, which are difficult to align and assemble. In addition, only little is known about sources of variation that are associated with next-generation sequencing study designs. A better understanding of the sources of error and bias in sequencing data is essential, especially in the context of studies of variation at dynamic quantitative traits.
Affiliation(s)
- Yoav Gilad
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
|
1908
|
Parkhomchuk D, Borodina T, Amstislavskiy V, Banaru M, Hallen L, Krobitsch S, Lehrach H, Soldatov A. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res 2009; 37:e123. [PMID: 19620212 PMCID: PMC2764448 DOI: 10.1093/nar/gkp596] [Citation(s) in RCA: 622] [Impact Index Per Article: 41.5] [Received: 04/07/2009] [Revised: 06/04/2009] [Accepted: 06/30/2009] [Indexed: 11/15/2022]
Abstract
High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online.
Affiliation(s)
- Alexey Soldatov
- Max Planck Institute for Molecular Genetics, Department of Vertebrate Genomics, Ihnestr. 73, 14195 Berlin, Germany
|
1909
|
Ozsolak F, Platt AR, Jones DR, Reifenberger JG, Sass LE, McInerney P, Thompson JF, Bowers J, Jarosz M, Milos PM. Direct RNA sequencing. Nature 2009; 461:814-8. [PMID: 19776739 DOI: 10.1038/nature08390] [Citation(s) in RCA: 288] [Impact Index Per Article: 19.2] [Received: 05/18/2009] [Accepted: 08/05/2009] [Indexed: 01/24/2023]
Abstract
Our understanding of human biology and disease is ultimately dependent on a complete understanding of the genome and its functions. The recent application of microarray and sequencing technologies to transcriptomics has changed the simplistic view of transcriptomes to a more complicated view of genome-wide transcription where a large fraction of transcripts emanates from unannotated parts of genomes, and underlined our limited knowledge of the dynamic state of transcription. Most of this broad body of knowledge was obtained indirectly because current transcriptome analysis methods typically require RNA to be converted to complementary DNA (cDNA) before measurements, even though the cDNA synthesis step introduces multiple biases and artefacts that interfere with both the proper characterization and quantification of transcripts. Furthermore, cDNA synthesis is not particularly suitable for the analysis of short, degraded and/or small quantity RNA samples. Here we report direct single molecule RNA sequencing without prior conversion of RNA to cDNA. We applied this technology to sequence femtomole quantities of poly(A)+ Saccharomyces cerevisiae RNA using a surface coated with poly(dT) oligonucleotides to capture the RNAs at their natural poly(A) tails and initiate sequencing by synthesis. We observed transcript 3' end heterogeneity and polyadenylated small nucleolar RNAs. This study provides a path to high-throughput and low-cost direct RNA sequencing and achieving the ultimate goal of a comprehensive and bias-free understanding of transcriptomes.
Affiliation(s)
- Fatih Ozsolak
- Helicos BioSciences Corporation, One Kendall Square, Cambridge, Massachusetts 02139, USA.
|
1910
|
ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 2009. [PMID: 19736561 DOI: 10.1038/nrg2641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. Owing to the tremendous progress in next-generation sequencing technology, ChIP-seq offers higher resolution, less noise and greater coverage than its array-based predecessor ChIP-chip. With the decreasing cost of sequencing, ChIP-seq has become an indispensable tool for studying gene regulation and epigenetic mechanisms. In this Review, I describe the benefits and challenges in harnessing this technique with an emphasis on issues related to experimental design and data analysis. ChIP-seq experiments generate large quantities of data, and effective computational analysis will be crucial for uncovering biological mechanisms.
|
1913
|
Porter S, Olson NE, Smith T. Analyzing Gene Expression Data from Microarray and Next-Generation DNA Sequencing Transcriptome Profiling Assays Using GeneSifter Analysis Edition. Curr Protoc Bioinformatics 2009; Chapter 7:Unit 7.14 7.14.1-35. [DOI: 10.1002/0471250953.bi0714s27] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Affiliation(s)
- Sandra Porter
- Digital World Biology, Seattle, Washington
- Geospiza, Inc., Seattle, Washington
|
1914
|
Hegedűs Z, Zakrzewska A, Ágoston VC, Ordas A, Rácz P, Mink M, Spaink HP, Meijer AH. Deep sequencing of the zebrafish transcriptome response to mycobacterium infection. Mol Immunol 2009; 46:2918-30. [PMID: 19631987 DOI: 10.1016/j.molimm.2009.07.002] [Citation(s) in RCA: 167] [Impact Index Per Article: 11.1] [Received: 06/01/2009] [Accepted: 07/01/2009] [Indexed: 10/20/2022]
|
1915
|
Aubin-Horth N, Renn SCP. Genomic reaction norms: using integrative biology to understand molecular mechanisms of phenotypic plasticity. Mol Ecol 2009; 18:3763-80. [PMID: 19732339 DOI: 10.1111/j.1365-294x.2009.04313.x] [Citation(s) in RCA: 256] [Impact Index Per Article: 17.1] [Indexed: 11/28/2022]
Abstract
Phenotypic plasticity is the development of different phenotypes from a single genotype, depending on the environment. Such plasticity is a pervasive feature of life, is observed for various traits and is often argued to be the result of natural selection. A thorough study of phenotypic plasticity should thus include an ecological and an evolutionary perspective. Recent advances in large-scale gene expression technology make it possible to also study plasticity from a molecular perspective, and the addition of these data will help answer long-standing questions about this widespread phenomenon. In this review, we present examples of integrative studies that illustrate the molecular and cellular mechanisms underlying plastic traits, and show how new techniques will grow in importance in the study of these plastic molecular processes. These techniques include: (i) heterologous hybridization to DNA microarrays; (ii) next generation sequencing technologies applied to transcriptomics; (iii) techniques for studying the function of noncoding small RNAs; and (iv) proteomic tools. We also present recent studies on genetic model systems that uncover how environmental cues triggering different plastic responses are sensed and integrated by the organism. Finally, we describe recent work on changes in gene expression in response to an environmental cue that persist after the cue is removed. Such long-term responses are made possible by epigenetic molecular mechanisms, including DNA methylation. The results of these current studies help us outline future avenues for the study of plasticity.
Affiliation(s)
- Nadia Aubin-Horth
- Département de Sciences biologiques, Université de Montréal, Québec, Canada.
|
1916
|
Know your limits: Assumptions, constraints and interpretation in systems biology. Biochim Biophys Acta Proteins Proteom 2009; 1794:1280-7. [DOI: 10.1016/j.bbapap.2009.05.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/12/2009] [Accepted: 05/04/2009] [Indexed: 12/20/2022]
|
1917
|
van Vliet AHM. Next generation sequencing of microbial transcriptomes: challenges and opportunities. FEMS Microbiol Lett 2009; 302:1-7. [PMID: 19735299 DOI: 10.1111/j.1574-6968.2009.01767.x] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Indexed: 01/08/2023]
Abstract
Over the past 15 years, microbial functional genomics has been made possible by the combined power of genome sequencing and microarray technology. However, we are now approaching the technical limits of microarray technology, and microarrays are now being superseded by transcriptomics based on high-throughput (next generation) DNA-sequencing technologies. The term RNA-seq has been coined to represent transcriptomics by next-generation sequencing. Although pioneered on eukaryotic organisms due to the relative ease of working with eukaryotic mRNA, the RNA-seq technology is now being ported to microbial systems. This review will discuss the opportunities of RNA-seq transcriptome sequencing for microorganisms, and also aims to identify challenges and pitfalls of the use of this new technology in microorganisms.
|
1918
|
Machado HE, Pollen AA, Hofmann HA, Renn SCP. Interspecific profiling of gene expression informed by comparative genomic hybridization: A review and a novel approach in African cichlid fishes. Integr Comp Biol 2009; 49:644-59. [PMID: 21665847 DOI: 10.1093/icb/icp080] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]
Abstract
Modern genomic approaches have facilitated great progress in our understanding of the molecular and genetic underpinnings of ecological and evolutionary processes. Analysis of gene expression through heterologous hybridization in particular has enabled genome-scale studies in many ecologically and evolutionarily interesting species. However, these studies have been hampered by the difficulty of comparing, on a common array platform, gene-expression profiles across species, because sequence divergence alters the dynamics of hybridization. All too often, comparisons of expression profiles across species were limited to contrasting lists of genes or even just functional categories. Here we review these issues and propose a novel solution. Exploiting the diverse cichlid lineages of East Africa as our model system, we then present results from an experimental case study that compares the neural gene-expression profiles of males and females of two species that differ in mating system. Using a single microarray platform that contains genes from one species, Astatotilapia burtoni, we conducted a total of 16 direct comparisons for neural gene-expression level between individual males and females from a pair of sister species, the polygynous Enantiopus melanogenys and the monogamous Xenotilapia flavipinnis. Next, we conducted a meta-analysis with previously published data from two different intra-specific expression studies to determine whether sex-specific neural gene expression is more closely associated with behavioral phenotype than it is with gonadal sex. Our results indicate that the gene expression profiles are species-specific to a large extent, as relatively few genes show conserved expression patterns associated with either sex. Finally, we describe how competitive genomic DNA hybridizations between the two focal species allow us to assess the degree to which divergence of sequences biases the results. We propose a masking technique that correlates interspecific expression ratios obtained with cDNA with hybridization ratios obtained with genomic DNA for the same set of species and determines threshold sequence divergence to reduce false positives. Our approach should be applicable to a wide range of interesting questions related to the evolution and ecology of gene expression.
Affiliation(s)
- Heather E Machado
- Department of Biology, Reed College, Portland, OR 97202, USA; Program in Neuroscience, Stanford University, Stanford, CA 94305, USA; Section of Integrative Biology, Institute for Molecular and Cellular Biology, Institute for Neuroscience, The University of Texas at Austin, Austin, TX 78712, USA
|
1919
|
Chen Y, Souaiaia T, Chen T. PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds. Bioinformatics 2009; 25:2514-21. [PMID: 19675096 PMCID: PMC2752623 DOI: 10.1093/bioinformatics/btp486] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The explosion of next-generation sequencing data has spawned the design of new algorithms and software tools to provide efficient mapping for different read lengths and sequencing technologies. In particular, ABI's sequencer (SOLiD system) poses a big computational challenge with its capacity to produce very large amounts of data, and its unique strategy of encoding sequence data into color signals. RESULTS: We present the mapping software PerM (Periodic Seed Mapping), which uses periodic spaced seeds to significantly improve mapping efficiency for large reference genomes when compared with state-of-the-art programs. The data structure in PerM requires only 4.5 bytes per base to index the human genome, allowing entire genomes to be loaded to memory, while multiple processors simultaneously map reads to the reference. Weight-maximized periodic seeds offer full sensitivity for up to three mismatches and high sensitivity for four and five mismatches while minimizing the number of random hits per query, significantly speeding up the running time. Such sensitivity makes PerM a valuable mapping tool for SOLiD and Solexa reads. AVAILABILITY: http://code.google.com/p/perm/
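To make the notion of a spaced seed in the abstract above concrete, here is a minimal Python sketch. The mask `1101`, the function names, and the toy sequences are hypothetical and chosen only for illustration; they are not PerM's actual seeds, index layout, or API. A spaced seed compares only the positions marked `1`, so a mismatch that falls on a `0` position still produces a hit.

```python
from collections import defaultdict

def seed_key(seq, start, mask):
    """Return the characters of seq under the '1' positions of mask."""
    return "".join(seq[start + i] for i, m in enumerate(mask) if m == "1")

def build_index(reference, mask):
    """Map each spaced-seed key to the reference positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        index[seed_key(reference, pos, mask)].append(pos)
    return index

def candidate_positions(read, index, mask):
    """Look up the seed taken from the start of the read."""
    return index.get(seed_key(read, 0, mask), [])

MASK = "1101"  # hypothetical toy seed: the third position is a "don't care"
reference = "ACGTACCA"
index = build_index(reference, MASK)

exact = candidate_positions("GTAC", index, MASK)         # exact copy of reference[2:6]
one_mismatch = candidate_positions("GTGC", index, MASK)  # mismatch under the '0'
no_hit = candidate_positions("GAAC", index, MASK)        # mismatch under a '1'
```

In this toy case both `exact` and `one_mismatch` point to reference position 2, while `no_hit` is empty. A real mapper would verify each candidate by aligning the full read and would combine several seed placements to guarantee sensitivity for a given number of mismatches.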
Affiliation(s)
- Yangho Chen
- Program in Computational Biology and Bioinformatics, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089-2910, USA
|
1920
|
Lin B, Wang J, Hong X, Yan X, Hwang D, Cho JH, Yi D, Utleg AG, Fang X, Schones DE, Zhao K, Omenn GS, Hood L. Integrated expression profiling and ChIP-seq analyses of the growth inhibition response program of the androgen receptor. PLoS One 2009; 4:e6589. [PMID: 19668381 PMCID: PMC2720376 DOI: 10.1371/journal.pone.0006589] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.7] [Received: 05/12/2009] [Accepted: 07/09/2009] [Indexed: 01/06/2023]
Abstract
Background: The androgen receptor (AR) plays important roles in the development of male phenotype and in different human diseases including prostate cancers. The AR can act either as a promoter or a tumor suppressor depending on cell types. The AR proliferative response program has been well studied, but its prohibitive response program has not yet been thoroughly studied. Methodology/Principal Findings: Previous studies found that PC3 cells expressing the wild-type AR inhibit growth and suppress invasion. We applied expression profiling to identify the response program of PC3 cells expressing the AR (PC3-AR) under different growth conditions (i.e. with or without androgens and at different concentrations of androgens) and then applied the newly developed ChIP-seq technology to identify the AR binding regions in the PC3 cancer genome. A surprising finding was that the comparison of MOCK-transfected PC3 cells with AR-transfected cells identified 3,452 differentially expressed genes (two-fold cutoff) even without the addition of androgens (i.e. in ethanol control), suggesting a ligand-independent activation or extremely low-level androgen activation of the AR. ChIP-Seq analysis revealed 6,629 AR binding regions in the cancer genome of PC3 cells with an FDR (false discovery rate) cutoff of 0.05. About 22.4% (638 of 2,849) can be mapped to within 2 kb of the transcription start site (TSS). Three novel AR binding motifs were identified in the AR binding regions of PC3-AR cells, and two of them share a core consensus sequence CGAGCTCTTC, which together mapped to 27.3% of AR binding regions (1,808/6,629). In contrast, only about 2.9% (190/6,629) of AR binding sites contain the canonical AR matrices M00481, M00447 and M00962 (from the Transfac database), which are derived mostly from AR proliferative responsive genes in androgen-dependent cells. In addition, we identified four top-ranking co-occupancy transcription factors in the AR binding regions, which include TEF1 (Transcriptional enhancer factor), GATA (GATA transcription factors), OCT (octamer transcription factors) and PU1 (PU.1 transcription factor). Conclusions/Significance: Our data provide a valuable data set for understanding the molecular basis of the growth inhibition response program of the AR in prostate cancer cells, which can be exploited for developing novel prostate cancer therapeutic strategies.
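The percentages quoted in this abstract can be rechecked from the counts it reports. The short Python sketch below does only that arithmetic; the helper name `percent` and the variable names are illustrative choices, not anything from the paper. Note that the TSS fraction uses the denominator of 2,849 given in the abstract, not the 6,629 total binding regions.

```python
def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)

tss_proximal = percent(638, 2849)   # regions within 2 kb of a TSS
novel_motifs = percent(1808, 6629)  # regions matching the two related novel motifs
canonical = percent(190, 6629)      # regions containing a canonical AR matrix

print(tss_proximal, novel_motifs, canonical)
```

All three values agree with the figures stated in the abstract (22.4, 27.3 and 2.9).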
Affiliation(s)
- Biaoyang Lin
- Department of Urology, University of Washington, Seattle, WA, USA.
|
1921
|
Nielsen EE, Hemmer-Hansen J, Larsen PF, Bekkevold D. Population genomics of marine fishes: identifying adaptive variation in space and time. Mol Ecol 2009; 18:3128-50. [DOI: 10.1111/j.1365-294x.2009.04272.x] [Citation(s) in RCA: 236] [Impact Index Per Article: 15.7] [Indexed: 11/26/2022]
|
1922
|
Kerschgens J, Egener-Kuhn T, Mermod N. Protein-binding microarrays: probing disease markers at the interface of proteomics and genomics. Trends Mol Med 2009; 15:352-8. [DOI: 10.1016/j.molmed.2009.06.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/02/2009] [Revised: 06/08/2009] [Accepted: 06/08/2009] [Indexed: 12/31/2022]
|
1923
|
Kristiansson E, Asker N, Förlin L, Larsson DGJ. Characterization of the Zoarces viviparus liver transcriptome using massively parallel pyrosequencing. BMC Genomics 2009; 10:345. [PMID: 19646242 PMCID: PMC2725146 DOI: 10.1186/1471-2164-10-345] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Received: 03/03/2009] [Accepted: 07/31/2009] [Indexed: 11/16/2022]
Abstract
Background: The teleost Zoarces viviparus (eelpout) lives along the coasts of Northern Europe and has long been an established model organism for marine ecology and environmental monitoring. The scarce information about this species' genome has, however, restrained the use of efficient molecular-level assays, such as gene expression microarrays. Results: In the present study we present the first comprehensive characterization of the Zoarces viviparus liver transcriptome. From 400,000 reads generated by massively parallel pyrosequencing, more than 50,000 pieces of putative transcripts were assembled, annotated and functionally classified. The data was estimated to cover roughly 40% of the total transcriptome and homologues for about half of the genes of Gasterosteus aculeatus (stickleback) were identified. The sequence data was consequently used to design an oligonucleotide microarray for large-scale gene expression analysis. Conclusion: Our results show that one run using a Genome Sequencer FLX from 454 Life Science/Roche generates enough genomic information for adequate de novo assembly of a large number of genes in a higher vertebrate. The generated sequence data, including the validated microarray probes, are publicly available to promote genome-wide research in Zoarces viviparus.
|
1924
|
Glazov EA, Kongsuwan K, Assavalapsakul W, Horwood PF, Mitter N, Mahony TJ. Repertoire of bovine miRNA and miRNA-like small regulatory RNAs expressed upon viral infection. PLoS One 2009; 4:e6349. [PMID: 19633723 PMCID: PMC2713767 DOI: 10.1371/journal.pone.0006349] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Received: 05/11/2009] [Accepted: 06/17/2009] [Indexed: 12/21/2022]
Abstract
MicroRNA (miRNA) and other types of small regulatory RNAs play a crucial role in the regulation of gene expression in eukaryotes. Several distinct classes of small regulatory RNAs have been discovered in recent years. To extend the repertoire of small RNAs characterized in mammals and to examine the relationship between host miRNA expression and viral infection, we used Illumina's ultrahigh-throughput sequencing approach. We sequenced three small RNA libraries prepared from a cell line derived from the adult bovine kidney under normal conditions and upon infection of the cell line with Bovine herpesvirus 1. We used a bioinformatics approach to distinguish authentic mature miRNA sequences from other classes of small RNAs and short RNA fragments represented in the sequencing data. Using this approach we detected 219 out of 356 known bovine miRNAs and 115 respective miRNA* sequences. In addition, we identified five new bovine orthologs of known mammalian miRNAs and discovered 268 new cow miRNAs, many of which are not identifiable in other mammalian genomes and thus might be specific to the ruminant lineage. We also found seven new bovine mirtron candidates and discovered 10 small nucleolar RNA (snoRNA) loci that give rise to small RNAs with possible miRNA-like function. Results presented in this study extend our knowledge of the biology and evolution of small regulatory RNAs in mammals and illuminate mechanisms of small RNA biogenesis and function. New miRNA sequences and the original sequencing data have been submitted to the miRNA repository (miRBase) and the NCBI GEO archive, respectively. We envisage that these resources will facilitate functional annotation of the bovine genome and promote further functional and comparative genomics studies of small regulatory RNA in mammals.
Affiliation(s)
- Evgeny A. Glazov
- Diamantina Institute for Cancer, Immunology and Metabolic Medicine, The University of Queensland, Princess Alexandra Hospital, Woolloongabba, Queensland, Australia
- * E-mail: (EAG); (TJM)
| | - Kritaya Kongsuwan
- CSIRO Livestock Industries, Queensland Bioscience Precinct, St Lucia, Queensland, Australia
| | - Wanchai Assavalapsakul
- Department of Microbiology, Faculty of Science, Chulalongkorn University, Phayathai, Bangkok, Thailand
| | - Paul F. Horwood
- Department of Primary Industries and Fisheries, Ritchie Building, Brisbane, Queensland, Australia
| | - Neena Mitter
- Department of Primary Industries and Fisheries, Ritchie Building, Brisbane, Queensland, Australia
| | - Timothy J. Mahony
- Department of Primary Industries and Fisheries, Ritchie Building, Brisbane, Queensland, Australia
- School of Veterinary Sciences, University of Queensland, St Lucia, Queensland, Australia
- * E-mail: (EAG); (TJM)
| |
Collapse
|
1925
|
Balwierz PJ, Carninci P, Daub CO, Kawai J, Hayashizaki Y, Van Belle W, Beisel C, van Nimwegen E. Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data. Genome Biol 2009; 10:R79. [PMID: 19624849 PMCID: PMC2728533 DOI: 10.1186/gb-2009-10-7-r79] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.7] [Received: 10/23/2008] [Revised: 03/02/2009] [Accepted: 07/22/2009] [Indexed: 11/10/2022] Open
Abstract
A set of methods is presented for normalization, quantification of noise and co-expression analysis for gene expression studies using deep sequencing. With the advent of ultra high-throughput sequencing technologies, increasingly researchers are turning to deep sequencing for gene expression studies. Here we present a set of rigorous methods for normalization, quantification of noise, and co-expression analysis of deep sequencing data. Using these methods on 122 cap analysis of gene expression (CAGE) samples of transcription start sites, we construct genome-wide 'promoteromes' in human and mouse consisting of a three-tiered hierarchy of transcription start sites, transcription start clusters, and transcription start regions.
Affiliation(s)
- Piotr J Balwierz
- Biozentrum, University of Basel, and Swiss Institute of Bioinformatics, Klingelbergstrasse 50/70, 4056-CH, Basel, Switzerland
|
1926
|
Transcriptome of embryonic and neonatal mouse cortex by high-throughput RNA sequencing. Proc Natl Acad Sci U S A 2009; 106:12741-6. [PMID: 19617558 DOI: 10.1073/pnas.0902417106] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022] Open
Abstract
Brain structure and function experience dramatic changes from embryonic to postnatal development. Microarray analyses have detected differential gene expression at different stages and in disease models, but gene expression information during early brain development is limited. We have generated >27 million reads to identify mRNAs from the mouse cortex for >16,000 genes at either embryonic day 18 (E18) or postnatal day 7 (P7), a period of significant synaptogenesis for neural circuit formation. In addition, we devised strategies to detect alternative splice forms and uncovered more splice variants. We observed differential expression of 3,758 genes between the 2 stages, many with known functions or predicted to be important for neural development. Neurogenesis-related genes, such as those encoding Sox4, Sox11, and zinc-finger proteins, were more highly expressed at E18 than at P7. In contrast, the genes encoding synaptic proteins such as synaptotagmin, complexin 2, and syntaxin were up-regulated from E18 to P7. We also found that several neurological disorder-related genes were highly expressed at E18. Our transcriptome analysis may serve as a blueprint for gene expression pattern and provide functional clues of previously unknown genes and disease-related genes during early brain development.
|
1927
|
Bauer JW, Bilgic H, Baechler EC. Gene-expression profiling in rheumatic disease: tools and therapeutic potential. Nat Rev Rheumatol 2009; 5:257-65. [PMID: 19412192 DOI: 10.1038/nrrheum.2009.50] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 12/19/2022]
Abstract
Gene-expression profiling is a powerful tool for the discovery of molecular fingerprints that underlie human disease. Microarray technologies allow the analysis of messenger RNA transcript levels for every gene in the genome. However, gene-expression profiling is best viewed as part of a pipeline that extends from sample collection through clinical application. Key genes and pathways identified by microarray profiling should be validated in independent sample sets and with alternative technologies. Analysis of relevant signaling pathways at the protein level is an important step towards understanding the functional consequences of aberrant gene expression. Peripheral blood is a convenient and rich source of potential biomarkers, but surveying purified cell populations and target tissues can also enhance our understanding of disease states. In rheumatic disease, probing the transcriptome of circulating immune cells has shed light on mechanisms underlying the pathogenesis of complex diseases, such as systemic lupus erythematosus. As these discoveries advance through the pipeline, a variety of clinical applications are on the horizon, including the use of molecular fingerprints to aid in diagnosis and prognosis, improved use of existing therapies, and the development of drugs that target relevant genes and pathways.
Affiliation(s)
- Jason W Bauer
- Division of Rheumatic and Autoimmune Diseases, Department of Medicine, University of Minnesota, Minneapolis, MN 55455, USA
|
1928
|
Chimeric transcript discovery by paired-end transcriptome sequencing. Proc Natl Acad Sci U S A 2009; 106:12353-8. [PMID: 19592507 DOI: 10.1073/pnas.0904720106] [Citation(s) in RCA: 273] [Impact Index Per Article: 18.2] [Indexed: 01/01/2023] Open
Abstract
Recurrent gene fusions are a prevalent class of mutations arising from the juxtaposition of 2 distinct regions, which can generate novel functional transcripts that could serve as valuable therapeutic targets in cancer. Therefore, we aim to establish a sensitive, high-throughput methodology to comprehensively catalog functional gene fusions in cancer by evaluating a paired-end transcriptome sequencing strategy. Not only did a paired-end approach provide a greater dynamic range in comparison with single read based approaches, but it clearly distinguished the high-level "driving" gene fusions, such as BCR-ABL1 and TMPRSS2-ERG, from potential lower level "passenger" gene fusions. Also, the comprehensiveness of a paired-end approach enabled the discovery of 12 previously undescribed gene fusions in 4 commonly used cell lines that eluded previous approaches. Using the paired-end transcriptome sequencing approach, we observed read-through mRNA chimeras, tissue-type restricted chimeras, converging transcripts, diverging transcripts, and overlapping mRNA transcripts. Last, we successfully used paired-end transcriptome sequencing to detect previously undescribed ETS gene fusions in prostate tumors. Together, this study establishes a highly specific and sensitive approach for accurately and comprehensively cataloguing chimeras within a sample using paired-end transcriptome sequencing.
|
1929
|
Quon G, Morris Q. ISOLATE: a computational strategy for identifying the primary origin of cancers using high-throughput sequencing. Bioinformatics 2009; 25:2882-9. [PMID: 19542156 PMCID: PMC2781747 DOI: 10.1093/bioinformatics/btp378] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 01/08/2023] Open
Abstract
Motivation: One of the most deadly cancer diagnoses is the carcinoma of unknown primary origin. Without the knowledge of the site of origin, treatment regimens are limited in their specificity and result in high mortality rates. Though supervised classification methods have been developed to predict the site of origin based on gene expression data, they require large numbers of previously classified tumors for training, in part because they do not account for sample heterogeneity, which limits their application to well-studied cancers. Results: We present ISOLATE, a new statistical method that simultaneously predicts the primary site of origin of cancers and addresses sample heterogeneity, while taking advantage of new high-throughput sequencing technology that promises to bring higher accuracy and reproducibility to gene expression profiling experiments. ISOLATE makes predictions de novo, without having seen any training expression profiles of cancers with identified origin. Compared with previous methods, ISOLATE is able to predict the primary site of origin, de-convolve and remove the effect of sample heterogeneity and identify differentially expressed genes with higher accuracy, across both synthetic and clinical datasets. Methods such as ISOLATE are invaluable tools for clinicians faced with carcinomas of unknown primary origin. Availability: ISOLATE is available for download at: http://morrislab.med.utoronto.ca/software Contact: gerald.quon@utoronto.ca; quaid.morris@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gerald Quon
- Department of Computer Science, University of Toronto, Toronto, Canada.
|
1930
|
Morrissy AS, Morin RD, Delaney A, Zeng T, McDonald H, Jones S, Zhao Y, Hirst M, Marra MA. Next-generation tag sequencing for cancer gene expression profiling. Genome Res 2009; 19:1825-35. [PMID: 19541910 DOI: 10.1101/gr.094482.109] [Citation(s) in RCA: 277] [Impact Index Per Article: 18.5] [Indexed: 12/13/2022]
Abstract
We describe a new method, Tag-seq, which employs ultra high-throughput sequencing of 21 base pair cDNA tags for sensitive and cost-effective gene expression profiling. We compared Tag-seq data to LongSAGE data and observed improved representation of several classes of rare transcripts, including transcription factors, antisense transcripts, and intronic sequences, the latter possibly representing novel exons or genes. We observed increases in the diversity, abundance, and dynamic range of such rare transcripts and took advantage of the greater dynamic range of expression to identify, in cancers and normal libraries, altered expression ratios of alternative transcript isoforms. The strand-specific information of Tag-seq reads further allowed us to detect altered expression ratios of sense and antisense (S-AS) transcripts between cancer and normal libraries. S-AS transcripts were enriched in known cancer genes, while transcript isoforms were enriched in miRNA targeting sites. We found that transcript abundance had a stronger GC-bias in LongSAGE than Tag-seq, such that AT-rich tags were less abundant than GC-rich tags in LongSAGE. Tag-seq also performed better in gene discovery, identifying >98% of genes detected by LongSAGE and profiling a distinct subset of the transcriptome characterized by AT-rich genes, which was expressed at levels below those detectable by LongSAGE. Overall, Tag-seq is sensitive to rare transcripts, has less sequence composition bias relative to LongSAGE, and allows differential expression analysis for a greater range of transcripts, including transcripts encoding important regulatory molecules.
|
1931
|
Blencowe BJ, Ahmad S, Lee LJ. Current-generation high-throughput sequencing: deepening insights into mammalian transcriptomes. Genes Dev 2009; 23:1379-86. [DOI: 10.1101/gad.1788009] [Citation(s) in RCA: 134] [Impact Index Per Article: 8.9] [Indexed: 11/24/2022]
|
1932
|
Sackton TB, Clark AG. Comparative profiling of the transcriptional response to infection in two species of Drosophila by short-read cDNA sequencing. BMC Genomics 2009; 10:259. [PMID: 19500410 PMCID: PMC2701966 DOI: 10.1186/1471-2164-10-259] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 12/10/2008] [Accepted: 06/07/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Homology-based comparisons of the genes involved in innate immunity across many insect taxa with fully sequenced genomes have revealed a striking pattern of gene gain and loss, particularly among genes that encode proteins involved in clearing pathogens (effectors). However, limited functional annotation in non-model systems has hindered understanding of evolutionary novelties in the insect innate immune system. RESULTS: We use short read sequencing technology (Illumina/Solexa) to compare the transcriptional response to infection between the well studied model system Drosophila melanogaster and the distantly related drosophilid D. virilis. We first demonstrate that Illumina/Solexa sequencing of cDNA from infected and uninfected D. melanogaster recapitulates previously published microarray studies of the transcriptional response to infection in this species, validating our approach. We then show that patterns of transcription of homologous genes differ considerably between D. melanogaster and D. virilis, and identify potential candidates for novel components of the D. virilis immune system based on transcriptional data. Finally, we use a proteomic approach to characterize the protein constituents of the D. virilis hemolymph and validate our transcriptional data. CONCLUSION: These results suggest that the acquisition of novel components of the immune system, and particularly novel effector proteins, may be a common evolutionary phenomenon.
Affiliation(s)
- Timothy B Sackton
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.
|
1933
|
Zeller G, Henz SR, Widmer CK, Sachsenberg T, Rätsch G, Weigel D, Laubinger S. Stress-induced changes in the Arabidopsis thaliana transcriptome analyzed using whole-genome tiling arrays. Plant J 2009; 58:1068-82. [PMID: 19222804 DOI: 10.1111/j.1365-313x.2009.03835.x] [Citation(s) in RCA: 141] [Impact Index Per Article: 9.4] [Indexed: 05/18/2023]
Abstract
The responses of plants to abiotic stresses are accompanied by massive changes in transcriptome composition. To provide a comprehensive view of stress-induced changes in the Arabidopsis thaliana transcriptome, we have used whole-genome tiling arrays to analyze the effects of salt, osmotic, cold and heat stress as well as application of the hormone abscisic acid (ABA), an important mediator of stress responses. Among annotated genes in the reference strain Columbia we have found many stress-responsive genes, including several transcription factor genes as well as pseudogenes and transposons that have been missed in previous analyses with standard expression arrays. In addition, we report hundreds of newly identified, stress-induced transcribed regions. These often overlap with known, annotated genes. The results are accessible through the Arabidopsis thaliana Tiling Array Express (At-TAX) homepage, which provides convenient tools for displaying expression values of annotated genes, as well as visualization of unannotated transcribed regions along each chromosome.
Affiliation(s)
- Georg Zeller
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
|
1934
|
Fullwood MJ, Wei CL, Liu ET, Ruan Y. Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res 2009; 19:521-32. [PMID: 19339662 DOI: 10.1101/gr.074906.107] [Citation(s) in RCA: 230] [Impact Index Per Article: 15.3] [Indexed: 01/02/2023]
Abstract
Comprehensive understanding of functional elements in the human genome will require thorough interrogation and comparison of individual human genomes and genomic structures. Such an endeavor will require improvements in the throughputs and costs of DNA sequencing. Next-generation sequencing platforms have impressively low costs and high throughputs but are limited by short read lengths. An immediate and widely recognized solution to this critical limitation is the paired-end tag (PET) sequencing for various applications, collectively called the PET sequencing strategy, in which short and paired tags are extracted from the ends of long DNA fragments for ultra-high-throughput sequencing. The PET sequences can be accurately mapped to the reference genome, thus demarcating the genomic boundaries of PET-represented DNA fragments and revealing the identities of the target DNA elements. PET protocols have been developed for the analyses of transcriptomes, transcription factor binding sites, epigenetic sites such as histone modification sites, and genome structures. The exclusive advantage of the PET technology is its ability to uncover linkages between the two ends of DNA fragments. Using this unique feature, unconventional fusion transcripts, genome structural variations, and even molecular interactions between distant genomic elements can be unraveled by PET analysis. Extensive use of PET data could lead to efficient assembly of individual human genomes, transcriptomes, and interactomes, enabling new biological and clinical insights. With its versatile and powerful nature for DNA analysis, the PET sequencing strategy has a bright future ahead.
Affiliation(s)
- Melissa J Fullwood
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore 138672, Singapore
|
1935
|
Turner DJ, Keane TM, Sudbery I, Adams DJ. Next-generation sequencing of vertebrate experimental organisms. Mamm Genome 2009; 20:327-38. [PMID: 19452216 PMCID: PMC2714443 DOI: 10.1007/s00335-009-9187-4] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 02/25/2009] [Accepted: 04/21/2009] [Indexed: 12/22/2022]
Abstract
Next-generation sequencing technologies are revolutionizing biology by allowing for genome-wide transcription factor binding-site profiling, transcriptome sequencing, and more recently, whole-genome resequencing. While it is currently not possible to generate complete de novo assemblies of higher-vertebrate genomes using next-generation sequencing, improvements in sequence read lengths and throughput, coupled with new assembly algorithms for large data sets, will soon make this a reality. These developments will in turn spawn a revolution in how genomic data are used to understand genetics and how model organisms are used for disease gene discovery. This review provides an overview of the current next-generation sequencing platforms and the newest computational tools for the analysis of next-generation sequencing data. We also describe how next-generation sequencing may be applied in the context of vertebrate model organism genetics.
Affiliation(s)
- Daniel J Turner
- Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
|
1936
|
Bloom JS, Khan Z, Kruglyak L, Singh M, Caudy AA. Measuring differential gene expression by short read sequencing: quantitative comparison to 2-channel gene expression microarrays. BMC Genomics 2009; 10:221. [PMID: 19435513 PMCID: PMC2686739 DOI: 10.1186/1471-2164-10-221] [Citation(s) in RCA: 126] [Impact Index Per Article: 8.4] [Received: 11/11/2008] [Accepted: 05/12/2009] [Indexed: 12/01/2022] Open
Abstract
Background: High-throughput cDNA synthesis and sequencing of poly(A)-enriched RNA is rapidly emerging as a technology competing to replace microarrays as a quantitative platform for measuring gene expression. Results: Consequently, we compared full length cDNA sequencing to 2-channel gene expression microarrays in the context of measuring differential gene expression. Because of its comparable cost to a gene expression microarray, our study focused on the data obtainable from a single lane of an Illumina 1G sequencer. We compared sequencing data to a highly replicated microarray experiment profiling two divergent strains of S. cerevisiae. Conclusion: Using a large number of quantitative PCR (qPCR) assays, more than previous studies, we found that neither technology is decisively better at measuring differential gene expression. Further, we report sequencing results from a diploid hybrid of two strains of S. cerevisiae that indicate full length cDNA sequencing can discover heterozygosity and measure quantitative allele-specific expression simultaneously.
Affiliation(s)
- Joshua S Bloom
- Lewis-Sigler Institute of Integrative Genomics, Princeton University, New Jersey, USA.
|
1937
|
Fujisawa H, Horiuchi Y, Harushima Y, Takada T, Eguchi S, Mochizuki T, Sakaguchi T, Shiroishi T, Kurata N. SNEP: Simultaneous detection of nucleotide and expression polymorphisms using Affymetrix GeneChip. BMC Bioinformatics 2009; 10:131. [PMID: 19419536 PMCID: PMC2706822 DOI: 10.1186/1471-2105-10-131] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/16/2008] [Accepted: 05/06/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: High-density short oligonucleotide microarrays are useful tools for studying biodiversity, because they can be used to investigate both nucleotide and expression polymorphisms. However, when different strains (or species) produce different signal intensities after mRNA hybridization, it is not easy to determine whether the signal intensities were affected by nucleotide or expression polymorphisms. To overcome this difficulty, nucleotide and expression polymorphisms are currently examined separately. RESULTS: We have developed SNEP, a new method that allows simultaneous detection of both nucleotide and expression polymorphisms. SNEP involves a robust statistical procedure based on the idea that a nucleotide polymorphism observed at the probe level can be regarded as an outlier, because the nucleotide polymorphism can reduce the hybridization signal intensity. To investigate the performance of SNEP, we used three species: barley, rice and mice. In addition to the publicly available barley data, we obtained new rice and mouse data from the strains with available genome sequences. The sensitivity and false positive rate of nucleotide polymorphism detection were estimated based on the sequence information. The robustness of expression polymorphism detection against nucleotide polymorphisms was also investigated. CONCLUSION: SNEP performed well regardless of the genome size and showed a better performance for nucleotide polymorphism detection, when compared with other previously proposed methods. The R-software 'SNEP' is available at http://www.ism.ac.jp/~fujisawa/SNEP/.
|
1938
|
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009; 25:1105-1111. [PMID: 19289445 DOI: 10.1093/bioinformatics/btp120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/22/2023]
Abstract
MOTIVATION: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or 'reads', can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. RESULTS: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20,000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. AVAILABILITY: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
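The throughput figure quoted in this abstract lends itself to a quick back-of-the-envelope check. The sketch below is illustrative only and is not part of TopHat: it assumes a single CPU running at the quoted rate of 2.2 million reads per hour, and the function names and the 40-million-read example are hypothetical.

```python
# Throughput quoted in the abstract: "nearly 2.2 million reads per CPU hour".
READS_PER_CPU_HOUR = 2.2e6


def cpu_hours(total_reads, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """CPU hours needed to map total_reads at a constant mapping rate."""
    return total_reads / reads_per_cpu_hour


def max_reads_within(hours, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """Largest read count that can be mapped in the given number of CPU hours."""
    return hours * reads_per_cpu_hour


# Hypothetical experiment size, chosen only for illustration.
example_reads = 40_000_000
hours_needed = cpu_hours(example_reads)
reads_per_day = max_reads_within(24)

print(f"{example_reads:,} reads need about {hours_needed:.1f} CPU hours")
print(f"One CPU-day covers about {reads_per_day:,.0f} reads")
```

Under the single-CPU assumption, an experiment fits within a day only if it contains fewer than roughly 53 million reads; larger runs would need more cores or more time.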
Affiliation(s)
- Cole Trapnell
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
|
1939
|
Fu X, Fu N, Guo S, Yan Z, Xu Y, Hu H, Menzel C, Chen W, Li Y, Zeng R, Khaitovich P. Estimating accuracy of RNA-Seq and microarrays with proteomics. BMC Genomics 2009; 10:161. [PMID: 19371429 PMCID: PMC2676304 DOI: 10.1186/1471-2164-10-161] [Citation(s) in RCA: 206] [Impact Index Per Article: 13.7] [Received: 11/13/2008] [Accepted: 04/16/2009] [Indexed: 12/11/2022] Open
Abstract
Background: Microarrays revolutionized biological research by enabling gene expression comparisons on a transcriptome-wide scale. Microarrays, however, do not estimate absolute expression levels accurately. At present, high-throughput sequencing is emerging as an alternative methodology for transcriptome studies. Although free of many limitations imposed by microarray design, its potential to estimate absolute transcript levels is unknown. Results: In this study, we evaluate the relative accuracy of microarrays and transcriptome sequencing (RNA-Seq) using a third methodology: proteomics. We find that RNA-Seq provides a better estimate of absolute expression levels. Conclusion: Our results show that, in terms of overall technical performance, RNA-Seq is the technique of choice for studies that require accurate estimation of absolute transcript levels.
Affiliation(s)
- Xing Fu
- Key lab of Systems Biology, Shanghai Institutes for Biological Sciences, China Academy of Sciences, Shanghai, 200031, PR China.
|
1940
|
Oshlack A, Wakefield MJ. Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 2009; 4:14. [PMID: 19371405 PMCID: PMC2678084 DOI: 10.1186/1745-6150-4-14] [Citation(s) in RCA: 330] [Impact Index Per Article: 22.0] [Received: 04/09/2009] [Accepted: 04/16/2009] [Indexed: 12/19/2022] Open
Abstract
Background: Several recent studies have demonstrated the effectiveness of deep sequencing for transcriptome analysis (RNA-seq) in mammals. As RNA-seq becomes more affordable, whole genome transcriptional profiling is likely to become the platform of choice for species with good genomic sequences. As yet, a rigorous analysis methodology has not been developed and we are still in the stages of exploring the features of the data. Results: We investigated the effect of transcript length bias in RNA-seq data using three different published data sets. For standard analyses using aggregated tag counts for each gene, the ability to call differentially expressed genes between samples is strongly associated with the length of the transcript. Conclusion: Transcript length bias for calling differentially expressed genes is a general feature of current protocols for RNA-seq technology. This has implications for the ranking of differentially expressed genes, and in particular may introduce bias in gene set testing for pathway analysis and other multi-gene systems biology analyses. Reviewers: This article was reviewed by Rohan Williams (nominated by Gavin Huttley), Nicole Cloonan (nominated by Mark Ragan) and James Bullard (nominated by Sandrine Dudoit).
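One way to see why aggregated counts favor long transcripts is a simplified Poisson argument. The following is a minimal sketch, not the authors' analysis: it assumes expected read counts scale with abundance times transcript length, and it uses a normal-approximation z-score for the difference of two Poisson counts. The function name, the `reads_per_bp` scaling constant, and the example lengths are all hypothetical.

```python
import math


def expected_z(abundance_a, abundance_b, length_bp, reads_per_bp=0.01):
    """Approximate z-score for the difference of two expected Poisson counts.

    Assumption (illustrative): the expected count for a transcript is
    abundance * length_bp * reads_per_bp in each sample.
    """
    count_a = abundance_a * length_bp * reads_per_bp
    count_b = abundance_b * length_bp * reads_per_bp
    # For independent Poisson counts, Var(count_b - count_a) = count_a + count_b.
    return (count_b - count_a) / math.sqrt(count_a + count_b)


# Same two-fold change in abundance, two hypothetical transcript lengths.
short_z = expected_z(1.0, 2.0, length_bp=500)
long_z = expected_z(1.0, 2.0, length_bp=5000)

print(f"short transcript z = {short_z:.2f}")
print(f"long transcript  z = {long_z:.2f}")
```

Under these assumptions the z-score grows with the square root of transcript length, so the same two-fold change clears a 1.96 threshold for the long transcript but not for the short one.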
Affiliation(s)
- Alicia Oshlack
- Walter and Eliza Hall Institute of Medical Research, Parkville, Vic, Australia.
|
1941
|
Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, Lao K, Surani MA. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 2009; 6:377-82. [PMID: 19349980 DOI: 10.1038/nmeth.1315] [Citation(s) in RCA: 2214] [Impact Index Per Article: 147.6] [Received: 12/02/2008] [Accepted: 03/02/2009] [Indexed: 02/06/2023]
Abstract
Next-generation sequencing technology is a powerful tool for transcriptome analysis. However, under certain conditions, only a small amount of material is available, which requires more sensitive techniques that can preferably be used at the single-cell level. Here we describe a single-cell digital gene expression profiling assay. Using our mRNA-Seq assay with only a single mouse blastomere, we detected the expression of 75% (5,270) more genes than microarray techniques and identified 1,753 previously unknown splice junctions called by at least 5 reads. Moreover, 8-19% of the genes with multiple known transcript isoforms expressed at least two isoforms in the same blastomere or oocyte, which unambiguously demonstrated the complexity of the transcript variants at whole-genome scale in individual cells. Finally, for Dicer1(-/-) and Ago2(-/-) (Eif2c2(-/-)) oocytes, we found that 1,696 and 1,553 genes, respectively, were abnormally upregulated compared to wild-type controls, with 619 genes in common.
Affiliation(s)
- Fuchou Tang
- Wellcome Trust-Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, University of Cambridge, Cambridge, UK
|
1942
|
Lister R, Gregory BD, Ecker JR. Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr Opin Plant Biol 2009; 12:107-18. [PMID: 19157957 PMCID: PMC2723731 DOI: 10.1016/j.pbi.2008.11.004] [Citation(s) in RCA: 138] [Impact Index Per Article: 9.2] [Received: 10/19/2008] [Revised: 11/17/2008] [Accepted: 11/20/2008] [Indexed: 05/18/2023]
Abstract
The sudden availability of DNA sequencing technologies that rapidly produce vast amounts of sequence information has triggered a paradigm shift in genomics, enabling massively parallel surveying of complex nucleic acid populations. The diversity of applications to which these technologies have already been applied demonstrates the immense range of cellular processes and properties that can now be studied at the single-base resolution. These include genome resequencing and polymorphism discovery, mutation mapping, DNA methylation, histone modifications, transcriptome sequencing, gene discovery, alternative splicing identification, small RNA profiling, DNA-protein, and possibly even protein-protein interactions. Thus, these deep sequencing technologies offer plant biologists unprecedented opportunities to increase the understanding of the functions and dynamics of plant cells and populations.
Affiliation(s)
- Ryan Lister
- Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Brian D. Gregory
- Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Joseph R. Ecker
- Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Corresponding author: Joseph R. Ecker, Plant Biology Laboratory and Genomic Analysis Laboratory, The Salk Institute for Biological Studies, 10010 N. Torrey Pines Rd., La Jolla, CA 92037, Telephone: (858) 453-4100 x1795, Fax: (858) 558-6379, E-mail:
|
1943
|
Application of 'next-generation' sequencing technologies to microbial genetics. Nat Rev Microbiol 2009; 7:287-96. [PMID: 19287448 DOI: 10.1038/nrmicro2122] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.2] [Indexed: 12/14/2022]
Abstract
New sequencing methods generate data that can allow the assembly of microbial genome sequences in days. With such revolutionary advances in technology come new challenges in methodologies and informatics. In this article, we review the capabilities of high-throughput sequencing technologies and discuss the many options for getting useful information from the data.
|
1944
|
Wilhelm BT, Landry JR. RNA-Seq-quantitative measurement of expression through massively parallel RNA-sequencing. Methods 2009; 48:249-57. [PMID: 19336255 DOI: 10.1016/j.ymeth.2009.03.016] [Citation(s) in RCA: 309] [Impact Index Per Article: 20.6] [Received: 12/12/2008] [Revised: 03/14/2009] [Accepted: 03/17/2009] [Indexed: 01/20/2023]
Abstract
The ability to quantitatively survey the global behavior of transcriptomes has been a key milestone in the field of systems biology, enabled by the advent of DNA microarrays. While this approach has literally transformed our vision and approach to cellular physiology, microarray technology has always been limited by the requirement to decide, a priori, what regions of the genome to examine. While very high density tiling arrays have reduced this limitation for simpler organisms, it remains an obstacle for larger, more complex, eukaryotic genomes. The recent development of "next-generation" massively parallel sequencing (MPS) technologies by companies such as Roche (454 GS FLX), Illumina (Genome Analyzer II), and ABI (AB SOLiD) has completely transformed the way in which quantitative transcriptomics can be done. These new technologies have reduced both the cost-per-reaction and time required by orders of magnitude, making the use of sequencing a cost-effective option for many experimental approaches. One such method that has recently been developed uses MPS technology to directly survey the RNA content of cells, without requiring any of the traditional cloning associated with EST sequencing. This approach, called "RNA-seq", can generate quantitative expression scores that are comparable to microarrays, with the added benefit that the entire transcriptome is surveyed without the requirement of a priori knowledge of transcribed regions. The important advantage of this technique is that not only can quantitative expression measures be made, but transcript structures including alternatively spliced transcript isoforms, can also be identified. This article discusses the experimental approach for both sample preparation and data analysis for the technique of RNA-seq.
Affiliation(s)
- Brian T Wilhelm
- Laboratory of Molecular Genetics of Stem Cells, C.P. 6128 Succursale Centre-Ville, Montréal, Que. H3C3J7, Canada.
|
1945
|
Power KA, McRedmond JP, de Stefani A, Gallagher WM, Ó Gaora P. High-throughput proteomics detection of novel splice isoforms in human platelets. PLoS One 2009; 4:e5001. [PMID: 19308253 PMCID: PMC2654914 DOI: 10.1371/journal.pone.0005001] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 11/24/2008] [Accepted: 02/20/2009] [Indexed: 12/16/2022]
Abstract
Alternative splicing (AS) is an intrinsic regulatory mechanism of all metazoans. Recent findings suggest that 100% of multiexonic human genes give rise to splice isoforms. AS can be specific to tissue type, environment or developmentally regulated. Splice variants have also been implicated in various diseases including cancer. Detection of these variants will enhance our understanding of the complexity of the human genome and provide disease-specific and prognostic biomarkers. We adopted a proteomics approach to identify exon skip events - the most common form of AS. We constructed a database harboring the peptide sequences derived from all hypothetical exon skip junctions in the human genome. Searching tandem mass spectrometry (MS/MS) data against the database allows the detection of exon skip events, directly at the protein level. Here we describe the application of this approach to human platelets, including the mRNA-based verification of novel splice isoforms of ITGA2, NPEPPS and FH. This methodology is applicable to all new or existing MS/MS datasets.
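The database construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each gene is given as an ordered list of coding exon sequences with the first exon starting in frame 0, and the names `translate`, `exon_skip_peptides`, and the `flank` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA string in frame 0, stopping before the first stop codon."""
    peptide = []
    for k in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[k:k + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def exon_skip_peptides(exons, flank=5):
    """Map (upstream, downstream) exon indices to the peptide around the
    new junction formed when at least one intervening exon is skipped.

    Assumption: exons[0] begins in reading frame 0. If the skip introduces
    a premature stop before the junction, the peptide may be short or empty.
    """
    peptides = {}
    for i in range(len(exons) - 2):
        for j in range(i + 2, len(exons)):
            upstream = "".join(exons[:i + 1])
            protein = translate(upstream + "".join(exons[j:]))
            codon = len(upstream) // 3  # index of the codon at the junction
            peptides[(i, j)] = protein[max(0, codon - flank):codon + flank]
    return peptides


# Toy three-exon gene: skipping the middle exon joins exon 0 to exon 2.
print(exon_skip_peptides(["ATGGCT", "GAAGAA", "TGGTGG"]))
```

In a real search, the resulting peptides would be written to a FASTA file and supplied to an MS/MS search engine; that step is outside the scope of this sketch.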
Affiliation(s)
- Karen A. Power
- UCD Conway Institute and UCD School of Biomolecular & Biomedical Sciences, UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- James P. McRedmond
- UCD Conway Institute and UCD School of Biomolecular & Biomedical Sciences, UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- William M. Gallagher
- UCD Conway Institute and UCD School of Biomolecular & Biomedical Sciences, UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- Peadar Ó Gaora
- UCD Conway Institute and UCD School of Medicine & Medical Sciences, UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- * E-mail:
|
1946
|
Abstract
Although gene expression has been studied in bacteria for decades, many aspects of the bacterial transcriptome remain poorly understood. Transcript structure, operon linkages, and information on absolute abundance all provide valuable insights into gene function and regulation, but none has ever been determined on a genome-wide scale for any bacterium. Indeed, these aspects of the prokaryotic transcriptome have been explored on a large scale in only a few instances, and consequently little is known about the absolute composition of the mRNA population within a bacterial cell. Here we report the use of a high-throughput sequencing-based approach in assembling the first comprehensive, single-nucleotide resolution view of a bacterial transcriptome. We sampled the Bacillus anthracis transcriptome under a variety of growth conditions and showed that the data provide an accurate and high-resolution map of transcript start sites and operon structure throughout the genome. Further, the sequence data identified previously nonannotated regions with significant transcriptional activity and enhanced the accuracy of existing genome annotations. Finally, our data provide estimates of absolute transcript abundance and suggest that there is significant transcriptional heterogeneity within a clonal, synchronized bacterial population. Overall, our results offer an unprecedented view of gene expression and regulation in a bacterial cell.
|
1947
|
Abstract
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
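To make the idea of annotation-free splice junction discovery concrete, here is a toy sketch. It is not the TopHat implementation, and the abstract does not describe the algorithm's internals: the sketch simply assumes introns begin with the canonical GT dinucleotide and end with AG, and the function names and length limits are hypothetical.

```python
def candidate_introns(genome, min_len=4, max_len=20):
    """List (donor_start, acceptor_end) pairs for GT...AG intervals.

    Assumption: only canonical GT-AG introns are considered, and intron
    length is bounded by toy limits chosen for this example.
    """
    donors = [p for p in range(len(genome) - 1) if genome[p:p + 2] == "GT"]
    # Store the position just past the AG, i.e. where the next exon starts.
    acceptors = [p + 2 for p in range(len(genome) - 1) if genome[p:p + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors if min_len <= a - d <= max_len]


def find_spliced_alignment(read, genome, introns):
    """Return (read_start, donor_start, acceptor_end) for the first candidate
    intron that lets the read align exactly across it, or None."""
    for donor, acceptor in introns:
        # 'left' is the number of read bases placed upstream of the intron.
        for left in range(1, len(read)):
            start = donor - left
            right = len(read) - left
            if start < 0 or acceptor + right > len(genome):
                continue
            if genome[start:donor] + genome[acceptor:acceptor + right] == read:
                return start, donor, acceptor
    return None


# Toy genome: exon "ACCTC", intron "GTAAAAAG", exon "CTTCA".
genome = "ACCTC" + "GTAAAAAG" + "CTTCA"
introns = candidate_introns(genome)
print(introns)
print(find_spliced_alignment("CTCCTT", genome, introns))
```

Exact string matching keeps the example short; a practical aligner would tolerate mismatches and restrict the search to regions already covered by unspliced reads.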
Affiliation(s)
- Cole Trapnell
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
|
1948
|
Mixed-species genomic microarray analysis of fecal samples reveals differential transcriptional responses of bifidobacteria in breast- and formula-fed infants. Appl Environ Microbiol 2009; 75:2668-76. [PMID: 19286790 DOI: 10.1128/aem.02492-08] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Indexed: 12/25/2022]
Abstract
Although their exact function remains enigmatic, bifidobacteria are among the first colonizers of the newborn infant gut and further develop into abundant communities, notably in response to diet. Therefore, the transcriptional responses of bifidobacteria in rapidly processed fecal samples from young infants that were fed either breast milk or a formula containing a mixture of galacto- and fructo-oligosaccharides were studied. The presence and diversity of the bifidobacterial fecal communities were determined using PCR-denaturing gradient gel electrophoresis and quantitative real-time PCR for specific species. Changes in the total number of bifidobacteria as well as in species diversity were observed, indicating the metabolic activities of the bifidobacteria within the infant gut. In addition, total RNAs isolated from infant feces were labeled and hybridized to a bifidobacterium-specific microarray comprising approximately 6,000 clones of the major bifidobacterial species of the human gut. Approximately 270 clones that showed the most prominent hybridization with the samples were sequenced. Fewer than 10% of the hybridizing clones contained rRNA genes, whereas the vast majority of the inserts showed matches with protein-encoding genes predicted to originate from bifidobacteria. Although a wide range of functional groups was covered by the obtained sequences, the largest fraction (14%) of the transcribed genes assigned to a functional category were predicted to be involved in carbohydrate metabolism, while some were also implicated in exopolysaccharide production or folate production. A total of three of the above-described protein-encoding genes were selected for quantitative PCR and sequence analyses, which confirmed the expression of the corresponding genes and the expected nucleotide sequences. 
In conclusion, the results of this study show the feasibility of obtaining insight into the transcriptional responses of intestinal bifidobacteria by analyzing fecal RNA and highlight the in vivo expression of bifidobacterial genes implicated in host-related functions.
|
1949
|
Chipping away at diagnostics for neurodegenerative diseases. Neurobiol Dis 2009; 35:148-56. [PMID: 19285134 DOI: 10.1016/j.nbd.2009.02.016] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 09/18/2008] [Revised: 02/16/2009] [Accepted: 02/19/2009] [Indexed: 12/15/2022]
Abstract
Biomarkers are needed to overcome critical roadblocks in the development of disease-modifying therapeutics for neurodegenerative diseases. Evolving genome-wide expression technologies can comprehensively search for molecular biomarkers and allow fascinating insights into the expanding complexity of the human transcriptome. The technology has matured to the point where some applications are deemed reliable enough for use in patient care. In the neurosciences, it has led to the discoveries of osteopontin in multiple sclerosis and SORL1/LR11 in Alzheimer's, and recent studies indicate its potential for identifying neurogenomic biomarkers. Advances in pre-analytical and analytical methods are improving search efficiency and reproducibility and may lead to a pipeline of biomarker candidates suitable for development into future neurologic diagnostics.
|
1950
|
Xie C, Tammi MT. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 2009; 10:80. [PMID: 19267900 PMCID: PMC2667514 DOI: 10.1186/1471-2105-10-80] [Citation(s) in RCA: 395] [Impact Index Per Article: 26.3] [Received: 08/21/2008] [Accepted: 03/06/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND: DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. RESULTS: Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection. This favors the next-generation sequencing methods that rapidly produce large amounts of short reads. CONCLUSION: Simulation of various sequencing methods with coverage between 0.1x and 8x shows overall specificity between 91.7% and 99.9%, and sensitivity between 72.2% and 96.5%. We also show the results for assessment of CNV between two individual human genomes.
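The read-count comparison underlying this kind of method can be illustrated with a short sketch. It is a simplified stand-in, not the statistical model from the paper: it assumes reads are summarized by their mapped start positions on one chromosome, and the fixed window size, the pseudocount, and the function names are hypothetical choices.

```python
import math
from collections import Counter


def window_counts(read_starts, window):
    """Count reads per fixed-size window, keyed by window index."""
    return Counter(pos // window for pos in read_starts)


def log2_ratios(test_starts, ref_starts, window, pseudocount=0.5):
    """Per-window log2 ratio of test to reference read counts.

    Counts are scaled by the total reads in each sample so that a difference
    in sequencing depth alone does not look like a copy number change.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    test = window_counts(test_starts, window)
    ref = window_counts(ref_starts, window)
    scale = len(ref_starts) / len(test_starts)
    ratios = {}
    for w in sorted(set(test) | set(ref)):
        t = test.get(w, 0) + pseudocount
        r = ref.get(w, 0) + pseudocount
        ratios[w] = math.log2(scale * t / r)
    return ratios


# Toy data: the test sample has twice the reads in window 1 relative to window 0.
reference = [0, 10, 20, 30, 100, 110, 120, 130]
test_sample = [0, 10, 20, 30, 100, 105, 110, 115, 120, 125, 130, 135]
print(log2_ratios(test_sample, reference, window=100, pseudocount=0))
```

Only read positions enter the calculation, never read lengths, which is consistent with the abstract's point that the number of reads determines resolution.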
Affiliation(s)
- Chao Xie
- Department of Biological Sciences, National University of Singapore, Singapore.
|