11201
Loss of cerebral cavernous malformation 3 (Ccm3) in neuroglia leads to CCM and vascular pathology. Proc Natl Acad Sci U S A 2011; 108:3737-42. [PMID: 21321212 DOI: 10.1073/pnas.1012617108] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Indexed: 01/02/2023]
Abstract
Communication between neural cells and the vasculature is integral to the proper development and later function of the central nervous system. A mechanistic understanding of the interactions between components of the neurovascular unit has implications for various disorders, including cerebral cavernous malformations (CCMs) in which focal vascular lesions form throughout the central nervous system. Loss-of-function mutations in three genes with proven endothelial cell autonomous roles, CCM1/krev1 interaction trapped gene 1, CCM2, and CCM3/programmed cell death 10, cause familial CCM. By using neural specific conditional mouse mutants, we show that Ccm3 has both neural cell autonomous and nonautonomous functions. Gfap- or Emx1-Cre-mediated Ccm3 neural deletion leads to increased proliferation, increased survival, and activation of astrocytes through cell autonomous mechanisms involving activated Akt signaling. In addition, loss of neural CCM3 results in a vascular phenotype characterized by diffusely dilated and simplified cerebral vasculature along with formation of multiple vascular lesions that closely resemble human cavernomas through cell nonautonomous mechanisms. RNA sequencing of the vascular lesions shows abundant expression of molecules involved in cytoskeletal remodeling, including protein kinase A and Rho-GTPase signaling. Our findings implicate neural cells in the pathogenesis of CCMs, showing the importance of this pathway in neural/vascular interactions within the neurovascular unit.
11202
Deng N, Puetter A, Zhang K, Johnson K, Zhao Z, Taylor C, Flemington EK, Zhu D. Isoform-level microRNA-155 target prediction using RNA-seq. Nucleic Acids Res 2011; 39:e61. [PMID: 21317189 PMCID: PMC3089486 DOI: 10.1093/nar/gkr042] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
Computational prediction of microRNA targets remains a challenging problem. Existing rule-based, data-driven and expression-profiling approaches to target prediction mostly operate at the gene level. The increasing availability of RNA-seq data provides a new perspective for microRNA target prediction at the isoform level. We hypothesize that the splicing isoform is the ultimate effector in microRNA targeting and that the proposed isoform-level approach is capable of predicting non-dominant isoform targets, as well as their targeting regions, that are otherwise invisible to many existing approaches. To test the hypothesis, we used an iterative expectation maximization (EM) algorithm to quantify transcriptomes at the isoform level. The performance of the EM algorithm in transcriptome quantification was examined in simulation studies using FluxSimulator. We used joint evidence from isoform-level down-regulation and seed enrichment to predict microRNA-155 targets. We validated our computational approach using results from 149 in vitro 3′-UTR assays performed in-house. We also augmented the splicing database using exon–exon junction evidence, and applied the EM algorithm to predict and quantify 1572 cell-line-specific novel isoforms. Combined with seed enrichment analysis, we predicted 51 novel microRNA-155 isoform targets. Our work is among the first computational studies advocating isoform-level microRNA target prediction.
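The isoform-level quantification step can be illustrated with a generic expectation maximization (EM) loop. This is a minimal sketch, not the authors' implementation: it assumes each read has already been reduced to the set of isoforms it is compatible with, that reads start uniformly along each isoform, and that effective isoform lengths are known. The function name `em_isoform_abundance` is hypothetical.

```python
from typing import Dict, List, Set


def em_isoform_abundance(
    reads: List[Set[str]], lengths: Dict[str, float], n_iter: int = 200
) -> Dict[str, float]:
    """Estimate relative transcript abundances (fractions summing to 1) by EM.

    reads: for each read, the set of isoforms it is compatible with.
    lengths: effective length of each isoform.
    """
    isoforms = sorted(lengths)
    # alpha[t]: fraction of reads that originate from isoform t.
    alpha = {t: 1.0 / len(isoforms) for t in isoforms}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in isoforms}
        # E-step: share each read among its compatible isoforms in proportion
        # to alpha[t] / length[t], i.e. P(isoform) * P(read start | isoform).
        for compat in reads:
            weights = {t: alpha[t] / lengths[t] for t in compat}
            total = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / total
        # M-step: new read fractions are expected read counts / total reads.
        alpha = {t: expected[t] / len(reads) for t in isoforms}
    # Convert read fractions to transcript fractions by correcting for length.
    per_base = {t: alpha[t] / lengths[t] for t in isoforms}
    norm = sum(per_base.values())
    return {t: v / norm for t, v in per_base.items()}


reads = [{"A"}] * 60 + [{"B"}] * 20 + [{"A", "B"}] * 20
print(em_isoform_abundance(reads, {"A": 1000.0, "B": 1000.0}))
```

With 60 reads unique to isoform A, 20 unique to B and 20 shared, and equal lengths, the loop converges to about 0.75 for A and 0.25 for B: the shared reads end up split 3:1, matching the ratio implied by the unique reads.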
Affiliation(s)
- Nan Deng
- Department of Computer Science, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA 70148, USA
11203
Turro E, Su SY, Gonçalves Â, Coin LJM, Richardson S, Lewin A. Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. Genome Biol 2011; 12:R13. [PMID: 21310039 PMCID: PMC3188795 DOI: 10.1186/gb-2011-12-2-r13] [Citation(s) in RCA: 180] [Impact Index Per Article: 12.9] [Received: 09/19/2010] [Revised: 11/17/2010] [Accepted: 02/10/2011] [Indexed: 11/11/2022]
Abstract
We present a novel pipeline and methodology for simultaneously estimating isoform expression and allelic imbalance in diploid organisms using RNA-seq data. We achieve this by modeling the expression of haplotype-specific isoforms. If unknown, the two parental isoform sequences can be individually reconstructed. A new statistical method, MMSEQ, deconvolves the mapping of reads to multiple transcripts (isoforms or haplotype-specific isoforms). Our software can take into account non-uniform read generation and works with paired-end reads.
Affiliation(s)
- Ernest Turro
- Department of Epidemiology and Biostatistics, Imperial College London, Norfolk Place, London, W2 1PG, UK.
11204
Lott SE, Villalta JE, Schroth GP, Luo S, Tonkin LA, Eisen MB. Noncanonical compensation of zygotic X transcription in early Drosophila melanogaster development revealed through single-embryo RNA-seq. PLoS Biol 2011; 9:e1000590. [PMID: 21346796 PMCID: PMC3035605 DOI: 10.1371/journal.pbio.1000590] [Citation(s) in RCA: 153] [Impact Index Per Article: 10.9] [Received: 07/29/2010] [Accepted: 12/22/2010] [Indexed: 01/15/2023]
Abstract
Many genes from the X chromosome are expressed at the same level in female and male embryos during early Drosophila development, prior to the establishment of MSL-mediated dosage compensation, suggesting the existence of a novel mechanism. When Drosophila melanogaster embryos initiate zygotic transcription around mitotic cycle 10, the dose-sensitive expression of specialized genes on the X chromosome triggers a sex-determination cascade that, among other things, compensates for differences in sex chromosome dose by hypertranscribing the single X chromosome in males. However, there is an approximately 1-hour delay between the onset of zygotic transcription and the establishment of canonical dosage compensation near the end of mitotic cycle 14. During this time, zygotic transcription drives segmentation, cellularization, and other important developmental events. Since many of the genes involved in these processes are on the X chromosome, we wondered whether they are transcribed at higher levels in females and whether this might lead to sex-specific early embryonic patterning. To investigate this possibility, we developed methods to precisely stage, sex, and characterize the transcriptomes of individual embryos. We measured genome-wide mRNA abundance in male and female embryos at eight timepoints, spanning mitotic cycle 10 through late cycle 14, using polymorphisms between parental lines to distinguish maternal and zygotic transcription. We found limited sex-specific zygotic transcription, with a weak tendency for genes on the X to be expressed at higher levels in females. However, transcripts derived from the single X chromosome in males were more abundant than those derived from either X chromosome in females, demonstrating that there is widespread dosage compensation prior to the activation of the canonical MSL-mediated dosage compensation system.
Crucially, this new system of early zygotic dosage compensation results in nearly identical transcript levels for key X-linked developmental regulators, including giant (gt), brinker (brk), buttonhead (btd), and short gastrulation (sog), in male and female embryos. Variation in gene dose can have profound effects on animal development. Yet every generation, animals must cope with differences in sex chromosome numbers. Drosophila compensate for the difference in X chromosome dosage (two in females, one in males) with a mechanism that allows for more transcription of the single X chromosome in males. But this mechanism is not established until over an hour after the embryo begins transcription, during which time a number of important events in development occur such as cellularization and segmentation. Here we use an mRNA sequencing method to characterize gene expression in individual female and male embryos before the onset of the previously characterized dosage compensation system. While we find more transcripts from X chromosomal genes in females, we also find many genes with equal transcript levels in males and females. These results indicate that there is an alternate mechanism to compensate for dosage acting earlier in development, prior to the onset of the previously characterized dosage compensation system.
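The abstract mentions using polymorphisms between parental lines to separate maternal and zygotic transcripts. The bookkeeping for one gene can be sketched as follows, under loudly stated assumptions that are not taken from the paper: the gene is autosomal, maternally deposited transcripts carry only the maternal allele, and zygotic transcription expresses both parental alleles equally, so zygotic reads are roughly twice the paternal-allele reads. The helper name `zygotic_fraction` is hypothetical.

```python
def zygotic_fraction(maternal_allele_reads: int, paternal_allele_reads: int) -> float:
    """Estimate the zygotic share of a gene's transcripts from allele-specific reads.

    Hypothetical helper. Assumes an autosomal gene whose maternal deposit carries
    only the maternal allele and whose zygotic transcription is allele-balanced,
    so zygotic reads ~= 2 * paternal-allele reads.
    """
    total = maternal_allele_reads + paternal_allele_reads
    if total == 0:
        raise ValueError("no allele-informative reads for this gene")
    # Cap at 1.0: sampling noise can push the paternal share above one half.
    return min(1.0, 2.0 * paternal_allele_reads / total)


print(zygotic_fraction(90, 10))  # prints 0.2
```

X-linked genes would need separate treatment under this scheme, because males inherit their single X chromosome from the mother and so carry no paternal X allele.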
Affiliation(s)
- Susan E Lott
- Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America.
11205
Lister R, Pelizzola M, Kida YS, Hawkins RD, Nery JR, Hon G, Antosiewicz-Bourget J, O'Malley R, Castanon R, Klugman S, Downes M, Yu R, Stewart R, Ren B, Thomson JA, Evans RM, Ecker JR. Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature 2011; 471:68-73. [PMID: 21289626 DOI: 10.1038/nature09798] [Citation(s) in RCA: 1124] [Impact Index Per Article: 80.3] [Received: 09/11/2010] [Accepted: 01/11/2011] [Indexed: 11/09/2022]
Abstract
Induced pluripotent stem cells (iPSCs) offer immense potential for regenerative medicine and studies of disease and development. Somatic cell reprogramming involves epigenomic reconfiguration, conferring iPSCs with characteristics similar to embryonic stem (ES) cells. However, it remains unknown how complete the reestablishment of ES-cell-like DNA methylation patterns is throughout the genome. Here we report the first whole-genome profiles of DNA methylation at single-base resolution in five human iPSC lines, along with methylomes of ES cells, somatic cells, and differentiated iPSCs and ES cells. iPSCs show significant reprogramming variability, including somatic memory and aberrant reprogramming of DNA methylation. iPSCs share megabase-scale differentially methylated regions proximal to centromeres and telomeres that display incomplete reprogramming of non-CG methylation, and differences in CG methylation and histone modifications. Lastly, differentiation of iPSCs into trophoblast cells revealed that errors in reprogramming CG methylation are transmitted at a high frequency, providing an iPSC reprogramming signature that is maintained after differentiation.
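Single-base methylomes of this kind are commonly summarized per cytosine: bisulfite conversion turns unmethylated C into T, so the methylation level at a site can be estimated as C reads / (C + T reads), and each cytosine is assigned a sequence context, either CG or one of the non-CG contexts CHG and CHH (H = A, C or T). The sketch below covers only that bookkeeping on the forward strand; it assumes a plain A/C/G/T reference string and omits alignment, strand handling and coverage filtering. The function names are hypothetical.

```python
import math


def cytosine_context(ref: str, i: int) -> str:
    """Classify the forward-strand cytosine at ref[i] as CG, CHG or CHH."""
    ref = ref.upper()
    if ref[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    if i + 1 < len(ref) and ref[i + 1] == "G":
        return "CG"
    if i + 2 >= len(ref):
        return "unknown"  # too close to the sequence end to determine the context
    return "CHG" if ref[i + 2] == "G" else "CHH"


def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction of bisulfite reads that kept C (i.e. methylated) at one cytosine."""
    covered = c_reads + t_reads
    return c_reads / covered if covered else math.nan


print(cytosine_context("ACAGT", 1), methylation_level(3, 1))  # prints CHG 0.75
```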
Affiliation(s)
- Ryan Lister
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037, USA
11206
Tisserant E, Da Silva C, Kohler A, Morin E, Wincker P, Martin F. Deep RNA sequencing improved the structural annotation of the Tuber melanosporum transcriptome. New Phytol 2011; 189:883-891. [PMID: 21223284 DOI: 10.1111/j.1469-8137.2010.03597.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 05/30/2023]
Abstract
• The functional complexity of the Tuber melanosporum transcriptome has not yet been fully elucidated. Here, we applied high-throughput Illumina RNA-sequencing (RNA-Seq) to the transcriptome of T. melanosporum at different major developmental stages, that is, free-living mycelium, fruiting body and ectomycorrhiza.
• Sequencing of cDNA libraries generated a total of c. 24 million sequence reads representing > 882 Mb of sequence data. To construct a coverage signal profile across the genome, all reads were then aligned to the reference genome assembly of T. melanosporum Mel28.
• We were able to identify a substantial number of novel transcripts, antisense transcripts, new exons, untranslated regions (UTRs), alternative upstream initiation codons and upstream open reading frames.
• This RNA-Seq analysis allowed us to improve the genome annotation. It also provided us with a genome-wide view of the transcriptional and post-transcriptional mechanisms generating an increased number of transcript isoforms during major developmental transitions in T. melanosporum.
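A genome-wide coverage signal profile of the kind described can be computed from read alignments with a difference array. This is a minimal sketch assuming contiguous (unspliced) alignments given as 0-based, half-open (start, end) intervals on a single reference sequence; spliced RNA-Seq alignments would first be split into their aligned blocks. The function name `coverage_profile` is hypothetical.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def coverage_profile(alignments: Iterable[Tuple[int, int]], ref_length: int) -> List[int]:
    """Per-base read depth from (start, end) alignments, 0-based and half-open."""
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        if not 0 <= start < end <= ref_length:
            raise ValueError(f"bad interval ({start}, {end})")
        diff[start] += 1  # depth rises where a read begins
        diff[end] -= 1    # and falls just past where it ends
    # The running sum of the difference array is the depth at each position.
    return list(accumulate(diff[:ref_length]))


print(coverage_profile([(0, 3), (2, 5)], 6))  # prints [1, 1, 2, 1, 1, 0]
```

Each alignment costs constant time regardless of its length, so the whole profile takes time proportional to the number of reads plus the reference length.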
Affiliation(s)
- E Tisserant
- INRA, UMR INRA/Nancy Université 'Interactions Arbres/Micro-Organismes', INRA-Nancy, 54280 Champenoux, France
- C Da Silva
- CEA, IG, Genoscope, 2 rue Gaston Crémieux CP5702, F-91057 Evry, France
- A Kohler
- INRA, UMR INRA/Nancy Université 'Interactions Arbres/Micro-Organismes', INRA-Nancy, 54280 Champenoux, France
- E Morin
- INRA, UMR INRA/Nancy Université 'Interactions Arbres/Micro-Organismes', INRA-Nancy, 54280 Champenoux, France
- P Wincker
- CEA, IG, Genoscope, 2 rue Gaston Crémieux CP5702, F-91057 Evry, France
- F Martin
- INRA, UMR INRA/Nancy Université 'Interactions Arbres/Micro-Organismes', INRA-Nancy, 54280 Champenoux, France
11207
Abstract
In the few years since its initial application, massively parallel cDNA sequencing, or RNA-seq, has allowed many advances in the characterization and quantification of transcriptomes. Recently, several developments in RNA-seq methods have provided an even more complete characterization of RNA transcripts. These developments include improvements in transcription start site mapping, strand-specific measurements, gene fusion detection, small RNA characterization and detection of alternative splicing events. Ongoing developments promise further advances in the application of RNA-seq, particularly direct RNA sequencing and approaches that allow RNA quantification from very small amounts of cellular materials.
Collapse
Affiliation(s)
- Fatih Ozsolak
- Helicos BioSciences Corporation, One Kendall Square, Cambridge, Massachusetts 02139, USA.
11208
Abstract
Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single-end and paired-end RNA-Seq data, as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator of the model is provided. Further, it is shown that paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at a fixed sequencing depth. Simulation studies are also given.
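A model of this kind can be written, in a generic form that may differ from the paper's exact parameterization, as a Poisson model over read categories. Assume reads are grouped into categories s (for example, by which isoforms a read or read pair is compatible with), n_s is the number of reads in category s, N is the sequencing depth, θ_i is the abundance of isoform i, and a_{si} is an assumed-known rate at which isoform i produces reads in category s; sampling bias along the transcript would enter through a_{si}.

```latex
n_s \sim \operatorname{Poisson}(\lambda_s), \qquad
\lambda_s = N \sum_i a_{si}\,\theta_i, \qquad
\ell(\theta) = \sum_s \bigl( n_s \log \lambda_s - \lambda_s \bigr) + \text{const}.
```

Because this likelihood depends on the data only through the category counts n_s, those counts are sufficient statistics, so the maximum likelihood estimate can be computed from one number per category rather than from individual reads. Under this generic model, paired-end reads tend to fall into more isoform-specific categories, which is one intuition for why they can sharpen the estimates at a fixed depth.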
Affiliation(s)
- Julia Salzman
- Research Associate in the Department of Statistics and Biochemistry, Stanford University, Stanford, California 94305, USA
11209
Battke F, Nieselt K. Mayday SeaSight: combined analysis of deep sequencing and microarray data. PLoS One 2011; 6:e16345. [PMID: 21305015 PMCID: PMC3031553 DOI: 10.1371/journal.pone.0016345] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 09/14/2010] [Accepted: 12/11/2010] [Indexed: 11/18/2022]
Abstract
Recently emerged deep sequencing technologies offer new high-throughput methods to quantify gene expression, epigenetic modifications and DNA-protein binding. From a computational point of view, the data is very different from that produced by the already established microarray technology, providing a new perspective on the samples under study and complementing microarray gene expression data. Software offering the integrated analysis of data from different technologies is of growing importance as new data emerge in systems biology studies. Mayday is an extensible platform for visual data exploration and interactive analysis and provides many methods for dissecting complex transcriptome datasets. We present Mayday SeaSight, an extension that allows users to integrate data from different platforms such as deep sequencing and microarrays. It offers methods for computing expression values from mapped reads and raw microarray data, background correction and normalization, and linking microarray probes to genomic coordinates. It is now possible to use Mayday's wealth of methods to analyze sequencing data and to combine data from different technologies in one analysis.
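The abstract does not say which formula SeaSight uses to compute expression values from mapped reads. One widely used choice is RPKM (reads per kilobase of feature per million mapped reads); the sketch below assumes that normalization purely for illustration, and the function name `rpkm` is hypothetical here.

```python
def rpkm(feature_reads: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # reads / (length in kb) / (library size in millions), folded into one factor
    return feature_reads * 1e9 / (feature_length_bp * total_mapped_reads)


print(rpkm(500, 2_000, 10_000_000))  # prints 25.0
```

Dividing by feature length makes long and short genes comparable within a sample, and dividing by library size makes samples sequenced to different depths comparable.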
Affiliation(s)
- Florian Battke
- Center for Bioinformatics, University of Tübingen, Tübingen, Germany.
11210
Cullum R, Alder O, Hoodless PA. The next generation: Using new sequencing technologies to analyse gene regulation. Respirology 2011; 16:210-22. [DOI: 10.1111/j.1440-1843.2010.01899.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 12/13/2022]
11211
Twine NA, Janitz K, Wilkins MR, Janitz M. Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease. PLoS One 2011; 6:e16266. [PMID: 21283692 PMCID: PMC3025006 DOI: 10.1371/journal.pone.0016266] [Citation(s) in RCA: 219] [Impact Index Per Article: 15.6] [Received: 10/29/2010] [Accepted: 12/08/2010] [Indexed: 11/18/2022]
Abstract
Recent studies strongly indicate that aberrations in the control of gene expression might contribute to the initiation and progression of Alzheimer's disease (AD). In particular, alternative splicing has been suggested to play a role in spontaneous cases of AD. Previous transcriptome profiling of AD models and patient samples using microarrays delivered conflicting results. This study provides, for the first time, transcriptomic analysis for distinct regions of the AD brain using RNA-Seq next-generation sequencing technology. Illumina RNA-Seq analysis was used to survey transcriptome profiles from total brain, frontal and temporal lobe of healthy and AD post-mortem tissue. We quantified gene expression levels, splicing isoforms and alternative transcript start sites. Gene Ontology term enrichment analysis revealed an overrepresentation of genes associated with a neuron's cytological structure and synapse function in AD brain samples. Analysis of the temporal lobe with the Cufflinks tool revealed that transcriptional isoforms of the apolipoprotein E gene, APOE-001, -002 and -005, are under the control of different promoters in normal and AD brain tissue. We also observed differing expression levels of APOE-001 and -002 splice variants in the AD temporal lobe. Our results indicate that alternative splicing and promoter usage of the APOE gene in AD brain tissue might reflect the progression of neurodegeneration.
Affiliation(s)
- Natalie A. Twine
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
- New South Wales Systems Biology Initiative, University of New South Wales, Sydney, New South Wales, Australia
- Karolina Janitz
- Ramaciotti Centre for Gene Function Analysis, University of New South Wales, Sydney, New South Wales, Australia
- Marc R. Wilkins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
- New South Wales Systems Biology Initiative, University of New South Wales, Sydney, New South Wales, Australia
- Ramaciotti Centre for Gene Function Analysis, University of New South Wales, Sydney, New South Wales, Australia
- Michal Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
11212
Booker M, Samsonova AA, Kwon Y, Flockhart I, Mohr SE, Perrimon N. False negative rates in Drosophila cell-based RNAi screens: a case study. BMC Genomics 2011; 12:50. [PMID: 21251254 PMCID: PMC3036618 DOI: 10.1186/1471-2164-12-50] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 10/08/2010] [Accepted: 01/20/2011] [Indexed: 01/13/2023]
Abstract
Background High-throughput screening using RNAi is a powerful gene discovery method but is often complicated by false positive and false negative results. Whereas false positive results associated with RNAi reagents have been a matter of extensive study, the issue of false negatives has received less attention. Results We performed a meta-analysis of several genome-wide, cell-based Drosophila RNAi screens, together with a more focused RNAi screen, and conclude that the rate of false negative results is at least 8%. Further, we demonstrate how knowledge of the cell transcriptome can be used to resolve ambiguous results and how the number of false negative results can be reduced by using multiple, independently tested RNAi reagents per gene. Conclusions RNAi reagents that target the same gene do not always yield consistent results due to false positives and weak or ineffective reagents. False positive results can be partially minimized by filtering with transcriptome data. RNAi libraries with multiple reagents per gene also reduce false positive and false negative outcomes when inconsistent results are disambiguated carefully.
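Why several reagents per gene help, and why their results then need careful disambiguation, can be seen with a deliberately simple model that is an assumption rather than the paper's analysis: each reagent independently misses a true hit with probability p and scores a non-hit as positive with probability q, and a gene is called a hit if any of its k reagents scores. Real reagents are not independent, so the numbers are only illustrative, and the function names are hypothetical.

```python
def _check(prob: float, k: int) -> None:
    """Validate a per-reagent probability and a reagent count."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if k < 1:
        raise ValueError("need at least one reagent")


def miss_probability(p: float, k: int) -> float:
    """P(all k independent reagents miss a true hit) = p ** k."""
    _check(p, k)
    return p ** k


def any_false_positive(q: float, k: int) -> float:
    """P(at least one of k independent reagents scores a non-hit) = 1 - (1 - q) ** k."""
    _check(q, k)
    return 1.0 - (1.0 - q) ** k


for k in (1, 2, 3):
    print(k, miss_probability(0.3, k), any_false_positive(0.05, k))
```

Under this "any reagent scores" rule, adding reagents lowers the miss rate but raises the chance of a spurious call, which is why inconsistent reagents need to be resolved rather than simply combined with a logical OR.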
Affiliation(s)
- Matthew Booker
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
11213
Sutherland GT, Janitz M, Kril JJ. Understanding the pathogenesis of Alzheimer's disease: will RNA-Seq realize the promise of transcriptomics? J Neurochem 2011; 116:937-46. [PMID: 21175619 DOI: 10.1111/j.1471-4159.2010.07157.x] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 12/31/2022]
Abstract
The prevalence of Alzheimer's disease (AD) is increasing rapidly in the western world and is poised to have a significant economic and societal impact. Current treatments do not alter the underlying disease processes, meaning new treatments are required if this imminent epidemic is to be averted. The clinical manifestations of AD are secondary to a substantial loss of cortical neurons. To be effective, neuroprotective strategies will need to be implemented prior to this cell loss. However, this requires the discovery of both pre-clinical markers to identify susceptible patients and the early pathogenic mechanisms to serve as therapeutic targets. Although the biomarkers and pathogenic mechanisms may overlap, it is likely that new approaches are required to identify novel elements of the disease. Transcriptomic analyses, which assume no a priori etiological hypotheses, promise much in elucidating the pathogenesis of complex diseases like AD. Microarrays are the most popular platform for transcriptomic analysis and have been applied across AD models, patient samples and postmortem brain tissue. The results of these studies have been largely discordant, which could, to some extent, reflect the limitations of this probe-hybridization-based methodology. In comparison, whole transcriptome sequencing (RNA-Seq) utilizes a highly efficient, next-generation DNA sequencing method with improved dynamic range and scope of transcript detection. RNA-Seq is not only highly suited to investigations of the genomically complex human brain tissue but it can potentially overcome technical issues inherent to case-control comparisons of postmortem brain tissue in neurodegenerative diseases. The volume of data generated by this platform looms as the major logistical hurdle, and a systematic experimental approach will be required to maximise the detection of pathogenically relevant signals. Nevertheless, RNA-Seq looks set to deliver a quantum leap forward in our understanding of AD pathogenesis.
Affiliation(s)
- Greg T Sutherland
- Discipline of Pathology, Sydney Medical School, University of Sydney, Sydney, NSW, Australia.
11214
Goncalves A, Tikhonov A, Brazma A, Kapushesky M. A pipeline for RNA-seq data processing and quality assessment. Bioinformatics 2011; 27:867-9. [PMID: 21233166 PMCID: PMC3051320 DOI: 10.1093/bioinformatics/btr012] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 12/15/2022]
Abstract
Summary: We present an R-based pipeline, ArrayExpressHTS, for pre-processing, expression estimation and data quality assessment of high-throughput sequencing transcriptional profiling (RNA-seq) datasets. The pipeline starts from raw sequence files and produces standard Bioconductor R objects containing gene or transcript measurements for downstream analysis along with web reports for data quality assessment. It may be run locally on a user's own computer or remotely on a distributed R-cloud farm at the European Bioinformatics Institute. It can be used to analyse users' own datasets or public RNA-seq datasets from the ArrayExpress Archive. Availability: The R package is available at www.ebi.ac.uk/tools/rcloud with online documentation at www.ebi.ac.uk/Tools/rwiki/, also available as supplementary material. Contact: angela.goncalves@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Angela Goncalves
- EMBL Outstation-Hinxton, European Bioinformatics Institute, Cambridge, UK.
11215
Next-generation sequence analysis. Nat Biotechnol 2011. [DOI: 10.1038/nbt0111-45b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
11216
11217
Nicolae M, Măndoiu I. Accurate Estimation of Gene Expression Levels from DGE Sequencing Data. Bioinformatics Research and Applications 2011. [DOI: 10.1007/978-3-642-21260-4_37] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/24/2022]
11218
Saxena A, Carninci P. Whole transcriptome analysis: what are we still missing? Wiley Interdiscip Rev Syst Biol Med 2010; 3:527-43. [PMID: 21197667 DOI: 10.1002/wsbm.135] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022]
Abstract
New technologies such as tag-based sequencing and tiling arrays have provided unique insights into the transcriptional output of cells. Many new RNA classes have been uncovered in the past decade, despite limitations in current technologies. Even as the repertoire of known functional elements of the transcriptome increases and contemporary technologies become mainstream, inadequacies in conventional protocols for library preparation, sequencing and mapping continue to hamper revelation of the entire transcriptome of cells. In this article, we review current protocols and outline their deficiencies. We also provide our view on what we may be overlooking in the transcriptome, despite exhaustive investigations, and indicate future areas of technological development and research.
Affiliation(s)
- Alka Saxena
- Omics Science Center, RIKEN Yokohama Institute, Tsurumi, Japan
11219
11220
Abstract
Drosophila melanogaster is one of the most well-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
11221
Abstract
Many methods and tools are available for preprocessing high-throughput RNA sequencing data and detecting differential expression.
Affiliation(s)
- Alicia Oshlack
- Bioinformatics Division, Walter and Eliza Hall Institute, 1G Royal Parade, Parkville 3052, Australia.
11222
Wu Z, Wang X, Zhang X. Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq. Bioinformatics 2010; 27:502-8. [PMID: 21169371 DOI: 10.1093/bioinformatics/btq696] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION RNA-Seq technology based on next-generation sequencing provides the unprecedented ability to study transcriptomes at high resolution and accuracy, and the potential to measure the expression of multiple isoforms from the same gene at high precision. Isoform expression can be inferred from RNA-Seq data by maximum likelihood estimation, using statistical models based on the assumption that sequenced reads are distributed uniformly along transcripts. These models need to be modified when RNA-Seq data do not follow a uniform distribution. RESULTS We proposed two curves, the global bias curve (GBC) and the local bias curves (LBCs), to describe the non-uniformity of read distributions for all genes in a transcriptome and for each gene, respectively. Incorporating the bias curves into the uniform read distribution (URD) model, we introduced non-URD (N-URD) models to infer isoform expression levels. In a series of systematic simulation studies, the proposed models outperform the original model in recovering major isoforms and the expression ratio of alternative isoforms. We also applied the new model to real RNA-Seq datasets and found that its inferences on expression ratios of alternative isoforms are more reasonable. The experiments indicate that incorporating N-URD information can improve the accuracy in modeling and inferring isoform expression in RNA-Seq.
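A global bias curve of the kind described can be approximated by pooling read start positions from all transcripts, rescaling each to a relative position in [0, 1), and histogramming. This is a minimal sketch under assumptions not stated in the abstract: positions are 0-based read starts on the transcript, the curve is normalized so that a perfectly uniform distribution gives 1.0 in every bin, and the bin count is arbitrary. Wu et al. may define and smooth the GBC differently, and `global_bias_curve` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def global_bias_curve(reads: Iterable[Tuple[int, int]], n_bins: int = 10) -> List[float]:
    """Relative read density along transcripts, pooled over all transcripts.

    reads: (read_start, transcript_length) pairs, read_start 0-based.
    Returns one value per bin; 1.0 in every bin means uniform coverage.
    """
    counts = [0] * n_bins
    for start, length in reads:
        if not 0 <= start < length:
            raise ValueError(f"start {start} outside transcript of length {length}")
        # Relative position start/length in [0, 1) mapped to a bin index.
        counts[start * n_bins // length] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads")
    expected = total / n_bins  # reads per bin under a uniform distribution
    return [c / expected for c in counts]


print(global_bias_curve([(0, 100), (10, 100), (20, 100), (90, 100)], n_bins=2))
# prints [1.5, 0.5]
```

Applying the same computation to the reads of a single gene would give a per-gene curve in the spirit of the local bias curves.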
Affiliation(s)
- Zhengpeng Wu
- TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China
11223
Huang W, Khatib H. Comparison of transcriptomic landscapes of bovine embryos using RNA-Seq. BMC Genomics 2010; 11:711. [PMID: 21167046 PMCID: PMC3019235 DOI: 10.1186/1471-2164-11-711] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Received: 08/26/2010] [Accepted: 12/17/2010] [Indexed: 11/10/2022]
Abstract
Background Advances in sequencing technologies have opened a new era of high throughput investigations. Although RNA-seq has been demonstrated in many organisms, no study has provided a comprehensive investigation of the bovine transcriptome using RNA-seq. Results In this study, we provide a deep survey of the bovine embryonic transcriptomes, the first application of RNA-seq in cattle. Embryos cultured in vitro were used as models to study early embryonic development in cattle. RNA amplified from limited amounts of starting total RNA was sequenced and mapped to the reference genome to obtain digital gene expression at single-base resolution. In particular, gene expression estimates were obtained for more than 1.6 million unannotated bases in 1785 novel transcribed units. We compared the transcriptomes of embryos showing distinct developmental statuses and found genes that showed differential overall expression as well as alternative splicing. Conclusion Our study demonstrates the power of RNA-seq and provides further understanding of bovine preimplantation embryonic development at a fine scale.
Affiliation(s)
- Wen Huang
- Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA
11224
Blythe MJ, Kao D, Malla S, Rowsell J, Wilson R, Evans D, Jowett J, Hall A, Lemay V, Lam S, Aboobaker AA. A dual platform approach to transcript discovery for the planarian Schmidtea mediterranea to establish RNAseq for stem cell and regeneration biology. PLoS One 2010; 5:e15617. [PMID: 21179477 PMCID: PMC3001875 DOI: 10.1371/journal.pone.0015617] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 09/08/2010] [Accepted: 11/13/2010] [Indexed: 12/30/2022]
Abstract
The use of planarians as a model system is expanding and the mechanisms that control planarian regeneration are being elucidated. The planarian Schmidtea mediterranea in particular has become a species of choice. Currently the planarian research community has access to a whole genome sequencing project and over 70,000 expressed sequence tags. However, the establishment of massively parallel sequencing technologies has provided the opportunity to define genetic content, and in particular transcriptomes, in unprecedented detail. Here we apply this approach to the planarian model system. We have sequenced, mapped and assembled 581,365 long and 507,719,814 short reads from RNA of intact and mixed stages of the first 7 days of planarian regeneration. We used an iterative mapping approach to identify and define de novo splice sites with short reads and increase confidence in our transcript predictions. We more than double the number of transcripts currently defined by publicly available ESTs, resulting in a collection of 25,053 transcripts described by combining platforms. We also demonstrate the utility of this collection for an RNAseq approach to identify potential transcripts that are enriched in neoblast stem cells and their progeny by comparing transcriptome-wide expression levels between irradiated and intact planarians. Our experiments have defined an extensive planarian transcriptome that can be used as a template for RNAseq and can also help to annotate the S. mediterranea genome. We anticipate that suites of other 'omic approaches will also be facilitated by building on this comprehensive data set, including RNAseq across many planarian regenerative stages, scenarios, tissues and phenotypes generated by RNAi.
Affiliation(s)
- Martin J. Blythe
- Deep Seq, Faculty of Medicine and Health Sciences, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Damian Kao
- Evolutionary Developmental Biology Laboratory, Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Sunir Malla
- Deep Seq, Faculty of Medicine and Health Sciences, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Joanna Rowsell
- Deep Seq, Faculty of Medicine and Health Sciences, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Ray Wilson
- Deep Seq, Faculty of Medicine and Health Sciences, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Deborah Evans
- Evolutionary Developmental Biology Laboratory, Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Jamie Jowett
- Evolutionary Developmental Biology Laboratory, Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Amy Hall
- Evolutionary Developmental Biology Laboratory, Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Virginie Lemay
- Evolutionary Developmental Biology Laboratory, Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Sabrina Lam
- Evolutionary Developmental Biology Laboratory, Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- A. Aziz Aboobaker
- Deep Seq, Faculty of Medicine and Health Sciences, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
- Evolutionary Developmental Biology Laboratory, Centre for Genetics and Genomics, Queen's Medical Centre, University of Nottingham, Nottingham, United Kingdom
11225
Pickrell JK, Pai AA, Gilad Y, Pritchard JK. Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet 2010; 6:e1001236. [PMID: 21151575 PMCID: PMC3000347 DOI: 10.1371/journal.pgen.1001236] [Citation(s) in RCA: 213] [Impact Index Per Article: 14.2] [Received: 07/04/2010] [Accepted: 11/03/2010] [Indexed: 11/18/2022]
Abstract
While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.
Most human genes are split into pieces, such that the protein-coding parts (exons) are separated in the genome by large tracts of non-coding DNA (introns) that must be transcribed and spliced out to create a functional transcript. Variation in splicing reactions can create multiple transcripts from the same gene, yet the function of many of these alternative transcripts is unknown. In this study, we show that many of these transcripts are due to splicing errors that are not preserved over evolutionary time. We estimate that the error rate in the splicing of an intron is about 0.7% and demonstrate that there are two major types of splicing error: errors in the recognition of exons and errors in the precise choice of splice site. These results raise the possibility that variation in levels of alternative splicing across species may in part be due to variation in splicing error rate.
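A per-intron error rate of about 0.7% compounds across a multi-intron transcript. The sketch below makes that arithmetic explicit under a simplifying assumption that the abstract does not state: errors at different introns are independent and occur at the same rate. Both function names are hypothetical.

```python
def prob_error_free(n_introns: int, per_intron_error: float = 0.007) -> float:
    """Probability that every intron of a transcript is spliced correctly,
    assuming independent errors at a common per-intron rate (a simplification)."""
    if not 0.0 <= per_intron_error <= 1.0:
        raise ValueError("per_intron_error must be a probability")
    return (1.0 - per_intron_error) ** n_introns


def expected_misspliced(
    n_transcripts: int, n_introns: int, per_intron_error: float = 0.007
) -> float:
    """Expected number of transcripts carrying at least one splicing error."""
    return n_transcripts * (1.0 - prob_error_free(n_introns, per_intron_error))
```

Under these assumptions, a 10-intron transcript is spliced entirely correctly with probability of about 0.932, so roughly 7% of its copies would carry at least one error.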
Affiliation(s)
- Joseph K. Pickrell
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Athma A. Pai
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Yoav Gilad
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Jonathan K. Pritchard
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Howard Hughes Medical Institute, The University of Chicago, Chicago, Illinois, United States of America
11226
Habegger L, Sboner A, Gianoulis TA, Rozowsky J, Agarwal A, Snyder M, Gerstein M. RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries. Bioinformatics 2010; 27:281-3. [PMID: 21134889 PMCID: PMC3018817 DOI: 10.1093/bioinformatics/btq643] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022]
Abstract
SUMMARY The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. AVAILABILITY AND IMPLEMENTATION RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.
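Many downstream RNA-Seq computations, such as gene expression values, need only alignment coordinates, which is why summaries without read sequences can still support them. The sketch below computes RPKM from coordinate-only records. The `AlignmentSummary` type and `rpkm` function are hypothetical stand-ins that do not reproduce the actual MRF layout, and genes are modeled as single intervals rather than exon sets.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlignmentSummary:
    """Coordinates of one aligned read, with no sequence or base qualities.
    Illustrative stand-in only; this is not the MRF specification."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def rpkm(
    alignments: List[AlignmentSummary],
    genes: Dict[str, Tuple[str, int, int]],
) -> Dict[str, float]:
    """Reads per kilobase of gene per million mapped reads, computed from
    coordinates alone. Each gene is one (chrom, start, end) interval here,
    a simplification that ignores exon structure."""
    total_millions = len(alignments) / 1e6
    result: Dict[str, float] = {}
    for name, (chrom, g_start, g_end) in genes.items():
        # Count alignments that overlap the gene interval (half-open coordinates).
        hits = sum(
            1
            for a in alignments
            if a.chrom == chrom and a.start < g_end and a.end > g_start
        )
        result[name] = hits / ((g_end - g_start) / 1e3) / total_millions
    return result
```

Because only coordinates are consulted, the same calculation works whether or not the read sequences were ever retained.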
Affiliation(s)
- Lukas Habegger
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA.
11227
Mizuno H, Kawahara Y, Sakai H, Kanamori H, Wakimoto H, Yamagata H, Oono Y, Wu J, Ikawa H, Itoh T, Matsumoto T. Massive parallel sequencing of mRNA in identification of unannotated salinity stress-inducible transcripts in rice (Oryza sativa L.). BMC Genomics 2010; 11:683. [PMID: 21122150 PMCID: PMC3016417 DOI: 10.1186/1471-2164-11-683] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Received: 04/20/2010] [Accepted: 12/02/2010] [Indexed: 12/14/2022]
Abstract
Background Microarray technology is limited to monitoring the expression of previously annotated genes that have corresponding probes on the array. Computationally annotated genes have not been fully validated, because ESTs and full-length cDNAs cannot cover entire transcribed regions. Here, mRNA-Seq (an Illumina cDNA sequencing application) was used to monitor whole mRNAs of salinity stress-treated rice tissues. Results Thirty-six-base-pair reads from whole mRNAs were mapped to the rice genomic sequence: 72.0% to 75.2% were mapped uniquely to the genome, and 5.0% to 5.7% bridged exons. From the pileup of short reads mapped to the genome, a series of programs (Bowtie, TopHat, and Cufflinks) comprehensively predicted 51,301 (shoot) and 54,491 (root) transcripts, including 2,795 (shoot) and 3,082 (root) currently unannotated in the Rice Annotation Project database. Of these unannotated transcripts, 995 (shoot) and 1,052 (root) had ORFs similar to those encoding the amino acid sequences of functional proteins in a BLASTX search against UniProt and RefSeq databases. Among the unannotated genes, 213 (shoot) and 436 (root) were differentially expressed in response to salinity stress. Sequence-based and array-based measurements of the expression ratios of previously annotated genes were highly correlated. Conclusion Unannotated transcripts were identified on the basis of the pileup of mapped reads derived from mRNAs in rice. Some of these unannotated transcripts encoding putative functional proteins were expressed differentially in response to salinity stress.
Affiliation(s)
- Hiroshi Mizuno
- National Institute of Agrobiological Sciences (NIAS), Division of Genome and Biodiversity Research, 1-2 Kannondai 2-chome, Tsukuba, Ibaraki 305-8602, Japan
11228
Crawford JE, Guelbeogo WM, Sanou A, Traoré A, Vernick KD, Sagnon N, Lazzaro BP. De novo transcriptome sequencing in Anopheles funestus using Illumina RNA-seq technology. PLoS One 2010; 5:e14202. [PMID: 21151993 PMCID: PMC2996306 DOI: 10.1371/journal.pone.0014202] [Citation(s) in RCA: 118] [Impact Index Per Article: 7.9] [Received: 07/21/2010] [Accepted: 11/10/2010] [Indexed: 12/14/2022]
Abstract
BACKGROUND Anopheles funestus is one of the primary vectors of human malaria, which causes a million deaths each year in sub-Saharan Africa. Few scientific resources are available to facilitate studies of this mosquito species and relatively little is known about its basic biology and evolution, making development and implementation of novel disease control efforts more difficult. The An. funestus genome has not been sequenced, so in order to facilitate genome-scale experimental biology, we have sequenced the adult female transcriptome of An. funestus from a newly founded colony in Burkina Faso, West Africa, using the Illumina GAIIx next generation sequencing platform. METHODOLOGY/PRINCIPAL FINDINGS We assembled short Illumina reads de novo using a novel approach involving iterative de novo assemblies and "target-based" contig clustering. We then selected a conservative set of 15,527 contigs through comparisons to four Dipteran transcriptomes as well as multiple functional and conserved protein domain databases. Comparison to the Anopheles gambiae immune system identified 339 contigs as putative immune genes, thus identifying a large portion of the immune system that can form the basis for subsequent studies of this important malaria vector. We identified 5,434 1:1 orthologues between An. funestus and An. gambiae and found that among these 1:1 orthologues, the protein sequence of those with putative immune function were significantly more diverged than the transcriptome as a whole. Short read alignments to the contig set revealed almost 367,000 genetic polymorphisms segregating in the An. funestus colony and demonstrated the utility of the assembled transcriptome for use in RNA-seq based measurements of gene expression. CONCLUSIONS/SIGNIFICANCE We developed a pipeline that makes de novo transcriptome sequencing possible in virtually any organism at a very reasonable cost ($6,300 in sequencing costs in our case). 
We anticipate that our approach could be used to develop genomic resources in a diversity of systems for which full genome sequence is currently unavailable. Our An. funestus contig set and analytical results provide a valuable resource for future studies in this non-model, but epidemiologically critical, vector insect.
Affiliation(s)
- Jacob E Crawford
- Department of Entomology, Cornell University, Ithaca, New York, United States of America.
11229
Miller R, Wu G, Deshpande RR, Vieler A, Gärtner K, Li X, Moellering ER, Zäuner S, Cornish AJ, Liu B, Bullard B, Sears BB, Kuo MH, Hegg EL, Shachar-Hill Y, Shiu SH, Benning C. Changes in transcript abundance in Chlamydomonas reinhardtii following nitrogen deprivation predict diversion of metabolism. Plant Physiol 2010; 154:1737-52. [PMID: 20935180 PMCID: PMC2996024 DOI: 10.1104/pp.110.165159] [Citation(s) in RCA: 355] [Impact Index Per Article: 23.7] [Received: 08/31/2010] [Accepted: 10/07/2010] [Indexed: 05/17/2023]
Abstract
Like many microalgae, Chlamydomonas reinhardtii forms lipid droplets rich in triacylglycerols when nutrient deprived. To begin studying the mechanisms underlying this process, nitrogen (N) deprivation was used to induce triacylglycerol accumulation and changes in developmental programs such as gametogenesis. Comparative global analysis of transcripts under induced and noninduced conditions was applied as a first approach to studying molecular changes that promote or accompany triacylglycerol accumulation in cells encountering a new nutrient environment. Towards this goal, high-throughput sequencing technology was employed to generate large numbers of expressed sequence tags of eight biologically independent libraries, four for each condition, N replete and N deprived, allowing a statistically sound comparison of expression levels under the two tested conditions. As expected, N deprivation activated a subset of control genes involved in gametogenesis while down-regulating protein biosynthesis. Genes for components of photosynthesis were also down-regulated, with the exception of the PSBS gene. N deprivation led to a marked redirection of metabolism: the primary carbon source, acetate, was no longer converted to cell building blocks by the glyoxylate cycle and gluconeogenesis but funneled directly into fatty acid biosynthesis. Additional fatty acids may be produced by membrane remodeling, a process that is suggested by the changes observed in transcript abundance of putative lipase genes. Inferences on metabolism based on transcriptional analysis are indirect, but biochemical experiments supported some of these deductions. The data provided here represent a rich source for the exploration of the mechanism of oil accumulation in microalgae.
11230
Osorio FG, Varela I, Lara E, Puente XS, Espada J, Santoro R, Freije JMP, Fraga MF, López-Otín C. Nuclear envelope alterations generate an aging-like epigenetic pattern in mice deficient in Zmpste24 metalloprotease. Aging Cell 2010; 9:947-57. [PMID: 20961378 DOI: 10.1111/j.1474-9726.2010.00621.x] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
Abstract
Mutations in the nuclear envelope protein lamin A or in its processing protease ZMPSTE24 cause human accelerated aging syndromes, including Hutchinson-Gilford progeria syndrome. Similarly, Zmpste24-deficient mice accumulate unprocessed prelamin A and develop multiple progeroid symptoms, thus representing a valuable animal model for the study of these syndromes. Zmpste24-deficient mice also show marked transcriptional alterations associated with chromatin disorganization, but the molecular links between both processes are unknown. We report herein that Zmpste24-deficient mice show hypermethylation of rDNA that reduces the transcription of ribosomal genes, a reduction that is reversible upon treatment with DNA methyltransferase inhibitors. This alteration has been previously described during physiological aging in rodents, suggesting its potential role in the development of the progeroid phenotypes. We also show that Zmpste24-deficient mice present global hypoacetylation of histones H2B and H4. By using a combination of RNA sequencing and chromatin immunoprecipitation assays, we demonstrate that these histone modifications are associated with changes in the expression of several genes involved in the control of cell proliferation and metabolic processes, which may contribute to the plethora of progeroid symptoms exhibited by Zmpste24-deficient mice. The identification of these altered genes may help to clarify the molecular mechanisms underlying aging and progeroid syndromes as well as to define new targets for the treatment of these dramatic diseases.
Affiliation(s)
- Fernando G Osorio
- Departamento de Bioquímica y Biología Molecular Unidad de Epigenética, Instituto Universitario de Oncología, Universidad de Oviedo, 33006-Oviedo, Spain
11231
Mizrachi E, Hefer CA, Ranik M, Joubert F, Myburg AA. De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq. BMC Genomics 2010; 11:681. [PMID: 21122097 PMCID: PMC3053591 DOI: 10.1186/1471-2164-11-681] [Citation(s) in RCA: 128] [Impact Index Per Article: 8.5] [Received: 05/30/2010] [Accepted: 12/01/2010] [Indexed: 12/03/2022]
Abstract
Background De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes, were assembled and annotated. Analysis of assembly quality, length and diversity shows that this dataset represents the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants.
Affiliation(s)
- Eshchar Mizrachi
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, 0002, South Africa
11232
Katz Y, Wang ET, Airoldi EM, Burge CB. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 2010; 7:1009-15. [PMID: 21057496 PMCID: PMC3037023 DOI: 10.1038/nmeth.1528] [Citation(s) in RCA: 969] [Impact Index Per Article: 64.6] [Received: 08/18/2010] [Accepted: 10/08/2010] [Indexed: 12/31/2022]
Abstract
Through alternative splicing, most human genes express multiple isoforms that often differ in function. To infer isoform regulation from high-throughput sequencing of cDNA fragments (RNA-seq), we developed the mixture-of-isoforms (MISO) model, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates. Incorporation of mRNA fragment length distribution in paired-end RNA-seq greatly improved estimation of alternative-splicing levels. MISO also detects differentially regulated exons or isoforms. Application of MISO implicated the RNA splicing factor hnRNP H1 in the regulation of alternative cleavage and polyadenylation, a role that was supported by UV cross-linking-immunoprecipitation sequencing (CLIP-seq) analysis in human cells. Our results provide a probabilistic framework for RNA-seq analysis, give functional insights into pre-mRNA processing and yield guidelines for the optimal design of RNA-seq experiments for studies of gene and isoform expression.
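The central quantity here, the fraction of transcripts that include an exon (Psi) together with a measure of confidence, can be illustrated for a single skipped-exon event with a grid posterior. This is a simplified sketch, not MISO itself: MISO uses sampling-based inference and models paired-end fragment lengths, whereas here only isoform-specific read counts and hypothetical effective lengths `len_inc` and `len_exc` (the numbers of positions that yield inclusion- or exclusion-specific reads) enter the likelihood, with a uniform prior on Psi. The function name is hypothetical.

```python
import math
from typing import Tuple


def psi_posterior(
    n_inc: int,
    n_exc: int,
    len_inc: float,
    len_exc: float,
    grid_size: int = 1001,
) -> Tuple[float, float, float]:
    """Posterior mean and central 95% interval of Psi for one skipped-exon event.

    A read from the event is inclusion-specific with probability
    q = Psi*len_inc / (Psi*len_inc + (1 - Psi)*len_exc), so the counts follow a
    binomial likelihood in q. The prior on Psi is uniform; the posterior is
    evaluated on grid midpoints, which avoids Psi = 0 or 1 exactly.
    """
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_lik = []
    for psi in grid:
        a = psi * len_inc
        b = (1.0 - psi) * len_exc
        q = a / (a + b)
        log_lik.append(n_inc * math.log(q) + n_exc * math.log(1.0 - q))
    top = max(log_lik)  # subtract the maximum to avoid underflow in exp
    weights = [math.exp(x - top) for x in log_lik]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * psi for p, psi in zip(probs, grid))
    # Walk the cumulative distribution to find the 2.5% and 97.5% points.
    lo, hi = grid[0], grid[-1]
    cdf, found_lo = 0.0, False
    for psi, p in zip(grid, probs):
        cdf += p
        if not found_lo and cdf >= 0.025:
            lo, found_lo = psi, True
        if cdf >= 0.975:
            hi = psi
            break
    return mean, lo, hi
```

With 100 inclusion and 50 exclusion reads but twice as many inclusion-specific positions, the posterior centers near 0.5 rather than the naive ratio 100/150, which shows why length normalization matters.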
Affiliation(s)
- Yarden Katz
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
- Department of Biology, MIT, Cambridge, Massachusetts, USA
- Eric T Wang
- Department of Biology, MIT, Cambridge, Massachusetts, USA
- Harvard-MIT Division of Health Sciences and Technology, Cambridge, Massachusetts, USA
- Edoardo M Airoldi
- Department of Statistics and FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts, USA
- Christopher B Burge
- Department of Biology, MIT, Cambridge, Massachusetts, USA
- Department of Biological Engineering, MIT, Cambridge, Massachusetts, USA
11233
Majewski J, Pastinen T. The study of eQTL variations by RNA-seq: from SNPs to phenotypes. Trends Genet 2010; 27:72-9. [PMID: 21122937 DOI: 10.1016/j.tig.2010.10.006] [Citation(s) in RCA: 161] [Impact Index Per Article: 10.7] [Received: 07/26/2010] [Revised: 10/25/2010] [Accepted: 10/28/2010] [Indexed: 01/13/2023]
Abstract
Common DNA variants alter the expression levels and patterns of many human genes. Loci responsible for this genetic control are known as expression quantitative trait loci (eQTLs). The resulting variation of gene expression across individuals has been postulated to be a determinant of phenotypic variation and susceptibility to complex disease. In the past, the application of expression microarray and genetic variation data to study populations enabled the rapid identification of eQTLs in model organisms and humans. Now, a new technology promises to revolutionize the field. Massively parallel RNA sequencing (RNA-seq) provides unprecedented resolution, allowing us to accurately monitor not only the expression output of each genomic locus but also reconstruct and quantify alternatively spliced transcripts. RNA-seq also provides new insights into the regulatory mechanisms underlying eQTLs. Here, we discuss the major advances introduced by RNA-seq and summarize current progress towards understanding the role of eQTLs in determining human phenotypic diversity.
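At its simplest, testing a SNP as a cis-eQTL means regressing a gene's expression on genotype dosage across individuals. The sketch below fits that additive model by ordinary least squares. The function name is hypothetical, expression values are assumed to be already normalized (for example, log-scaled RNA-seq counts), and real analyses add covariates, significance testing and multiple-testing correction.

```python
import math
from typing import Sequence, Tuple


def eqtl_additive_fit(
    genotypes: Sequence[int], expression: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of expression = a + b * dosage, where dosage is the
    number of alternative alleles (0, 1 or 2) carried by each individual.
    Returns the per-allele effect b and the Pearson correlation r."""
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need matched genotype/expression vectors (n >= 3)")
    mean_x = sum(genotypes) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variation")
    b = sxy / sxx  # change in expression per additional alternative allele
    r = sxy / math.sqrt(sxx * syy)
    return b, r
```

A positive `b` means each copy of the alternative allele raises expression; the sign and size of `b` are what an eQTL scan ranks across SNP-gene pairs.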
Affiliation(s)
- Jacek Majewski
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, 740 Dr. Penfield Avenue, Rm 7210, Montreal, Quebec, H3A 1A4, Canada.
11234
Martin J, Bruno VM, Fang Z, Meng X, Blow M, Zhang T, Sherlock G, Snyder M, Wang Z. Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. BMC Genomics 2010; 11:663. [PMID: 21106091 PMCID: PMC3152782 DOI: 10.1186/1471-2164-11-663] [Citation(s) in RCA: 143] [Impact Index Per Article: 9.5] [Received: 07/20/2010] [Accepted: 11/24/2010] [Indexed: 01/03/2023]
Abstract
BACKGROUND Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. RESULTS Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. CONCLUSIONS These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.
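The abstract does not describe Rnnotator's internal assembly steps. As background only, short-read de novo assemblers such as Velvet commonly build a de Bruijn graph of k-mers and report non-branching paths as contigs; the toy sketch below shows that idea. Function names are hypothetical, reads are treated as strand-specific (so reverse complements are not merged), and sequencing errors and coverage are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_debruijn(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; every k-mer seen in a read adds the edge
    prefix -> suffix. Repeated k-mers collapse to a single edge."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: Dict[str, Set[str]]) -> List[str]:
    """Spell maximal non-branching paths. Isolated cycles, in which every node
    has exactly one incoming and one outgoing edge, are not reported here."""
    indegree: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    nodes = set(graph) | set(indegree)

    def simple(node: str) -> bool:
        # A node in the middle of an unambiguous path: one edge in, one edge out.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs: List[str] = []
    for node in nodes:
        if simple(node):
            continue
        # Start a contig along each outgoing edge of a branching or end node.
        for nxt in graph.get(node, ()):
            path = node + nxt[-1]
            while simple(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return sorted(contigs)
```

Two overlapping reads that share a k-mer collapse into one contig, while a sequence difference at the same position creates a branch that splits the output into separate unitigs.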
Affiliation(s)
- Jeffrey Martin
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
11235
Kahramanoglou C, Seshasayee ASN, Prieto AI, Ibberson D, Schmidt S, Zimmermann J, Benes V, Fraser GM, Luscombe NM. Direct and indirect effects of H-NS and Fis on global gene expression control in Escherichia coli. Nucleic Acids Res 2010; 39:2073-91. [PMID: 21097887 PMCID: PMC3064808 DOI: 10.1093/nar/gkq934] [Citation(s) in RCA: 215] [Impact Index Per Article: 14.3] [Indexed: 12/26/2022]
Abstract
Nucleoid-associated proteins (NAPs) are global regulators of gene expression in Escherichia coli, which affect DNA conformation by bending, wrapping and bridging the DNA. Two of these, H-NS and Fis, bind to specific DNA sequences and structures. Because of their importance to global gene expression, the binding of these NAPs to the DNA was previously investigated on a genome-wide scale using ChIP-chip. However, variation in their binding profiles across the growth phase and the genome-scale nature of their impact on gene expression remain poorly understood. Here, we present a genome-scale investigation of H-NS and Fis binding to the E. coli chromosome using chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-seq). By performing our experiments at multiple time points during growth in rich media, we show that the binding regions of the two proteins are mutually exclusive under our experimental conditions. H-NS binds to significantly longer tracts of DNA than Fis, consistent with the linear spread of H-NS binding from high- to surrounding lower-affinity sites; the length of binding regions is associated with the degree of transcriptional repression imposed by H-NS. For Fis, a majority of binding events do not lead to differential expression of the proximal gene; however, it has a significant indirect effect on gene expression partly through its effects on the expression of other transcription factors. We propose that direct transcriptional regulation by Fis is associated with the interaction of tandem arrays of Fis molecules with the DNA and possible DNA bending, particularly at operon-upstream regions. Our study serves as a proof-of-principle for the use of ChIP-seq for global DNA-binding proteins in bacteria, which should become significantly more economical and feasible with the development of multiplexing techniques.
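Checking whether two sets of binding regions are mutually exclusive reduces to an interval-overlap query. The sketch below finds all overlapping pairs between two sets of half-open regions on one chromosome with a sort-and-sweep; the function name is hypothetical and the regions would come from whatever peak caller was used, which this sketch does not model.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def overlapping_pairs(a: List[Interval], b: List[Interval]) -> List[Tuple[int, int]]:
    """Return sorted (i, j) index pairs where a[i] and b[j] overlap, sweeping
    over both lists ordered by start coordinate."""
    order_a = sorted(range(len(a)), key=lambda i: a[i])
    order_b = sorted(range(len(b)), key=lambda j: b[j])
    pairs: List[Tuple[int, int]] = []
    first = 0  # b intervals before this position cannot overlap any remaining a
    for i in order_a:
        start, end = a[i]
        # A b interval ending at or before `start` cannot overlap this or any
        # later a interval, because later a intervals start no earlier.
        while first < len(order_b) and b[order_b[first]][1] <= start:
            first += 1
        k = first
        # Scan b intervals that begin before this a interval ends.
        while k < len(order_b) and b[order_b[k]][0] < end:
            j = order_b[k]
            if b[j][1] > start:
                pairs.append((i, j))
            k += 1
    return sorted(pairs)
```

An empty result for two region sets corresponds to mutually exclusive binding in the sense described above; comparing region lengths from the same lists would address the longer H-NS tracts.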
11236
Lee S, Seo CH, Lim B, Yang JO, Oh J, Kim M, Lee S, Lee B, Kang C, Lee S. Accurate quantification of transcriptome from RNA-Seq data by effective length normalization. Nucleic Acids Res 2010; 39:e9. [PMID: 21059678 PMCID: PMC3025570 DOI: 10.1093/nar/gkq1015] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Indexed: 11/12/2022]
Abstract
We propose a novel, efficient and intuitive approach to estimating mRNA abundances from whole transcriptome shotgun sequencing (RNA-Seq) data. Our method, NEUMA (Normalization by Expected Uniquely Mappable Area), is based on effective length normalization using uniquely mappable areas of gene and mRNA isoform models. Using a known transcriptome sequence model such as RefSeq, NEUMA pre-computes the numbers of all possible gene-wise and isoform-wise informative reads: the former being reads that map exclusively to the mRNA isoforms of a single gene, and the latter reads that map uniquely to a single mRNA isoform. The results are used to estimate the effective length of genes and transcripts, taking experimental distributions of fragment size into consideration. Quantitative RT-PCR of 27 randomly selected genes in two human cell lines, together with computer simulation experiments, demonstrated the superior accuracy of NEUMA over other recently developed methods. NEUMA covers a large proportion of genes and mRNA isoforms and offers a measure of consistency ('consistency coefficient') for each gene between an independently measured gene-wise level and the sum of the isoform levels. NEUMA is applicable to both paired-end and single-end RNA-Seq data. We propose that NEUMA could become a standard method for quantifying gene transcript levels from RNA-Seq data.
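The core idea of normalizing by a uniquely mappable length, rather than the full transcript length, can be illustrated with a toy model. This is a minimal sketch under strong simplifying assumptions (exact forward-strand matching, single-end reads of fixed length, no fragment-size distribution, no genomic background), so it is not NEUMA itself; the RPKM-style scaling in `normalized_abundance` and both function names are hypothetical choices for illustration.

```python
from typing import Dict, Set


def effective_lengths(transcripts: Dict[str, str], read_len: int) -> Dict[str, int]:
    """Count read start positions whose read_len-mer occurs in exactly one transcript.

    Simplification: exact forward-strand matching only; reads from shared
    sequence (e.g. common exons) are not isoform-informative and are excluded.
    """
    owners: Dict[str, Set[str]] = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - read_len + 1):
            owners.setdefault(seq[i:i + read_len], set()).add(name)
    eff = {name: 0 for name in transcripts}
    for name, seq in transcripts.items():
        for i in range(len(seq) - read_len + 1):
            if owners[seq[i:i + read_len]] == {name}:
                eff[name] += 1
    return eff


def normalized_abundance(unique_reads: Dict[str, int], eff_len: Dict[str, int],
                         total_mapped: int) -> Dict[str, float]:
    """Uniquely mapped reads per kilobase of effective length per million mapped reads."""
    return {t: (n / (eff_len[t] / 1e3)) / (total_mapped / 1e6)
            for t, n in unique_reads.items() if eff_len.get(t, 0) > 0}


# Two hypothetical isoforms sharing a 5' prefix.
isoforms = {"A": "AAAACCCCTT", "B": "AAAAGGGG"}
print(effective_lengths(isoforms, read_len=4))  # {'A': 6, 'B': 4}
```

Isoforms with an effective length of zero have no informative reads and are skipped rather than assigned an abundance.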
Affiliation(s)
- Soohyun Lee
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Yuseong-gu, Daejeon, Korea
11237
Dimon MT, Sorber K, DeRisi JL. HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data. PLoS One 2010; 5:e13875. [PMID: 21079731 PMCID: PMC2975632 DOI: 10.1371/journal.pone.0013875] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 06/10/2010] [Accepted: 09/16/2010] [Indexed: 02/01/2023]
Abstract
Background High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially when a splice junction was previously unknown. Methodology/Principal Findings Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short-read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets, without a reduction in specificity. Conclusions/Significance HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find that 3.6% of 3′ splice sites and 1.4% of 5′ splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer.
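Two post-processing steps mentioned above, classifying a predicted junction as canonical or non-canonical and applying a user-chosen score threshold, can be sketched independently of HMMSplicer's hidden Markov model. A minimal illustration, assuming plus-strand introns given as half-open [start, end) coordinates and junctions represented as dicts with a hypothetical `"score"` key; GT-AG is the canonical intron boundary, and minus-strand introns would need to be reverse-complemented first. This does not use HMMSplicer's own code or output format.

```python
from typing import Dict, List


def junction_motif(genome: str, intron_start: int, intron_end: int) -> str:
    """Donor and acceptor dinucleotides of a plus-strand intron [intron_start, intron_end)."""
    donor = genome[intron_start:intron_start + 2]
    acceptor = genome[intron_end - 2:intron_end]
    return f"{donor}-{acceptor}".upper()


def is_canonical(genome: str, intron_start: int, intron_end: int) -> bool:
    """GT-AG is the canonical intron boundary; anything else is non-canonical here."""
    return junction_motif(genome, intron_start, intron_end) == "GT-AG"


def filter_by_score(junctions: List[Dict], min_score: float) -> List[Dict]:
    """Keep junctions whose (hypothetical) 'score' field meets the threshold."""
    return [j for j in junctions if j["score"] >= min_score]


genome = "CCCGT" + "A" * 6 + "GCCC"  # hypothetical 15-nt sequence
print(junction_motif(genome, 3, 12), is_canonical(genome, 3, 12))  # GT-AG True
print(junction_motif(genome, 2, 12), is_canonical(genome, 2, 12))  # CG-AG False
```

Raising `min_score` trades sensitivity for fewer false positives, which is the tuning knob the abstract describes.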
Affiliation(s)
- Michelle T. Dimon
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America
- Biological and Medical Informatics Program, University of California San Francisco, San Francisco, California, United States of America
- Katherine Sorber
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America
- Joseph L. DeRisi
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America
- Howard Hughes Medical Institute, Bethesda, Maryland, United States of America
11238
Matkovich SJ, Van Booven DJ, Eschenbacher WH, Dorn GW. RISC RNA sequencing for context-specific identification of in vivo microRNA targets. Circ Res 2010; 108:18-26. [PMID: 21030712 DOI: 10.1161/circresaha.110.233528] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Indexed: 02/04/2023]
Abstract
RATIONALE MicroRNAs (miRs) are expanding our understanding of cardiac disease and have the potential to transform cardiovascular therapeutics. One miR can target hundreds of individual mRNAs, but existing methodologies are not sufficient to accurately and comprehensively identify these mRNA targets in vivo. OBJECTIVE To develop methods permitting identification of in vivo miR targets in an unbiased manner, using massively parallel sequencing of mouse cardiac transcriptomes in combination with sequencing of mRNA associated with mouse cardiac RNA-induced silencing complexes (RISCs). METHODS AND RESULTS We optimized techniques for expression profiling small amounts of RNA without introducing amplification bias and applied this to anti-Argonaute 2 immunoprecipitated RISCs (RISC-Seq) from mouse hearts. By comparing RNA-sequencing results of cardiac RISC and transcriptome from the same individual hearts, we defined 1645 mRNAs consistently targeted to mouse cardiac RISCs. We used this approach in hearts overexpressing miRs from Myh6 promoter-driven precursors (programmed RISC-Seq) to identify 209 in vivo targets of miR-133a and 81 in vivo targets of miR-499. Consistent with the fact that miR-133a and miR-499 have widely differing "seed" sequences and belong to different miR families, only 6 targets were common to miR-133a- and miR-499-programmed hearts. CONCLUSIONS RISC-sequencing is a highly sensitive method for general RISC profiling and individual miR target identification in biological context and is applicable to any tissue and any disease state.
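The point that miRNAs with different "seed" sequences share few targets follows from how seed matching works: a target site is the reverse complement of the miRNA's seed region. A minimal sketch, assuming the common seed definition of nucleotides 2-8 of the mature miRNA and perfect 7-nt matches only; the miRNA and UTR sequences are hypothetical (not miR-133a or miR-499), and RISC-Seq itself is an experimental method that this does not reproduce.

```python
from typing import List

RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def seed(mirna: str) -> str:
    """Seed region: nucleotides 2-8 (1-based) of the mature miRNA."""
    return mirna[1:8]


def seed_site(mirna: str) -> str:
    """Target-mRNA sequence (5'->3') pairing perfectly with the seed: its reverse complement."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(seed(mirna)))


def find_seed_sites(utr: str, mirna: str) -> List[int]:
    """0-based positions of perfect 7-nt seed-match sites in an RNA-alphabet 3' UTR."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


mirna = "UACGGAUCCAUGGCAUACGUAA"  # hypothetical miRNA sequence
print(seed(mirna), seed_site(mirna))  # ACGGAUC GAUCCGU
print(find_seed_sites("AAGAUCCGUAA", mirna))  # [2]
```

Two miRNAs whose seeds differ produce different `seed_site` strings, so their predicted site sets overlap only by chance.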
Affiliation(s)
- Scot J Matkovich
- Center for Pharmacogenomics, Department of Medicine, Washington University School of Medicine, St Louis, MO 63110, USA
11239
Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM, Qian JQ, Griffith M, Raymond A, Thiessen N, Cezard T, Butterfield YS, Newsome R, Chan SK, She R, Varhol R, Kamoh B, Prabhu AL, Tam A, Zhao Y, Moore RA, Hirst M, Marra MA, Jones SJM, Hoodless PA, Birol I. De novo assembly and analysis of RNA-seq data. Nat Methods 2010; 7:909-12. [PMID: 20935650 DOI: 10.1038/nmeth.1517] [Citation(s) in RCA: 627] [Impact Index Per Article: 41.8] [Received: 06/18/2010] [Accepted: 09/13/2010] [Indexed: 12/20/2022]
Abstract
We describe Trans-ABySS, a de novo short-read transcriptome assembly and analysis pipeline that addresses variation in local read densities by assembling read substrings with varying stringencies and then merging the resulting contigs before analysis. Analyzing 7.4 gigabases of 50-base-pair paired-end Illumina reads from an adult mouse liver poly(A) RNA library, we identified known, new and alternative structures in expressed transcripts, and achieved high sensitivity and specificity relative to reference-based assembly methods.
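Assembling at several stringencies (for example, several k-mer sizes) yields overlapping contig sets that must be pooled without redundancy. A minimal sketch of one simplified merge rule, assuming redundancy means exact containment of a contig, or its reverse complement, in a longer contig; Trans-ABySS's own merging is more involved, and the contig sequences here are hypothetical.

```python
from typing import Iterable, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_contig_sets(contig_sets: Iterable[Iterable[str]]) -> List[str]:
    """Pool contigs from several assemblies and drop redundant ones.

    A contig is dropped if it, or its reverse complement, is an exact substring
    of a contig already kept; longer contigs are considered first. No overlap
    extension between contigs is attempted.
    """
    pooled = sorted({c for cs in contig_sets for c in cs}, key=lambda c: (-len(c), c))
    kept: List[str] = []
    for contig in pooled:
        rc = revcomp(contig)
        if not any(contig in k or rc in k for k in kept):
            kept.append(contig)
    return kept


# Hypothetical contigs from two assemblies run at different k.
k25 = ["ACGTACGTTT", "GGGCCC"]
k31 = ["ACGTACGTTTAA", "AAACG"]
print(merge_contig_sets([k25, k31]))  # ['ACGTACGTTTAA', 'GGGCCC']
```

Here "AAACG" is dropped because its reverse complement, "CGTTT", lies inside the longer contig.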
Affiliation(s)
- Gordon Robertson
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
11240
Brooks AN, Yang L, Duff MO, Hansen KD, Park JW, Dudoit S, Brenner SE, Graveley BR. Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res 2010; 21:193-202. [PMID: 20921232 DOI: 10.1101/gr.108662.110] [Citation(s) in RCA: 165] [Impact Index Per Article: 11.0] [Indexed: 12/31/2022]
Abstract
Alternative splicing is generally controlled by proteins that bind directly to regulatory sequence elements and either activate or repress splicing of adjacent splice sites in a target pre-mRNA. Here, we have combined RNAi and mRNA-seq to identify exons that are regulated by Pasilla (PS), the Drosophila melanogaster ortholog of mammalian NOVA1 and NOVA2. We identified 405 splicing events in 323 genes that are significantly affected upon depletion of ps, many of which were annotated as being constitutively spliced. The sequence regions upstream of and within PS-repressed exons, and downstream of PS-activated exons, are enriched for YCAY repeats, consistent with the location of these motifs near NOVA-regulated exons in mammals. Thus, the RNA regulatory map of PS and NOVA1/2 is highly conserved between insects and mammals, even though the target gene orthologs regulated by PS and NOVA1/2 are almost entirely nonoverlapping. This observation suggests that the regulatory codes of individual RNA binding proteins may be nearly immutable, yet the regulatory modules controlled by these proteins are highly evolvable.
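Motif-enrichment statements like the YCAY result start from counting the motif in defined regions (for example, a window upstream of each regulated exon) and comparing densities against control exons. A minimal sketch of the counting step only, assuming Y stands for C or U/T and that overlapping occurrences should all be counted; region definitions and any statistical test are left out, and the sequences are hypothetical.

```python
import re
from typing import List

# Zero-width lookahead so overlapping motifs (e.g. both sites in TCATCAT) are reported.
YCAY = re.compile(r"(?=[CT]CA[CT])")


def ycay_positions(seq: str) -> List[int]:
    """0-based start positions of YCAY motifs (Y = C or U/T) in a DNA or RNA sequence."""
    dna = seq.upper().replace("U", "T")
    return [m.start() for m in YCAY.finditer(dna)]


def ycay_per_kb(seq: str) -> float:
    """YCAY density per kilobase, so regions of different length can be compared."""
    return 1000.0 * len(ycay_positions(seq)) / len(seq) if seq else 0.0


print(ycay_positions("TCATCACAT"))  # [0, 3]
print(ycay_positions("ucau"))  # [0]
```

A plain pattern without the lookahead would resume scanning after each match and miss the second, overlapping site in "TCATCAT".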
Affiliation(s)
- Angela N Brooks
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
11241
Zambelli F, Pavesi G, Gissi C, Horner DS, Pesole G. Assessment of orthologous splicing isoforms in human and mouse orthologous genes. BMC Genomics 2010; 11:534. [PMID: 20920313 PMCID: PMC3091683 DOI: 10.1186/1471-2164-11-534] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 05/07/2010] [Accepted: 10/01/2010] [Indexed: 11/22/2022]
Abstract
Background Recent discoveries have highlighted the fact that alternative splicing and alternative transcripts are the rule, rather than the exception, in metazoan genes. Since multiple transcript and protein variants expressed by the same gene are, by definition, structurally distinct and need not be functionally equivalent, the concept of gene orthology should be extended to the transcript level in order to describe evolutionary relationships between structurally similar transcript variants. In other words, the identification of true orthology relationships between gene products should now progress beyond primary sequence, and "splicing orthology", consisting of ancestrally shared exon-intron structures, is required to define orthologous isoforms at the transcript level. Results As a first step in this direction, we performed a large-scale human-mouse gene comparison with a twofold goal: first, to assess whether and to what extent traditional gene annotations such as RefSeq capture genuine splicing orthology; second, to provide a more detailed annotation and quantification of true human-mouse orthologous transcripts, defined as transcripts of orthologous genes exhibiting the same splicing patterns. Conclusions We observed an identical exon/intron structure for 32% of human and mouse orthologous genes. This figure increases to 87% using less stringent criteria for gene structure similarity, implying that for about 13% of the human RefSeq annotated genes (and about 25% of the corresponding transcripts) we could not identify any mouse transcript similar enough to be confidently assigned as a splicing ortholog. Our data suggest that current gene and transcript data may still be rather incomplete, with several splicing variants still unknown.
The observation that alternative splicing produces large numbers of alternative transcripts and proteins, some of them conserved across species and others truly species-specific, suggests that, while retaining the conventional definition of gene orthology, a new concept of "splicing orthology" can be defined at the transcript level.
Affiliation(s)
- Federico Zambelli
- Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, Milano, Italia
11242
Zimmermann B, Bilusic I, Lorenz C, Schroeder R. Genomic SELEX: a discovery tool for genomic aptamers. Methods 2010; 52:125-32. [PMID: 20541015 PMCID: PMC2954320 DOI: 10.1016/j.ymeth.2010.06.004] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 05/01/2010] [Accepted: 06/03/2010] [Indexed: 11/29/2022]
Abstract
Genomic SELEX is a discovery tool for genomic aptamers, which are genomically encoded functional domains in nucleic acid molecules that recognize and bind specific ligands. When combined with genomic libraries and using RNA-binding proteins as baits, Genomic SELEX coupled with high-throughput sequencing enables the discovery of genomic RNA aptamers and the identification of RNA-protein interaction networks. Here we describe how to construct and analyze genomic libraries, how to choose baits for selections, how to perform the selection procedure and, finally, how to analyze the enriched sequences derived from deep sequencing. As a control procedure, we recommend performing a "Neutral" SELEX experiment in parallel to the selection, omitting the selection step. This control experiment provides a background signal for comparison with the positively selected pool. We also recommend deep sequencing the initial library in order to facilitate the final in silico analysis of enrichment relative to the initial levels. Counter-selection procedures, using modified or inactive baits, can be used to increase the binding specificity of the winning selected sequences.
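The recommended controls translate naturally into an enrichment calculation: a candidate aptamer should be over-represented in the selected pool relative to both the sequenced initial library and the Neutral SELEX pool. A minimal sketch, assuming per-fragment read counts are already tallied; the pseudocount, the log2 threshold of 2.0 and the fragment names are hypothetical choices, not values from the protocol.

```python
import math
from collections import Counter
from typing import Dict, List


def log2_enrichment(selected: Counter, reference: Counter,
                    pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-sequence log2(frequency in selected pool / frequency in reference pool).

    A pseudocount is added to every sequence so that fragments absent from one
    pool do not cause division by zero or infinite ratios.
    """
    seqs = set(selected) | set(reference)
    n_sel = sum(selected.values()) + pseudocount * len(seqs)
    n_ref = sum(reference.values()) + pseudocount * len(seqs)
    return {s: math.log2(((selected[s] + pseudocount) / n_sel) /
                         ((reference[s] + pseudocount) / n_ref))
            for s in seqs}


def candidate_aptamers(selected: Counter, initial: Counter, neutral: Counter,
                       min_log2: float = 2.0) -> List[str]:
    """Sequences enriched both over the initial library and over the Neutral SELEX pool."""
    vs_initial = log2_enrichment(selected, initial)
    vs_neutral = log2_enrichment(selected, neutral)
    return sorted(s for s in selected
                  if vs_initial[s] >= min_log2 and vs_neutral[s] >= min_log2)


# Hypothetical read counts per genomic fragment.
selected = Counter({"frag1": 90, "frag2": 10})
initial = Counter({"frag1": 10, "frag2": 90})
neutral = Counter({"frag1": 12, "frag2": 88})
print(candidate_aptamers(selected, initial, neutral))  # ['frag1']
```

Requiring enrichment against the neutral pool as well filters out fragments that rise merely through amplification bias rather than binding.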
Affiliation(s)
- Renée Schroeder
- Department of Biochemistry and Cell Biology, Max F. Perutz Laboratories, University of Vienna, Austria
11243
11244
Statistical Issues in the Analysis of ChIP-Seq and RNA-Seq Data. Genes (Basel) 2010; 1:317-34. [PMID: 24710049 PMCID: PMC3954086 DOI: 10.3390/genes1020317] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 08/17/2010] [Accepted: 09/20/2010] [Indexed: 11/29/2022]
Abstract
The recent arrival of ultra-high-throughput, next-generation sequencing (NGS) technologies has revolutionized the genetics and genomics fields by allowing rapid and inexpensive sequencing of billions of bases. The rapid deployment of NGS in a variety of sequencing-based experiments has resulted in the fast accumulation of massive amounts of sequencing data. To process this new type of data, a torrent of increasingly sophisticated algorithms and software tools is emerging to support the analysis stage of NGS applications. In this article, we strive to comprehensively identify the critical challenges that arise at all stages of NGS data analysis and provide an objective overview of what has been achieved in existing work. At the same time, we highlight selected areas that need much further research to improve our current ability to extract as much information as possible from NGS data. The article focuses on applications dealing with ChIP-Seq and RNA-Seq.
11245
Sun H, Wu J, Wickramasinghe P, Pal S, Gupta R, Bhattacharyya A, Agosto-Perez FJ, Showe LC, Huang THM, Davuluri RV. Genome-wide mapping of RNA Pol-II promoter usage in mouse tissues by ChIP-seq. Nucleic Acids Res 2010; 39:190-201. [PMID: 20843783 PMCID: PMC3017616 DOI: 10.1093/nar/gkq775] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/12/2022]
Abstract
Alternative promoters that are differentially used in various cellular contexts and tissue types add to the transcriptional complexity of mammalian genomes. Identifying alternative promoters and annotating their activity in different tissues is one of the major challenges in understanding the transcriptional regulation of mammalian genes and their isoforms. To determine the use of alternative promoters in different tissues, we performed ChIP-seq experiments using an antibody against RNA Pol-II in five adult mouse tissues (brain, liver, lung, spleen and kidney). Our analysis identified 38 639 Pol-II promoters, including 12 270 novel promoters, for both protein-coding and non-coding mouse genes. Of these, 6384 promoters are tissue specific; these are CpG poor, and we find that only 34% of the novel promoters are located in CpG-rich regions, suggesting that novel promoters are mostly tissue specific. By identifying the Pol-II-bound promoter(s) of each annotated gene in a given tissue, we found that 37% of the protein-coding genes use alternative promoters in the five mouse tissues. The promoter annotations and ChIP-seq data presented here will aid ongoing efforts to characterize gene regulatory regions in mammalian genomes.
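The abstract does not state how "CpG-rich" was defined. A widely cited rule of thumb is the Gardiner-Garden and Frommer CpG island criteria: a window of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (CpG count × length) / (C count × G count). A minimal sketch applying those thresholds to a promoter window, offered as an assumption rather than the authors' classification; the test sequences are synthetic.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    gc = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc, obs_exp


def is_cpg_rich(seq: str, min_len: int = 200, min_gc: float = 0.5,
                min_obs_exp: float = 0.6) -> bool:
    """CpG-island-style test on a promoter window (Gardiner-Garden & Frommer thresholds)."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CCGG" * 50))  # (1.0, 1.0)
print(is_cpg_rich("CCGG" * 50), is_cpg_rich("AT" * 100))  # True False
```

In practice the window would be taken around each Pol-II-bound promoter, and genome-wide CpG island annotations often use a sliding window rather than a single fixed region.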
Affiliation(s)
- Hao Sun
- Center for Systems and Computational Biology, Molecular and Cellular Oncogenesis Program, The Wistar Institute, Philadelphia, PA 19104, USA
11246
Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou YC, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJM, Tai IT, Marra MA. Alternative expression analysis by RNA sequencing. Nat Methods 2010; 7:843-7. [PMID: 20835245 DOI: 10.1038/nmeth.1503] [Citation(s) in RCA: 203] [Impact Index Per Article: 13.5] [Received: 05/04/2010] [Accepted: 08/20/2010] [Indexed: 12/17/2022]
Abstract
We developed alternative expression analysis by sequencing (ALEXA-seq), a method for analyzing massively parallel RNA sequence data to catalog transcripts and assess differential and alternative expression of known and predicted mRNA isoforms in cells and tissues. As proof of principle, we used the approach to compare fluorouracil-resistant and -nonresistant human colorectal cancer cell lines. We assessed the sensitivity and specificity of the approach by comparison with exon tiling and splicing microarrays and validated the results with reverse transcription-PCR, quantitative PCR and Sanger sequencing. We observed global disruption of splicing in fluorouracil-resistant cells, characterized by expression of new mRNA isoforms resulting from exon skipping, alternative splice site usage and intron retention. Alternative expression annotation databases, source code, a data viewer and other resources to facilitate analysis are available at http://www.alexaplatform.org/alexa_seq/.
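Exon skipping between two conditions is often summarized with "percent spliced in" (PSI), the fraction of transcripts that include a cassette exon. This is a commonly used, swapped-in metric rather than ALEXA-seq's own feature-level expression analysis. A minimal sketch, assuming junction read counts are already tallied for the two inclusion junctions and the single skipping junction; the counts are hypothetical.

```python
def psi(inclusion_junction_reads: int, skipping_junction_reads: int) -> float:
    """Percent spliced in, as a fraction, for one cassette exon.

    The inclusion isoform is supported by two junctions (upstream exon -> cassette
    exon and cassette exon -> downstream exon) and the skipping isoform by one, so
    inclusion reads are halved before the two isoforms are compared.
    """
    inclusion = inclusion_junction_reads / 2.0
    total = inclusion + skipping_junction_reads
    if total == 0:
        raise ValueError("no junction reads cover this exon")
    return inclusion / total


# Hypothetical junction counts for one exon in two cell lines.
resistant = psi(inclusion_junction_reads=20, skipping_junction_reads=10)
nonresistant = psi(inclusion_junction_reads=40, skipping_junction_reads=0)
print(resistant, nonresistant, resistant - nonresistant)  # 0.5 1.0 -0.5
```

A negative difference (delta PSI) indicates more skipping in the first condition; real analyses would also need replicate-aware statistics before calling a change significant.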
Affiliation(s)
- Malachi Griffith
- Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada
11247
11248
Courtney E, Kornfeld S, Janitz K, Janitz M. Transcriptome profiling in neurodegenerative disease. J Neurosci Methods 2010; 193:189-202. [PMID: 20800617 DOI: 10.1016/j.jneumeth.2010.08.018] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Received: 05/14/2010] [Revised: 07/29/2010] [Accepted: 08/20/2010] [Indexed: 02/02/2023]
Abstract
Changes in gene expression and splicing patterns that occur before the onset and during the progression of complex diseases have become a major focus of neurodegenerative disease research. These signature patterns of gene expression provide clues about the mechanisms involved in the molecular pathogenesis of neurodegenerative disease and may facilitate the discovery of novel therapeutic drugs. With the development of array technologies and the more recent RNA-seq technique, our understanding of the pathogenesis of neurodegenerative disease is expanding rapidly. Here, we review the technologies involved in gene expression and splicing analysis and the related literature on three common neurodegenerative diseases: Alzheimer's disease, Parkinson's disease and Huntington's disease.
Affiliation(s)
- Eliza Courtney
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, Australia
11249
Ponting CP, Belgard TG. Transcribed dark matter: meaning or myth? Hum Mol Genet 2010; 19:R162-8. [PMID: 20798109 DOI: 10.1093/hmg/ddq362] [Citation(s) in RCA: 220] [Impact Index Per Article: 14.7] [Indexed: 12/16/2022]
Abstract
Genomic tiling arrays, cDNA sequencing and, more recently, RNA-Seq have provided initial insights into the extent and depth of transcribed sequence across human and other genomes. These methods have led to greatly improved annotations of protein-coding genes, but have also identified transcription outside of annotated exons. One resulting point of dispute is the balance between transcription of known exons and transcription outside of known exons. While non-genic 'dark matter' transcription was found by tiling arrays to be pervasive, it was seen to contribute only a small percentage of the polyadenylated transcriptome in some RNA-Seq experiments. This apparent contradiction has been compounded by a lack of clarity about what exactly constitutes a protein-coding gene. It remains unclear, for example, whether all transcripts that overlap on either strand within a genomic locus should be assigned to a single gene locus, including those that fail to share promoters, exons and splice junctions. The inability of tiling arrays and RNA-Seq to count transcripts, rather than exons or exon pairs, adds to these difficulties. While there is agreement that thousands of apparently non-coding loci are present outside of protein-coding genes in the human genome, there is vigorous debate about what constitutes evidence for their functionality. These issues will only be resolved upon the demonstration, or otherwise, that organismal or cellular phenotypes frequently result when non-coding RNA loci are disrupted.
Affiliation(s)
- Chris P Ponting
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford, UK.
11250
Abstract
Cellular homeostasis is achieved by the proper balance of regulatory networks that, if disrupted, can lead to cellular transformation. These cell circuits are fine-tuned and maintained by the coordinated function of proteins and non-coding RNAs (ncRNAs). In addition to the well-characterized protein-coding and microRNA constituents, large ncRNAs are also emerging as important regulatory molecules in tumor-suppressor and oncogenic pathways. Recent studies have revealed mechanistic insight into how large ncRNAs regulate key cancer pathways at the transcriptional, post-transcriptional and epigenetic levels. Here we synthesize these latest advances within the context of their mechanistic roles in regulating and maintaining cellular equilibrium. We posit that, similar to protein-coding genes, large ncRNAs are a newly emerging class of oncogenic and tumor-suppressor genes. Our growing knowledge of the role of large ncRNAs in cellular transformation points towards their potential use as biomarkers and as targets for novel therapeutic approaches in the future.
Affiliation(s)
- Maite Huarte
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA