3201
Kadakkuzha BM, Puthanveettil SV. Genomics and proteomics in solving brain complexity. MOLECULAR BIOSYSTEMS 2013; 9:1807-21. [PMID: 23615871 PMCID: PMC6425491 DOI: 10.1039/c3mb25391k] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 12/20/2022]
Abstract
The human brain is extraordinarily complex, composed of billions of neurons and trillions of synaptic connections. Neurons are organized into circuit assemblies that are modulated by specific interneurons and non-neuronal cells, such as glia and astrocytes. Human genome sequence data predict that each of these cells in the human brain has the potential to express ∼20,000 protein-coding genes and tens of thousands of noncoding RNAs. A major challenge in neuroscience is to determine (1) how individual neurons and circuitry utilize this potential during development and maturation of the nervous system, and for higher brain functions such as cognition, and (2) how this potential is altered in neurological and psychiatric disorders. In this review, we discuss how recent advances in next-generation sequencing, proteomics and bioinformatics have transformed our understanding of gene expression and the functions of neural circuitry, memory storage, and disorders of cognition.
Affiliation(s)
- Beena M Kadakkuzha
- Department of Neuroscience, The Scripps Research Institute, Scripps Florida 130 Scripps Way, Jupiter, FL 33458, USA
3202
Virmani A, Pinto L, Binienda Z, Ali S. Food, nutrigenomics, and neurodegeneration--neuroprotection by what you eat! Mol Neurobiol 2013; 48:353-62. [PMID: 23813102 DOI: 10.1007/s12035-013-8498-3] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Received: 06/10/2013] [Accepted: 06/16/2013] [Indexed: 02/08/2023]
Abstract
Diet in human health is no longer viewed as simple nutrition; in light of recent research, especially nutrigenomics, it is linked via evolution and genetics to cell health status and is capable of modulating apoptosis, detoxification, and appropriate gene response. The link between nutritional deficiency, especially lack of vitamins and minerals, and disease is well known, but more recently, epidemiological studies have suggested a role for fruits and vegetables, as well as essential fatty acids and even red wine (the French paradox), in protection against disease. In the early 1990s, various research groups started considering antioxidants (e.g., melatonin, resveratrol, green tea, lipoic acid) and metabolic compounds (e.g., nicotinamide, acetyl-L-carnitine, creatine, coenzyme Q10) as possible candidates for neuroprotection. At the time, these researchers were of course regarded as on par with snake oil salesmen (and saleswomen). The positive actions of nutritional supplements, minerals, and plant extracts in disease prevention are now mainstream, and the commercial health claims being made are subject to regulation in most countries. Apart from efficacy and finding the right dosages, safety, and especially the level of purification and lack of contamination, are important issues as use of these substances becomes widespread. From the mechanistic point of view, most of the time these substances replenish the body's deficiency and restore normal function. However, they also exert actions that are not sensu stricto nutritive and could be considered pharmacological, especially since, at times, an intake higher than the recommended dietary allowance (RDA) is needed to see these effects. Free radicals and neuroinflammatory processes underlie many neurodegenerative conditions, including Parkinson's disease and Alzheimer's disease. Curcumin, carotenoids, acetyl-L-carnitine, coenzyme Q10, vitamin D, polyphenols, and other nutraceuticals have the potential to target multiple pathways in these conditions. In summary, augmenting neuroprotective pathways using diet and finding new natural substances that can be more efficacious, i.e., that induce health-promoting genes and reduce the expression of disease-promoting genes, could be incorporated into neuroprotective strategies of the future.
Affiliation(s)
- Ashraf Virmani
- Research, Innovation and Development, Sigma-tau SpA, Via Pontina km 30,400, 00040, Pomezia, Rome, Italy
3203
Krasileva KV, Buffalo V, Bailey P, Pearce S, Ayling S, Tabbita F, Soria M, Wang S, Akhunov E, Uauy C, Dubcovsky J. Separating homeologs by phasing in the tetraploid wheat transcriptome. Genome Biol 2013; 14:R66. [PMID: 23800085 PMCID: PMC4053977 DOI: 10.1186/gb-2013-14-6-r66] [Citation(s) in RCA: 115] [Impact Index Per Article: 10.5] [Received: 05/25/2013] [Accepted: 06/25/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs.
RESULTS: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble in 140,118 contigs, including 96% of the benchmark cDNAs. We used a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of SNPs analyzed are correctly separated by phasing.
CONCLUSIONS: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.
3204
Cao WJ, Wu HL, He BS, Zhang YS, Zhang ZY. Analysis of long non-coding RNA expression profiles in gastric cancer. World J Gastroenterol 2013; 19:3658-3664. [PMID: 23801869 PMCID: PMC3691033 DOI: 10.3748/wjg.v19.i23.3658] [Citation(s) in RCA: 163] [Impact Index Per Article: 14.8] [Received: 01/22/2013] [Revised: 03/20/2013] [Accepted: 05/10/2013] [Indexed: 02/06/2023] Open
Abstract
AIM: To investigate the expression patterns of long non-coding RNAs (lncRNAs) in gastric cancer.
METHODS: Two publicly available human exon array data sets for gastric cancer, with data for the corresponding normal tissue, were downloaded from the Gene Expression Omnibus (GEO). We re-annotated the probes of the human exon arrays and retained the probes uniquely mapping to lncRNAs at the gene level. LncRNA expression profiles were generated using the robust multi-array average method in Affymetrix Power Tools. The normalized data were then analyzed with the Bioconductor package Linear Models for Microarray Data (limma), and genes with adjusted P-values below 0.01 were considered differentially expressed. An independent data set was used to validate the results.
RESULTS: With the computational pipeline established to re-annotate over 6.5 million probes of the Affymetrix Human Exon 1.0 ST array, we identified 136053 probes uniquely mapping to lncRNAs at the gene level. These probes correspond to 9294 lncRNAs, covering nearly 76% of the GENCODE lncRNA data set. By analyzing GSE27342 consisting of 80 paired gastric cancer and normal adjacent tissue samples, we identified 88 lncRNAs that were differentially expressed in gastric cancer, some of which have been reported to play a role in cancer, such as LINC00152, taurine upregulated 1, urothelial cancer associated 1, Pvt1 oncogene, small nucleolar RNA host gene 1 and LINC00261. In the validation data set GSE33335, 59% of these differentially expressed lncRNAs showed significant expression changes (adjusted P-value < 0.01) with the same direction.
CONCLUSION: We identified a set of lncRNAs differentially expressed in gastric cancer, providing useful information for discovery of new biomarkers and therapeutic targets in gastric cancer.
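The adjusted P-value cutoff in the METHODS above can be sketched as follows. This is a minimal illustration only: it assumes the Benjamini-Hochberg adjustment (the abstract does not name the adjustment method), and the gene names and P-values are made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(gene_p_values, alpha=0.01):
    """Names of genes whose adjusted p-value is below alpha."""
    names = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q < alpha]


# Hypothetical raw p-values; not data from the study.
example = {"lnc_A": 0.0001, "lnc_B": 0.004, "lnc_C": 0.03, "lnc_D": 0.5}
print(differentially_expressed(example))
```

Only the cutoff of 0.01 comes from the abstract; everything else is an assumption made for illustration.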
3205
Hangauer MJ, Vaughn IW, McManus MT. Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs. PLoS Genet 2013; 9:e1003569. [PMID: 23818866 PMCID: PMC3688513 DOI: 10.1371/journal.pgen.1003569] [Citation(s) in RCA: 547] [Impact Index Per Article: 49.7] [Received: 09/28/2012] [Accepted: 05/01/2013] [Indexed: 01/01/2023] Open
Abstract
Known protein coding gene exons compose less than 3% of the human genome. The remaining 97% is largely uncharted territory, with only a small fraction characterized. The recent observation of transcription in this intergenic territory has stimulated debate about the extent of intergenic transcription and whether these intergenic RNAs are functional. Here we directly observed with a large set of RNA-seq data covering a wide array of human tissue types that the majority of the genome is indeed transcribed, corroborating recent observations by the ENCODE project. Furthermore, using de novo transcriptome assembly of this RNA-seq data, we found that intergenic regions encode far more long intergenic noncoding RNAs (lincRNAs) than previously described, helping to resolve the discrepancy between the vast amount of observed intergenic transcription and the limited number of previously known lincRNAs. In total, we identified tens of thousands of putative lincRNAs expressed at a minimum of one copy per cell, significantly expanding upon prior lincRNA annotation sets. These lincRNAs are specifically regulated and conserved rather than being the product of transcriptional noise. In addition, lincRNAs are strongly enriched for trait-associated SNPs suggesting a new mechanism by which intergenic trait-associated regions may function. These findings will enable the discovery and interrogation of novel intergenic functional elements.
Affiliation(s)
- Matthew J. Hangauer
- Diabetes Center, Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
- Ian W. Vaughn
- Diabetes Center, Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
- Michael T. McManus
- Diabetes Center, Department of Microbiology and Immunology, University of California, San Francisco, California, United States of America
3206
Chen G, Chen J, Shi C, Shi L, Tong W, Shi T. Dissecting the Characteristics and Dynamics of Human Protein Complexes at Transcriptome Cascade Using RNA-Seq Data. PLoS One 2013; 8:e66521. [PMID: 23824284 PMCID: PMC3688907 DOI: 10.1371/journal.pone.0066521] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/25/2013] [Accepted: 05/06/2013] [Indexed: 11/19/2022] Open
Abstract
Human protein complexes play crucial roles in various biological processes as functional modules. However, the expression features of human protein complexes at the transcriptome level are poorly understood. Here, we used RNA-Seq data from 16 disparate tissues and four types of human cancers to explore the characteristics and dynamics of human protein complexes. We observed that many individual components of human protein complexes can be generated by multiple distinct transcripts. As in yeast, the human protein complex constituents tend to be co-expressed in diverse tissues. The dominant isoform of a gene involved in protein complexes tends to encode the complex constituent in each tissue. Our results indicate that protein complex dynamics not only correlate with the presence or absence of complexes, but may also be related to major isoform switching for complex subunits. Between any two cancers of breast, colon, lung and prostate, we found that only a few of the differentially expressed transcripts associated with complexes were identical, but 5-10 times more protein complexes involving differentially expressed transcripts were common. Collectively, our study reveals novel properties and dynamics of human protein complexes at the transcriptome level in diverse normal tissues and different cancers.
Affiliation(s)
- Geng Chen
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Jiwei Chen
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Caiping Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Leming Shi
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, United States of America
- Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, United States of America
- Tieliu Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
3207
Yang J, Mitra A, Dojer N, Fu S, Rowicka M, Brasier AR. A probabilistic approach to learn chromatin architecture and accurate inference of the NF-κB/RelA regulatory network using ChIP-Seq. Nucleic Acids Res 2013; 41:7240-59. [PMID: 23771139 PMCID: PMC3753626 DOI: 10.1093/nar/gkt493] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 12/24/2022] Open
Abstract
Using nuclear factor-κB (NF-κB) ChIP-Seq data, we present a framework for iterative learning of regulatory networks. For every possible transcription factor-binding site (TFBS)-putatively regulated gene pair, the relative distance and orientation are calculated to learn which TFBSs are most likely to regulate a given gene. Weighted TFBS contributions to putative gene regulation are integrated to derive an NF-κB gene network. A de novo motif enrichment analysis uncovers secondary TFBSs (AP1, SP1) at characteristic distances from NF-κB/RelA TFBSs. Comparison with experimental ENCODE ChIP-Seq data indicates that experimental TFBSs highly correlate with predicted sites. We observe that RelA-SP1-enriched promoters have distinct expression profiles from that of RelA-AP1 and are enriched in introns, CpG islands and DNase accessible sites. Sixteen novel NF-κB/RelA-regulated genes and TFBSs were experimentally validated, including TANK, a negative feedback gene whose expression is NF-κB/RelA dependent and requires a functional interaction with the AP1 TFBSs. Our probabilistic method yields more accurate NF-κB/RelA-regulated networks than a traditional, distance-based approach, confirmed by both analysis of gene expression and increased informativity of Genome Ontology annotations. Our analysis provides new insights into how co-occurring TFBSs and local chromatin context orchestrate activation of NF-κB/RelA sub-pathways differing in biological function and temporal expression patterns.
Affiliation(s)
- Jun Yang
- Department of Internal Medicine, The University of Texas Medical Branch, 301 University Boulevard, Galveston, TX 77555-1060, USA, Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, 301 University Boulevard, Galveston, TX 77555-1060, USA, Institute for Translational Sciences, The University of Texas Medical Branch, 301 University Boulevard, Galveston, TX 77555-1060, USA, Institute of Informatics, University of Warsaw, Banacha 2, 02-097, Warsaw, Poland and Sealy Center for Molecular Medicine, The University of Texas Medical Branch, 301 University Boulevard, Galveston, TX 77555-1060, USA
3208
Abstract
Proteogenomic searching is a useful method for identifying novel proteins, annotating genes and detecting peptides unique to an individual genome. The approach, however, can be laborious, as it often requires search segmentation and the use of several unintegrated tools. Furthermore, many proteogenomic efforts have been limited to small genomes, as large genomes can prove impractical due to the required amount of computer memory and computation time. We present Peppy, a software tool designed to perform every necessary task of proteogenomic searches quickly, accurately and automatically. The software generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns confidence values to those matches. Peppy automatically performs a decoy database generation, search and analysis to return identifications at the desired false discovery rate threshold. Written in Java for cross-platform execution, the software is fully multithreaded for enhanced speed. The program can run on regular desktop computers, opening the doors of proteogenomic searching to a wider audience of proteomics and genomics researchers. Peppy is available at http://geneffects.com/peppy .
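The decoy-based false discovery rate step mentioned above can be illustrated with a generic target-decoy sketch. This is not Peppy's actual scoring or FDR procedure, which the abstract does not describe; the function name, the scores, and the simple estimate (decoy matches divided by target matches at or above a cutoff) are all assumptions for illustration.

```python
def score_threshold_at_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Return the lowest target score cutoff whose estimated FDR is within
    max_fdr, or None if no cutoff qualifies.

    The FDR at a cutoff is estimated as
    (decoy matches >= cutoff) / (target matches >= cutoff).
    """
    best = None
    # Try every distinct target score as a cutoff, from strictest to loosest.
    for cutoff in sorted(set(target_scores), reverse=True):
        n_target = sum(s >= cutoff for s in target_scores)
        n_decoy = sum(s >= cutoff for s in decoy_scores)
        # n_target is at least 1 because the cutoff is itself a target score.
        if n_decoy / n_target <= max_fdr:
            best = cutoff
    return best


# Hypothetical match scores; higher means a better peptide-spectrum match.
targets = [9.1, 8.4, 7.7, 6.0, 5.2, 3.1]
decoys = [5.5, 3.0, 2.2]
print(score_threshold_at_fdr(targets, decoys, max_fdr=0.1))
```

Matches scoring at or above the returned cutoff would then be reported at the chosen FDR; a production tool would typically handle ties, score distributions, and large inputs more carefully.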
Affiliation(s)
- Brian A Risk
- Department of Biochemistry & Biophysics, UNC School of Medicine, Chapel Hill, North Carolina 27599, United States.
3209
Steward CA, Gonzalez JM, Trevanion S, Sheppard D, Kerry G, Gilbert JGR, Wicker LS, Rogers J, Harrow JL. The non-obese diabetic mouse sequence, annotation and variation resource: an aid for investigating type 1 diabetes. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013; 2013:bat032. [PMID: 23729657 PMCID: PMC3668384 DOI: 10.1093/database/bat032] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/24/2022]
Abstract
Model organisms are becoming increasingly important for the study of complex diseases such as type 1 diabetes (T1D). The non-obese diabetic (NOD) mouse is an experimental model for T1D, having been bred to develop the disease spontaneously in a process that is similar to that in humans. Genetic analysis of the NOD mouse has identified around 50 disease loci, which have the nomenclature Idd for insulin-dependent diabetes, distributed across at least 11 different chromosomes. In total, 21 Idd regions across 6 chromosomes that are major contributors to T1D susceptibility or resistance were selected for finished sequencing and annotation at the Wellcome Trust Sanger Institute. Here we describe the generation of 40.4 mega base-pairs of finished sequence from 289 bacterial artificial chromosomes for the NOD mouse. Manual annotation has identified 738 genes in the diabetes-sensitive NOD mouse and 765 genes in homologous regions of the diabetes-resistant C57BL/6J reference mouse across 19 candidate Idd regions. This has allowed us to call variation consequences between homologous exonic sequences for all annotated regions in the two mouse strains. We demonstrate the importance of this resource further by illustrating the technical difficulties that regions of inter-strain structural variation between the NOD mouse and the C57BL/6J reference mouse can cause for current next-generation sequencing and assembly techniques. Furthermore, we have established that the variation rate in the Idd regions is 2.3 times higher than the mean found for the whole genome assembly for the NOD/ShiLtJ genome, which we suggest reflects the fact that positive selection for functional variation in immune genes is beneficial in regard to host defence. In summary, we provide an important resource, which aids the analysis of potential causative genes involved in T1D susceptibility. Database URLs: http://www.sanger.ac.uk/resources/mouse/nod/; http://vega-previous.sanger.ac.uk/info/data/mouse_regions.html
Affiliation(s)
- Charles A Steward
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK.
3210
Cheng JC, McBrayer SK, Coarfa C, Dalva-Aydemir S, Gunaratne PH, Carpten JD, Keats JK, Rosen ST, Shanmugam M. Expression and phosphorylation of the AS160_v2 splice variant supports GLUT4 activation and the Warburg effect in multiple myeloma. Cancer Metab 2013; 1:14. [PMID: 24280290 PMCID: PMC4178207 DOI: 10.1186/2049-3002-1-14] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/20/2013] [Accepted: 05/24/2013] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: Multiple myeloma (MM) is a fatal plasma cell malignancy exhibiting enhanced glucose consumption associated with an aerobic glycolytic phenotype (i.e., the Warburg effect). We have previously demonstrated that myeloma cells exhibit constitutive plasma membrane (PM) localization of GLUT4, consistent with the dependence of MM cells on this transporter for maintenance of glucose consumption rates, proliferative capacity, and viability. The purpose of this study was to investigate the molecular basis of constitutive GLUT4 plasma membrane localization in MM cells.
FINDINGS: We have elucidated a novel mechanism through which myeloma cells achieve constitutive GLUT4 activation involving elevated expression of the Rab-GTPase activating protein AS160_v2 splice variant to promote the Warburg effect. AS160_v2-positive MM cell lines display constitutive Thr642 phosphorylation, known to be required for inactivation of AS160 Rab-GAP activity. Importantly, we show that enforced expression of AS160_v2 is required for GLUT4 PM translocation and activation in these select MM lines. Furthermore, we demonstrate that ectopic expression of a full-length, phospho-deficient AS160 mutant is sufficient to impair constitutive GLUT4 cell surface residence, which is characteristic of MM cells.
CONCLUSIONS: This is the first study to tie AS160 de-regulation to increased glucose consumption rates and the Warburg effect in cancer. Future studies investigating connections between the insulin/IGF-1/AS160_v2/GLUT4 axis and FDG-PET positivity in myeloma patients are warranted and could provide rationale for therapeutically targeting this pathway in MM patients with advanced disease.
Affiliation(s)
- Javelin C Cheng
- Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 303 E. Superior Street, Lurie Building 3-250, Chicago, IL 60611, USA.
3211
Qu H, Fang X. A brief review on the Human Encyclopedia of DNA Elements (ENCODE) project. GENOMICS PROTEOMICS & BIOINFORMATICS 2013; 11:135-41. [PMID: 23722115 PMCID: PMC4357814 DOI: 10.1016/j.gpb.2013.05.001] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Received: 05/10/2013] [Revised: 05/15/2013] [Accepted: 05/18/2013] [Indexed: 12/18/2022]
Abstract
The ENCyclopedia Of DNA Elements (ENCODE) project is an international research consortium that aims to identify all functional elements in the human genome sequence. The second phase of the project comprised 1640 datasets from 147 different cell types, yielding a set of 30 publications across several journals. These data revealed that 80.4% of the human genome displays some functionality in at least one cell type. Many of these regulatory elements are physically associated with one another and further form a network or three-dimensional conformation to affect gene expression. These elements are also related to sequence variants associated with diseases or traits. All these findings provide us new insights into the organization and regulation of genes and genome, and serve as an expansive resource for understanding human health and disease.
3212
Ghosal S, Das S, Chakrabarti J. Long noncoding RNAs: new players in the molecular mechanism for maintenance and differentiation of pluripotent stem cells. Stem Cells Dev 2013; 22:2240-53. [PMID: 23528033 DOI: 10.1089/scd.2013.0014] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Indexed: 11/13/2022] Open
Abstract
Maintenance of the pluripotent state or differentiation of the pluripotent state into any germ layer depends on the factors that orchestrate expression of thousands of genes through epigenetic, transcriptional, and post-transcriptional regulation. Long noncoding RNAs (lncRNAs) are implicated in the complex molecular circuitry in the developmental processes. The ENCODE project has opened up new avenues for studying these lncRNA transcripts with the availability of new datasets for lncRNA annotation and regulation. Expression studies identified hundreds of long noncoding RNAs differentially expressed in the pluripotent state, and many of these lncRNAs are found to control the pluripotency and stemness in embryonic and induced pluripotent stem cells or, in the reverse way, promote differentiation of pluripotent cells. They are generally transcriptionally activated or repressed by pluripotency-associated transcription factors and function as molecular mediators of gene expression that determine the pluripotent state of the cell. They can act as molecular scaffolds or guides for the chromatin-modifying complexes to direct them to bind into specific genomic loci to impart a repressive or activating effect on gene expression, or they can transcriptionally or post-transcriptionally regulate gene expression by diverse molecular mechanisms. This review focuses on recent findings on the regulatory role of lncRNAs in two main aspects of pluripotency, namely, self renewal and differentiation into any lineage, and elucidates the underlying molecular mechanisms that are being uncovered lately.
Affiliation(s)
- Suman Ghosal
- Indian Association for the Cultivation of Science, Kolkata, India
3213
Sheynkman GM, Shortreed MR, Frey BL, Smith LM. Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq. Mol Cell Proteomics 2013; 12:2341-53. [PMID: 23629695 DOI: 10.1074/mcp.o113.028142] [Citation(s) in RCA: 105] [Impact Index Per Article: 9.5] [Indexed: 01/22/2023] Open
Abstract
Human proteomic databases required for MS peptide identification are frequently updated and carefully curated, yet are still incomplete because it has been challenging to acquire every protein sequence from the diverse assemblage of proteoforms expressed in every tissue and cell type. In particular, alternative splicing has been shown to be a major source of this cell-specific proteomic variation. Many new alternative splice forms have been detected at the transcript level using next generation sequencing methods, especially RNA-Seq, but it is not known how many of these transcripts are being translated. Leveraging the unprecedented capabilities of next generation sequencing methods, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Eighty million paired-end Illumina reads and ∼500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice junction sequences from the RNA data, translate these sequences into the analogous polypeptide sequence, and create a customized splice junction database for MS searching. Based on the RefSeq gene models, we detected 136,123 annotated and 144,818 unannotated transcript junctions. Of those, 24,834 unannotated junctions passed various quality filters (e.g. minimum read depth) and these entries were translated into 33,589 polypeptide sequences and used for database searching. We discovered 57 splice junction peptides not present in the UniProt-TrEMBL proteomic database comprising an array of different splicing events, including skipped exons, alternative donors and acceptors, and noncanonical transcriptional start sites. To our knowledge this is the first example of using sample-specific RNA-Seq data to create a splice-junction database and discover new peptides resulting from alternative splicing.
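The step of translating splice junction sequences into polypeptide sequences can be sketched as below. This is a minimal illustration using the standard genetic code, not the authors' pipeline: the function names and the toy sequences are hypothetical, and filtering (read depth, stop codons, junction-spanning peptides only) is left out.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate a DNA sequence starting at offset `frame` (0, 1, or 2).

    Stop codons are written as '*'; codons with unknown bases become 'X'.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def junction_translations(upstream_exon_end, downstream_exon_start):
    """Join the two exon flanks at the junction and translate all three
    forward reading frames of the joined sequence."""
    junction = upstream_exon_end + downstream_exon_start
    return [translate(junction, frame) for frame in range(3)]


# Toy flanking sequences, made up for the example.
print(junction_translations("ATGGCC", "TAA"))
```

Translating all three frames is one simple way to cover the case where the reading frame at the junction is unknown; a real workflow could instead take the frame from the annotated upstream exon.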
Affiliation(s)
- Gloria M Sheynkman
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave., Madison, Wisconsin 53706, USA
3214
Regulatory Roles for Long ncRNA and mRNA. Cancers (Basel) 2013; 5:462-90. [PMID: 24216986 PMCID: PMC3730338 DOI: 10.3390/cancers5020462] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 03/01/2013] [Revised: 04/05/2013] [Accepted: 04/19/2013] [Indexed: 01/31/2023] Open
Abstract
Recent advances in high-throughput sequencing technology have identified the transcription of a much larger portion of the genome than previously anticipated. Especially in the context of cancer, it has become clear that aberrant transcription of both protein-coding and long non-coding RNAs (lncRNAs) is a frequent event. The current dogma of RNA function describes mRNA as responsible for the synthesis of proteins, whereas non-coding RNA can have regulatory or epigenetic functions. However, this distinction between the protein-coding and regulatory abilities of transcripts may not be that strict. Here, we review the increasing body of evidence for the existence of multifunctional RNAs that have both protein-coding and trans-regulatory roles. Moreover, we demonstrate that coding transcripts bind to components of the Polycomb Repressive Complex 2 (PRC2) with affinities similar to those of non-coding transcripts, revealing potential epigenetic regulation by mRNAs. We hypothesize that studies on the regulatory ability of disease-associated mRNAs will form an important new field of research.
3215
|
Kapusta A, Kronenberg Z, Lynch VJ, Zhuo X, Ramsay L, Bourque G, Yandell M, Feschotte C. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet 2013; 9:e1003470. [PMID: 23637635 PMCID: PMC3636048 DOI: 10.1371/journal.pgen.1003470] [Citation(s) in RCA: 458] [Impact Index Per Article: 41.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2012] [Accepted: 03/07/2013] [Indexed: 12/22/2022] Open
Abstract
Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (∼30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non-TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ∼30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ∼35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
An unexpected layer of complexity in the genomes of humans and other vertebrates lies in the abundance of genes that do not appear to encode proteins but produce a variety of non-coding RNAs. In particular, the human genome is currently predicted to contain 5,000–10,000 independent gene units generating long (>200 nucleotides) noncoding RNAs (lncRNAs). While there is growing evidence that a large fraction of these lncRNAs have cellular functions, notably to regulate protein-coding gene expression, almost nothing is known about the processes underlying the evolutionary origins and diversification of lncRNA genes. Here we show that transposable elements, through their capacity to move and spread in genomes in a lineage-specific fashion, as well as their ability to introduce regulatory sequences upon chromosomal insertion, represent a major force shaping the lncRNA repertoire of humans, mice, and zebrafish. Not only do TEs make up a substantial fraction of mature lncRNA transcripts, they are also enriched in the vicinity of lncRNA genes, where they frequently contribute to their transcriptional regulation. Through specific examples we provide evidence that some TE sequences embedded in lncRNAs are critical for the biogenesis of lncRNAs and likely important for their function.
Affiliation(s)
- Aurélie Kapusta
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
| | - Zev Kronenberg
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
| | - Vincent J. Lynch
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| | - Xiaoyu Zhuo
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
| | - LeeAnn Ramsay
- McGill University and Genome Quebec Innovation Center, Montréal, Canada
| | - Guillaume Bourque
- McGill University and Genome Quebec Innovation Center, Montréal, Canada
| | - Mark Yandell
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
| | - Cédric Feschotte
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
|
3216
|
Ling MHT, Ban Y, Wen H, Wang SM, Ge SX. Conserved expression of natural antisense transcripts in mammals. BMC Genomics 2013; 14:243. [PMID: 23577827 PMCID: PMC3635984 DOI: 10.1186/1471-2164-14-243] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2012] [Accepted: 03/06/2013] [Indexed: 02/03/2023] Open
Abstract
Background: Recent studies have found thousands of natural antisense transcripts originating from the same genomic loci as protein-coding genes but from the opposite strand. It is unclear whether the majority of antisense transcripts are functional or merely transcriptional noise. Results: Using the Affymetrix Exon array with a modified cDNA synthesis protocol that enables genome-wide detection of antisense transcription, we conducted large-scale expression analysis of antisense transcripts in nine corresponding tissues from human, mouse and rat. We detected thousands of antisense transcripts, some of which show tissue-specific expression and could be studied further for their potential function in the corresponding tissues/organs. The expression patterns of many antisense transcripts are conserved across species, suggesting selective pressure on these transcripts. Compared with protein-coding genes, antisense transcripts show a lesser degree of expression conservation. We also found a positive correlation between sense and antisense expression across tissues. Conclusion: Our results suggest that natural antisense transcripts are subject to selective pressure in mammals, but to a lesser degree than sense transcripts.
Affiliation(s)
- Maurice H T Ling
- Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57007, USA
|
3217
|
Tzadok S, Caspin Y, Hachmo Y, Canaani D, Dotan I. Directionality of noncoding human RNAs: how to avoid artifacts. Anal Biochem 2013; 439:23-9. [PMID: 23583907 DOI: 10.1016/j.ab.2013.03.031] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2013] [Revised: 03/13/2013] [Accepted: 03/20/2013] [Indexed: 01/03/2023]
Abstract
Inactivation of tumor suppressor and metastasis suppressor genes via epigenetic silencing is a frequent event in human cancers. Recent work has shown new mechanisms of epigenetic silencing, based on the occurrence of long noncoding promoter-spanning antisense and/or sense RNAs (lncRNAs), which constitute part of chromatin silencing complexes. Using reverse transcription polymerase chain reaction (RT-PCR), we have started to scan "triple negative" and Her2-overexpressing breast cancer cell lines for directional/bidirectional transcription through promoters of tumor suppressor and metastasis suppressor genes known to be epigenetically silenced in vivo. Surprisingly, we found that RT-PCR-amplified products were obtained at high frequency in the absence of exogenous primers. These amplified products resulted from RT priming via transcripts originating from promoter or upstream spanning regions. Consequently, this priming overruled directionality determination and led to false detection-identification of such lncRNAs. We show that this prevalent "no primer" artifact can be eliminated by treating the RNA preparations with periodate, performing RT reactions at highly elevated temperatures, or a combination of both. These experimental improvements enabled determination of the presence and directionality of individual promoter-spanning long noncoding RNAs with certainty. Examples for the BRMS1 metastasis suppressor gene, as well as RAR-β2 and CST6 human tumor suppressor genes, in breast carcinoma cell lines are presented.
Affiliation(s)
- Sivan Tzadok
- Department of Biochemistry and Molecular Biology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
|
3218
|
Shen H, Li J, Zhang J, Xu C, Jiang Y, Wu Z, Zhao F, Liao L, Chen J, Lin Y, Tian Q, Papasian CJ, Deng HW. Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty four Caucasians. PLoS One 2013; 8:e59494. [PMID: 23577066 PMCID: PMC3618277 DOI: 10.1371/journal.pone.0059494] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2012] [Accepted: 02/14/2013] [Indexed: 12/14/2022] Open
Abstract
Whole genome sequencing studies are essential to obtain a comprehensive understanding of the vast pattern of human genomic variations. Here we report the results of a high-coverage whole genome sequencing study for 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions, and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many of the low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, the majority of which (∼96%) have a minor allele frequency less than 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants that were predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all the 44 genomes, a total of 182 genes were "knocked out" in at least one individual genome, among which 46 genes were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked out" in general populations. Gene ontology analysis suggested that these commonly "knocked out" genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and diseases.
Affiliation(s)
- Hui Shen
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
| | - Jian Li
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
| | - Jigang Zhang
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
| | - Chao Xu
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China
| | - Yan Jiang
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China
| | - Zikai Wu
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China
| | - Fuping Zhao
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
| | - Li Liao
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
| | - Jun Chen
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
| | - Yong Lin
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China
| | - Qing Tian
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
| | - Christopher J. Papasian
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
| | - Hong-Wen Deng
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China
|
3219
|
Light S, Elofsson A. The impact of splicing on protein domain architecture. Curr Opin Struct Biol 2013; 23:451-8. [PMID: 23562110 DOI: 10.1016/j.sbi.2013.02.013] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2013] [Revised: 02/22/2013] [Accepted: 02/28/2013] [Indexed: 10/27/2022]
Abstract
Many proteins are composed of protein domains, functional units of common descent. Multidomain forms are common in all eukaryotes making up more than half of the proteome and the evolution of novel domain architecture has been accelerated in metazoans. It is also becoming increasingly clear that alternative splicing is prevalent among vertebrates. Given that protein domains are defined as structurally, functionally and evolutionarily distinct units, one may speculate that some alternative splicing events may lead to clean excisions of protein domains, thus generating a number of different domain architectures from one gene template. However, recent findings indicate that smaller alternative splicing events, in particular in disordered regions, might be more prominent than domain architectural changes. The problem of identifying protein isoforms is, however, still not resolved. Clearly, many splice forms identified through detection of mRNA sequences appear to produce 'nonfunctional' proteins, such as proteins with missing internal secondary structure elements. Here, we review the state of the art methods for identification of functional isoforms and present a summary of what is known, thus far, about alternative splicing with regard to protein domain architectures.
Affiliation(s)
- Sara Light
- Science for Life Laboratory, Stockholm University, Box 1031 SE-171 21 Solna, Sweden
|
3220
|
Miller DFB, Yan PS, Buechlein A, Rodriguez BA, Yilmaz AS, Goel S, Lin H, Collins-Burow B, Rhodes LV, Braun C, Pradeep S, Rupaimoole R, Dalkilic M, Sood AK, Burow ME, Tang H, Huang TH, Liu Y, Rusch DB, Nephew KP. A new method for stranded whole transcriptome RNA-seq. Methods 2013; 63:126-34. [PMID: 23557989 DOI: 10.1016/j.ymeth.2013.03.023] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2013] [Revised: 03/21/2013] [Accepted: 03/23/2013] [Indexed: 11/18/2022] Open
Abstract
This report describes an improved protocol to generate stranded, barcoded RNA-seq libraries to capture the whole transcriptome. By optimizing the use of duplex specific nuclease (DSN) to remove ribosomal RNA reads from stranded barcoded libraries, we demonstrate improved efficiency of multiplexed next generation sequencing (NGS). This approach detects expression profiles of all RNA types, including miRNA (microRNA), piRNA (Piwi-interacting RNA), snoRNA (small nucleolar RNA), lincRNA (long non-coding RNA), mtRNA (mitochondrial RNA) and mRNA (messenger RNA) without the use of gel electrophoresis. The improved protocol generates high quality data that can be used to identify differential expression in known and novel coding and non-coding transcripts, splice variants, mitochondrial genes and SNPs (single nucleotide polymorphisms).
Affiliation(s)
- David F B Miller
- Medical Sciences, Indiana University School of Medicine, 1001 East 3rd St., Bloomington, IN 47405, United States.
|
3221
|
Chen G, Wang C, Shi L, Qu X, Chen J, Yang J, Shi C, Chen L, Zhou P, Ning B, Tong W, Shi T. Incorporating the human gene annotations in different databases significantly improved transcriptomic and genetic analyses. RNA (NEW YORK, N.Y.) 2013; 19:479-89. [PMID: 23431329 PMCID: PMC3677258 DOI: 10.1261/rna.037473.112] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/23/2012] [Accepted: 01/14/2013] [Indexed: 05/18/2023]
Abstract
Human gene annotation is crucial for conducting transcriptomic and genetic studies; however, the impacts of human gene annotations in diverse databases on related studies have been less evaluated. To enable full use of various human annotation resources and better understand the human transcriptome, here we systematically compare the human annotations present in RefSeq, Ensembl (GENCODE), and AceView on diverse transcriptomic and genetic analyses. We found that the human gene annotations in the three databases are far from complete. Although Ensembl and AceView annotated more genes than RefSeq, more than 15,800 genes from Ensembl (or AceView) are within the intergenic and intronic regions of AceView (or Ensembl) annotation. The human transcriptome annotations in RefSeq, Ensembl, and AceView had distinct effects on short-read mapping, gene and isoform expression profiling, and differential expression calling. Furthermore, our findings indicate that the integrated annotation of these databases can obtain a more complete gene set and significantly enhance those transcriptomic analyses. We also observed that many more known SNPs were located within genes annotated in Ensembl and AceView than in RefSeq. In particular, 1033 of 3041 trait/disease-associated SNPs involved in about 200 human traits/diseases that were previously reported to be in RefSeq intergenic regions could be relocated within Ensembl and AceView genes. Our findings illustrate that a more complete transcriptome generated by incorporating human gene annotations in diverse databases can strikingly improve the overall results of transcriptomic and genetic studies.
Affiliation(s)
- Geng Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Charles Wang
- Functional Genomics Core, Beckman Research Institute, City of Hope Comprehensive Cancer Center, Duarte, California 91010, USA
| | - Leming Shi
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas 72079, USA
| | - Xiongfei Qu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Jiwei Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Jianmin Yang
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Caiping Shi
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Long Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Peiying Zhou
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Baitang Ning
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas 72079, USA
| | - Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas 72079, USA
| | - Tieliu Shi
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
|
3222
|
Dauncey MJ. Genomic and epigenomic insights into nutrition and brain disorders. Nutrients 2013; 5:887-914. [PMID: 23503168 PMCID: PMC3705325 DOI: 10.3390/nu5030887] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2013] [Revised: 02/28/2013] [Accepted: 03/08/2013] [Indexed: 12/22/2022] Open
Abstract
Considerable evidence links many neuropsychiatric, neurodevelopmental and neurodegenerative disorders with multiple complex interactions between genetics and environmental factors such as nutrition. Mental health problems, autism, eating disorders, Alzheimer's disease, schizophrenia, Parkinson's disease and brain tumours are related to individual variability in numerous protein-coding and non-coding regions of the genome. However, genotype does not necessarily determine neurological phenotype because the epigenome modulates gene expression in response to endogenous and exogenous regulators, throughout the life-cycle. Studies using both genome-wide analysis of multiple genes and comprehensive analysis of specific genes are providing new insights into genetic and epigenetic mechanisms underlying nutrition and neuroscience. This review provides a critical evaluation of the following related areas: (1) recent advances in genomic and epigenomic technologies, and their relevance to brain disorders; (2) the emerging role of non-coding RNAs as key regulators of transcription, epigenetic processes and gene silencing; (3) novel approaches to nutrition, epigenetics and neuroscience; (4) gene-environment interactions, especially in the serotonergic system, as a paradigm of the multiple signalling pathways affected in neuropsychiatric and neurological disorders. Current and future advances in these four areas should contribute significantly to the prevention, amelioration and treatment of multiple devastating brain disorders.
|
3223
|
Li W, Yang W, Wang XJ. Pseudogenes: pseudo or real functional elements? J Genet Genomics 2013; 40:171-7. [PMID: 23618400 DOI: 10.1016/j.jgg.2013.03.003] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2013] [Revised: 03/04/2013] [Accepted: 03/04/2013] [Indexed: 11/24/2022]
Abstract
Pseudogenes are genomic remnants of ancient protein-coding genes that have lost their coding potential through evolution. Although widespread, pseudogenes were long considered junk or relics of genomes and drew little attention from biologists until recent years. With the broad application of high-throughput experimental techniques, growing lines of evidence strongly suggest that some pseudogenes possess special functions, including regulating parental gene expression and participating in the regulation of many biological processes. In this review, we summarize some basic features of pseudogenes and their functions in regulating development and diseases. All of these observations indicate that pseudogenes are not purely dead fossils of genomes, but warrant further exploration of their distribution, expression regulation and functions. A new nomenclature is desirable for the currently called 'pseudogenes' to better describe their functions.
Affiliation(s)
- Wen Li
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
|
3224
|
Abstract
The gene expression programs that establish and maintain specific cell states in humans are controlled by thousands of transcription factors, cofactors, and chromatin regulators. Misregulation of these gene expression programs can cause a broad range of diseases. Here, we review recent advances in our understanding of transcriptional regulation and discuss how these have provided new insights into transcriptional misregulation in disease.
Affiliation(s)
- Tong Ihn Lee
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
| | - Richard A. Young
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts
|
3225
|
Kim T, Reitmair A. Non-Coding RNAs: Functional Aspects and Diagnostic Utility in Oncology. Int J Mol Sci 2013; 14:4934-68. [PMID: 23455466 PMCID: PMC3634484 DOI: 10.3390/ijms14034934] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2012] [Revised: 02/09/2013] [Accepted: 02/18/2013] [Indexed: 02/06/2023] Open
Abstract
Noncoding RNAs (ncRNAs) have been found to have roles in a large variety of biological processes. Recent studies indicate that ncRNAs are far more abundant and important than initially imagined, holding great promise for use in diagnostic, prognostic, and therapeutic applications. Within ncRNAs, microRNAs (miRNAs) are the most widely studied and characterized. They have been implicated in the initiation and progression of a variety of human diseases, including major pathologies such as cancers, arthritis, neurodegenerative disorders, and cardiovascular diseases. Their surprising stability in serum and other bodily fluids led to their rapid ascent as a novel class of biomarkers. For example, several properties of stable miRNAs, and perhaps other classes of ncRNAs, make them good candidate biomarkers for early cancer detection and for determining which preneoplastic lesions are likely to progress to cancer. Of particular interest is the identification of biomarker signatures, which may include traditional protein-based biomarkers, to improve risk assessment, detection, and prognosis. Here, we offer a comprehensive review of the ncRNA biomarker literature and discuss state-of-the-art technologies for their detection. Furthermore, we address the challenges present in miRNA detection and quantification, and outline future perspectives for development of next-generation biodetection assays employing multicolor alternating-laser excitation (ALEX) fluorescence spectroscopy.
Affiliation(s)
- Taiho Kim
- Nesher Technologies, Inc., 2100 W. 3rd St. Los Angeles, CA 90057, USA.
|
3226
|
Accurate identification and analysis of human mRNA isoforms using deep long read sequencing. G3-GENES GENOMES GENETICS 2013; 3:387-97. [PMID: 23450794 PMCID: PMC3583448 DOI: 10.1534/g3.112.004812] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/26/2012] [Accepted: 12/26/2012] [Indexed: 01/22/2023]
Abstract
Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing experiments of complementary DNAs. The resulting short reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets for the analysis of a eukaryotic transcriptome and generate two large datasets in the human K562 and HeLa S3 cell lines. Both data sets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads provide partial gene structures that are closely in line with annotated gene structures, 15% of which have not been obtained in a previous de novo analysis of short reads. For long noncoding RNA (lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making the approach ideal for the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long-read sequences can be assembled into full-length transcripts with considerable success. Our method is applicable to all long-read sequencing technologies.
|
3227
|
Epigenetics in Friedreich's Ataxia: Challenges and Opportunities for Therapy. GENETICS RESEARCH INTERNATIONAL 2013; 2013:852080. [PMID: 23533785 PMCID: PMC3590757 DOI: 10.1155/2013/852080] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 10/03/2012] [Accepted: 01/10/2013] [Indexed: 11/17/2022]
Abstract
Friedreich's ataxia (FRDA) is an autosomal recessive neurodegenerative disorder caused by homozygous expansion of a GAA·TTC trinucleotide repeat within the first intron of the FXN gene, leading to reduced FXN transcription and decreased levels of frataxin protein. Recent advances in FRDA research have revealed the presence of several epigenetic modifications that are either directly or indirectly involved in this FXN gene silencing. Although epigenetic marks may be inherited from one generation to the next, modifications of DNA and histones can be reversed, indicating that they are suitable targets for epigenetic-based therapy. Unlike other trinucleotide repeat disorders, such as Huntington disease, the large expansions of GAA·TTC repeats in FRDA do not produce a change in the frataxin amino acid sequence, but they produce reduced levels of normal frataxin. Therefore, transcriptional reactivation of the FXN gene provides a good therapeutic option. The present paper will initially focus on the epigenetic changes seen in FRDA patients and their role in the silencing of the FXN gene and will conclude by considering potential epigenetic therapies.
|
3228
|
Gronau I, Arbiza L, Mohammed J, Siepel A. Inference of natural selection from interspersed genomic elements based on polymorphism and divergence. Mol Biol Evol 2013; 30:1159-71. [PMID: 23386628 DOI: 10.1093/molbev/mst019] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Indexed: 12/25/2022] Open
Abstract
Complete genome sequences contain valuable information about natural selection, but this information is difficult to access for short, widely scattered noncoding elements such as transcription factor binding sites or small noncoding RNAs. Here, we introduce a new computational method, called Inference of Natural Selection from Interspersed Genomically coHerent elemenTs (INSIGHT), for measuring the influence of natural selection on such elements. INSIGHT uses a generative probabilistic model to contrast patterns of polymorphism and divergence in the elements of interest with those in flanking neutral sites, pooling weak information from many short elements in a manner that accounts for variation among loci in mutation rates and coalescent times. The method is able to disentangle the contributions of weak negative, strong negative, and positive selection based on their distinct effects on patterns of polymorphism and divergence. It obtains information about divergence from multiple outgroup genomes using a general statistical phylogenetic approach. The INSIGHT model is efficiently fitted to genome-wide data using an approximate expectation maximization algorithm. Using simulations, we show that the method can accurately estimate the parameters of interest even in complex demographic scenarios, and that it significantly improves on methods based on summary statistics describing polymorphism and divergence. To demonstrate the usefulness of INSIGHT, we apply it to several classes of human noncoding RNAs and to GATA2-binding sites in the human genome.
Affiliation(s)
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, USA
|
3229
|
Gerasimova A, Chavez L, Li B, Seumois G, Greenbaum J, Rao A, Vijayanand P, Peters B. Predicting cell types and genetic variations contributing to disease by combining GWAS and epigenetic data. PLoS One 2013; 8:e54359. [PMID: 23382893 PMCID: PMC3559682 DOI: 10.1371/journal.pone.0054359] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 07/17/2012] [Accepted: 12/11/2012] [Indexed: 12/22/2022] Open
Abstract
Genome-wide association studies (GWASs) identify single nucleotide polymorphisms (SNPs) that are enriched in individuals suffering from a given disease. Most disease-associated SNPs fall into non-coding regions, so that it is not straightforward to infer phenotype or function; moreover, many SNPs are in tight genetic linkage, so that a SNP identified as associated with a particular disease may not itself be causal, but rather signify the presence of a linked SNP that is functionally relevant to disease pathogenesis. Here, we present an analysis method that takes advantage of the recent rapid accumulation of epigenomics data to address these problems for some SNPs. Using asthma as a prototypic example, we show that non-coding disease-associated SNPs are enriched in genomic regions that function as regulators of transcription, such as enhancers and promoters. Identifying enhancers based on the presence of histone modification marks such as H3K4me1 in different cell types, we show that the location of enhancers is highly cell-type specific. We use these findings to predict which SNPs are likely to be directly contributing to disease based on their presence in regulatory regions, and in which cell types their effect is expected to be detectable. Moreover, we can also predict which cell types contribute to a disease based on overlap of the disease-associated SNPs with the locations of enhancers present in a given cell type. Finally, we suggest that it will be possible to re-analyze GWAS studies with much higher power by limiting the SNPs considered to those in coding or regulatory regions of cell types relevant to a given disease.
Affiliation(s)
- Anna Gerasimova
- La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America.
|
3230
|
Abstract
This issue of Genome Research presents new results, methods, and tools from The ENCODE Project (ENCyclopedia of DNA Elements), which collectively represents an important step in moving beyond a parts list of the genome and promises to shape the future of genomic research. This collection sheds light on basic biological questions and frames the current debate over the optimization of tools and methodological challenges necessary to compare and interpret large complex data sets focused on how the genome is organized and regulated. In a number of instances, the authors have highlighted the strengths and limitations of current computational and technical approaches, providing the community with useful standards, which should stimulate development of new tools. In many ways, these papers will ripple through the scientific community, as those in pursuit of understanding the “regulatory genome” will heavily traverse the maps and tools. Similarly, the work should have a substantive impact on how genetic variation contributes to specific diseases and traits by providing a compendium of functional elements for follow-up study. The success of these papers should not only be measured by the scope of the scientific insights and tools but also by their ability to attract new talent to mine existing and future data.
Affiliation(s)
- Stephen Chanock
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Advanced Technology Center, Bethesda, Maryland 20892-4605, USA.
|
3231
|
Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res 2013; 22:1735-47. [PMID: 22955985 PMCID: PMC3431490 DOI: 10.1101/gr.136366.111] [Citation(s) in RCA: 135] [Impact Index Per Article: 12.3] [Indexed: 12/15/2022]
Abstract
Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.
|
3232
|
Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J, Guigó R. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 2013; 22:1775-89. [PMID: 22955988 PMCID: PMC3431493 DOI: 10.1101/gr.132159.111] [Citation(s) in RCA: 3779] [Impact Index Per Article: 343.5] [Indexed: 11/24/2022]
Abstract
The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to those of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences—particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally expressed at lower levels than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.
Affiliation(s)
- Thomas Derrien
- Bioinformatics and Genomics, Centre for Genomic Regulation and UPF, 08003 Barcelona, Catalonia, Spain
|
3233
|
Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M. Linking disease associations with regulatory information in the human genome. Genome Res 2013; 22:1748-59. [PMID: 22955986 PMCID: PMC3431491 DOI: 10.1101/gr.136127.111] [Citation(s) in RCA: 536] [Impact Index Per Article: 48.7] [Indexed: 12/12/2022]
Abstract
Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify “functional SNPs” that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.
Affiliation(s)
- Marc A Schaub
- Department of Computer Science, Stanford University, Stanford, California 94305, USA
|
3234
|
Cheng C, Alexander R, Min R, Leng J, Yip KY, Rozowsky J, Yan KK, Dong X, Djebali S, Ruan Y, Davis CA, Carninci P, Lassmann T, Gingeras TR, Guigó R, Birney E, Weng Z, Snyder M, Gerstein M. Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res 2013; 22:1658-67. [PMID: 22955978 PMCID: PMC3431483 DOI: 10.1101/gr.136838.111] [Citation(s) in RCA: 138] [Impact Index Per Article: 12.5] [Indexed: 12/31/2022]
Abstract
Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
Affiliation(s)
- Chao Cheng
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
|
3235
|
Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S, Cherry JM, Snyder M. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 2013; 22:1790-7. [PMID: 22955989 PMCID: PMC3431494 DOI: 10.1101/gr.137323.112] [Citation(s) in RCA: 1990] [Impact Index Per Article: 180.9] [Indexed: 02/07/2023]
Abstract
As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
Affiliation(s)
- Alan P Boyle
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
|
3236
|
Abstract
In its first production phase, The ENCODE Project Consortium (ENCODE) has generated thousands of genome-scale data sets, resulting in a genomic “parts list” that encompasses transcripts, sites of transcription factor binding, and other functional features that now number in the millions of distinct elements. These data are reshaping many long-held beliefs concerning the information content of the human and other complex genomes, including the very definition of the gene. Here I discuss and place in context many of the leading findings of ENCODE, as well as trends that are shaping the generation and interpretation of ENCODE data. Finally, I consider prospects for the future, including maximizing the accuracy, completeness, and utility of ENCODE data for the community.
Affiliation(s)
- John A Stamatoyannopoulos
- Departments of Genome Sciences and Medicine, University of Washington School of Medicine, Seattle, Washington 98195, USA.
|
3237
|
Affiliation(s)
- Kelly A Frazer
- Moores UCSD Cancer Center, Department of Pediatrics and Rady Children's Hospital, University of California at San Diego, La Jolla, California 92093, USA.
|
3238
|
Knauss JL, Sun T. Regulatory mechanisms of long noncoding RNAs in vertebrate central nervous system development and function. Neuroscience 2013; 235:200-14. [PMID: 23337534 DOI: 10.1016/j.neuroscience.2013.01.022] [Citation(s) in RCA: 119] [Impact Index Per Article: 10.8] [Received: 10/23/2012] [Revised: 12/28/2012] [Accepted: 01/09/2013] [Indexed: 01/22/2023]
Abstract
Long noncoding RNAs (lncRNAs) have emerged as an important class of molecules that regulate gene expression at epigenetic, transcriptional, and post-transcriptional levels through a wide array of mechanisms. This regulation is of particular importance in the central nervous system (CNS), where precise modulation of gene expression is required for proper neuronal and glial production, connection and function. There are relatively few functional studies that characterize lncRNA mechanisms, but possible functions can often be inferred based on existing examples and the lncRNA's relative genomic position. In this review, we will discuss mechanisms of lncRNAs as predicted by genomic contexts and the possible impact on CNS development, function, and disease pathogenesis. There is no doubt that investigation of the mechanistic role of lncRNAs will open a new and exciting direction in studying CNS development and function.
Affiliation(s)
- J L Knauss
- Department of Cell and Developmental Biology, Weill Medical College of Cornell University, New York, NY, United States.
|
3239
|
Abstract
Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.
Affiliation(s)
- Elaine R Mardis
- The Genome Institute at Washington University School of Medicine, St. Louis, Missouri 63108, USA.
|
3240
|
Lachance V, Degrandmaison J, Marois S, Robitaille M, Génier S, Nadeau S, Angers S, Parent JL. Ubiquitination and activation of a Rab GTPase promoted by a β2-Adrenergic Receptor/HACE1 complex. J Cell Sci 2013; 127:111-23. [DOI: 10.1242/jcs.132944] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022] Open
Abstract
We and others have shown that trafficking of G protein-coupled receptors is regulated by Rab GTPases. Cargo-mediated regulation of vesicular transport has received great attention lately. Rab GTPases, forming the largest branch of the Ras GTPase superfamily, regulate almost every step of vesicle-mediated trafficking. Rab GTPases are well-recognized targets of human diseases but their regulation and the mechanisms connecting them to cargo proteins are still poorly understood. Herein, we show by overexpression/depletion studies that HACE1, a HECT domain-containing ubiquitin ligase, promotes the recycling of the β2-adrenergic receptor (β2AR), a prototypical G protein-coupled receptor, through a Rab11a-dependent mechanism. Interestingly, the β2AR in conjunction with HACE1 triggered ubiquitination of Rab11a, as observed by Western blot analysis. LC-MS/MS experiments determined that Rab11a is ubiquitinated on Lys145. A Rab11a-K145R mutant failed to undergo β2AR/HACE1-induced ubiquitination and inhibited the HACE1-mediated recycling of the β2AR. Rab11a, but not Rab11a-K145R, was activated by β2AR/HACE1, indicating that ubiquitination of Lys145 is involved in Rab11a activation. β2AR/HACE1 co-expression also potentiated ubiquitination of Rab6a and Rab8a, but not of other Rab GTPases that were tested. We report a novel regulatory mechanism of Rab GTPases by their ubiquitination, with associated functional effects demonstrated on Rab11a. This contributes to a new pathway whereby a cargo protein, like a G protein-coupled receptor, can regulate its own trafficking by inducing the ubiquitination and activation of a Rab GTPase.
|
3241
|
Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R, Heitner SG, Lee BT, Barber GP, Harte RA, Diekhans M, Long JC, Wilder SP, Zweig AS, Karolchik D, Kuhn RM, Haussler D, Kent WJ. ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res 2013; 41:D56-63. [PMID: 23193274 PMCID: PMC3531152 DOI: 10.1093/nar/gks1172] [Citation(s) in RCA: 614] [Impact Index Per Article: 55.8] [Received: 09/21/2012] [Revised: 10/26/2012] [Accepted: 10/28/2012] [Indexed: 02/07/2023] Open
Abstract
The Encyclopedia of DNA Elements (ENCODE), http://encodeproject.org, has completed its fifth year of scientific collaboration to create a comprehensive catalog of functional elements in the human genome, and its third year of investigations in the mouse genome. Since the last report in this journal, the ENCODE human data repertoire has grown by 898 new experiments (totaling 2886), accompanied by a major integrative analysis. In the mouse genome, results from 404 new experiments became available this year, increasing the total to 583, collected during the course of the project. The University of California, Santa Cruz, makes these data available on the public Genome Browser http://genome.ucsc.edu for visual browsing and data mining. Downloads of raw and processed data files are supported. The ENCODE portal provides specialized tools and information about the ENCODE data sets.
Affiliation(s)
- Kate R Rosenbloom
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.
|
3242
|
Paraskevopoulou MD, Georgakilas G, Kostoulas N, Reczko M, Maragkakis M, Dalamagas TM, Hatzigeorgiou AG. DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs. Nucleic Acids Res 2012. [PMID: 23193281 PMCID: PMC3531175 DOI: 10.1093/nar/gks1246] [Citation(s) in RCA: 290] [Impact Index Per Article: 24.2] [Indexed: 02/06/2023] Open
Abstract
Recently, the attention of the research community has been focused on long non-coding RNAs (lncRNAs) and their physiological/pathological implications. As the number of experiments increases at a rapid rate and transcriptional units are better annotated, databases indexing lncRNA properties and function are gradually becoming essential tools in this process. The aim of DIANA-LncBase (www.microrna.gr/LncBase) is to support researchers’ attempts to unravel putative functional microRNA (miRNA)–lncRNA interactions. This study provides, for the first time, a comprehensive annotation of miRNA targets on lncRNAs. DIANA-LncBase hosts transcriptome-wide experimentally verified and computationally predicted miRNA recognition elements (MREs) on human and mouse lncRNAs. The analysis performed includes an integration of most of the available lncRNA resources, relevant high-throughput HITS-CLIP and PAR-CLIP experimental data as well as state-of-the-art in silico target predictions. The experimentally supported entries available in DIANA-LncBase correspond to >5000 interactions, while the computationally predicted interactions exceed 10 million. DIANA-LncBase hosts detailed information for each miRNA–lncRNA pair, such as external links, graphic plots of transcripts’ genomic location, representation of the binding sites, lncRNA tissue expression as well as MRE conservation and prediction scores.
|
3243
|
Rodriguez JM, Maietta P, Ezkurdia I, Pietrelli A, Wesselink JJ, Lopez G, Valencia A, Tress ML. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res 2012; 41:D110-7. [PMID: 23161672 PMCID: PMC3531113 DOI: 10.1093/nar/gks1058] [Citation(s) in RCA: 153] [Impact Index Per Article: 12.8] [Indexed: 12/03/2022] Open
Abstract
Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also selects a single reference sequence for each gene, here termed the principal isoform, based on the annotations of structure, function and conservation for each transcript. APPRIS identifies a principal isoform for 85% of the protein-coding genes in the GENCODE 7 release for ENSEMBL. Analysis of the APPRIS data shows that at least 70% of the alternative (non-principal) variants would lose important functional or structural information relative to the principal isoform.
|
3244
|
|
3245
|
Micale L, Loviglio MN, Manzoni M, Fusco C, Augello B, Migliavacca E, Cotugno G, Monti E, Borsani G, Reymond A, Merla G. A fish-specific transposable element shapes the repertoire of p53 target genes in zebrafish. PLoS One 2012; 7:e46642. [PMID: 23118857 PMCID: PMC3485254 DOI: 10.1371/journal.pone.0046642] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 04/27/2012] [Accepted: 09/03/2012] [Indexed: 12/04/2022] Open
Abstract
Transposable elements, as major components of most eukaryotic organisms' genomes, define their structural organization and plasticity. They supply host genomes with functional elements; for example, binding sites of the pleiotropic master transcription factor p53 were identified in LINE1, Alu and LTR repeats in the human genome. Similarly, in this report we reveal the role of the zebrafish (Danio rerio) EnSpmN6_DR non-autonomous DNA transposon in shaping the repertoire of the p53 target genes. The multiple copies of EnSpmN6_DR and their embedded p53 responsive elements drive in several instances p53-dependent transcriptional modulation of the adjacent gene, whose human orthologs were frequently previously annotated as p53 targets. These transposons define predominantly a set of target genes whose human orthologs contribute to neuronal morphogenesis, axonogenesis, synaptic transmission and the regulation of programmed cell death. Consistent with these biological functions, the orthologs of the EnSpmN6_DR-colonized loci are enriched for genes expressed in the amygdala, the hippocampus and the brain cortex. Our data pinpoint a remarkable example of convergent evolution: the exaptation of lineage-specific transposons to shape p53-regulated neuronal morphogenesis-related pathways in both a hominid and a teleost fish.
Affiliation(s)
- Lucia Micale
- Medical Genetics Unit, IRCCS “Casa Sollievo della Sofferenza”, San Giovanni Rotondo, Italy
| | - Maria Nicla Loviglio
- Medical Genetics Unit, IRCCS “Casa Sollievo della Sofferenza”, San Giovanni Rotondo, Italy
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Marta Manzoni
- Department of Biomedical Science and Biotechnology, University of Brescia, Brescia, Italy
| | - Carmela Fusco
- Medical Genetics Unit, IRCCS “Casa Sollievo della Sofferenza”, San Giovanni Rotondo, Italy
| | - Bartolomeo Augello
- Medical Genetics Unit, IRCCS “Casa Sollievo della Sofferenza”, San Giovanni Rotondo, Italy
| | - Eugenia Migliavacca
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Grazia Cotugno
- Medical Genetics Unit, IRCCS “Casa Sollievo della Sofferenza”, San Giovanni Rotondo, Italy
| | - Eugenio Monti
- Department of Biomedical Science and Biotechnology, University of Brescia, Brescia, Italy
| | - Giuseppe Borsani
- Department of Biomedical Science and Biotechnology, University of Brescia, Brescia, Italy
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Giuseppe Merla
- Medical Genetics Unit, IRCCS “Casa Sollievo della Sofferenza”, San Giovanni Rotondo, Italy
| |
|
3246
|
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2012; 29:15-21. [PMID: 23104886 DOI: 10.1093/bioinformatics/bts635] [Citation(s) in RCA: 27498] [Impact Index Per Article: 2291.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. AVAILABILITY AND IMPLEMENTATION: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
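The seed search named in this abstract can be illustrated with a toy sketch. The code below is not STAR's implementation: it assumes exact matching only, builds the suffix array naively, and omits the clustering, stitching and scoring stages entirely. The function names (`build_suffix_array`, `maximal_mappable_prefix`, `seed_search`) and the toy genome are hypothetical, chosen for illustration.

```python
from typing import List, Tuple


def build_suffix_array(genome: str) -> List[int]:
    # Naive construction (sort all suffixes); adequate for a toy genome only.
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def _char_at(genome: str, sa: List[int], k: int, depth: int) -> str:
    # Character at offset `depth` of the k-th sorted suffix; "" past the end.
    # The empty string sorts before every base, which matches suffix order.
    pos = sa[k] + depth
    return genome[pos] if pos < len(genome) else ""


def maximal_mappable_prefix(read: str, genome: str, sa: List[int]) -> Tuple[int, List[int]]:
    """Longest prefix of `read` that occurs exactly in `genome`.

    Returns (prefix length, sorted genome start positions of that prefix).
    """
    lo, hi = 0, len(sa)  # suffix-array interval matching read[:depth]
    depth = 0
    while depth < len(read):
        c = read[depth]
        a, b = lo, hi
        while a < b:  # lower bound: first suffix whose next char is >= c
            m = (a + b) // 2
            if _char_at(genome, sa, m, depth) < c:
                a = m + 1
            else:
                b = m
        new_lo = a
        b = hi
        while a < b:  # upper bound: first suffix whose next char is > c
            m = (a + b) // 2
            if _char_at(genome, sa, m, depth) <= c:
                a = m + 1
            else:
                b = m
        if new_lo == a:  # no suffix extends the match by c
            break
        lo, hi = new_lo, a
        depth += 1
    if depth == 0:
        return 0, []
    return depth, sorted(sa[lo:hi])


def seed_search(read: str, genome: str, sa: List[int]) -> List[Tuple[int, int, List[int]]]:
    """Sequentially split `read` into maximal exactly-matching seeds.

    Each seed is (offset in read, seed length, genome positions).
    """
    seeds = []
    offset = 0
    while offset < len(read):
        length, positions = maximal_mappable_prefix(read[offset:], genome, sa)
        if length == 0:
            offset += 1  # simplification: skip a base that maps nowhere
            continue
        seeds.append((offset, length, positions))
        offset += length
    return seeds


# Toy genome: exon "GATTACA", an 8-base "intron" of G, then exon "TCCATGC".
genome = "GATTACA" + "GGGGGGGG" + "TCCATGC"
sa = build_suffix_array(genome)
seeds = seed_search("TTACA" + "TCCAT", genome, sa)
```

In this toy case the read spans the two exons, so the first seed stops where the read diverges from the genome and the second seed restarts from that point. The gap between the two genome positions is the kind of signal a later stitching step could interpret as a splice junction.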
|
3247
|
O'Reilly D, Dienstbier M, Cowley SA, Vazquez P, Drozdz M, Taylor S, James WS, Murphy S. Differentially expressed, variant U1 snRNAs regulate gene expression in human cells. Genome Res 2012; 23:281-91. [PMID: 23070852 PMCID: PMC3561869 DOI: 10.1101/gr.142968.112] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Human U1 small nuclear (sn)RNA, required for splicing of pre-mRNA, is encoded by genes on chromosome 1 (1p36). Imperfect copies of these U1 snRNA genes, also located on chromosome 1 (1q12-21), were thought to be pseudogenes. However, many of these "variant" (v)U1 snRNA genes produce fully processed transcripts. Using antisense oligonucleotides to block the activity of a specific vU1 snRNA in HeLa cells, we have identified global transcriptome changes following interrogation of the Affymetrix Human Exon ST 1.0 array. Our results indicate that this vU1 snRNA regulates expression of a subset of target genes at the level of pre-mRNA processing. This is the first indication that variant U1 snRNAs have a biological function in vivo. Furthermore, some vU1 snRNAs are packaged into unique ribonucleoproteins (RNPs), and many vU1 snRNA genes are differentially expressed in human embryonic stem cells (hESCs) and HeLa cells, suggesting developmental control of RNA processing through expression of different sets of vU1 snRNPs.
Affiliation(s)
- Dawn O'Reilly
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom
|
3248
|
Pei B, Sisu C, Frankish A, Howald C, Habegger L, Mu XJ, Harte R, Balasubramanian S, Tanzer A, Diekhans M, Reymond A, Hubbard TJ, Harrow J, Gerstein MB. The GENCODE pseudogene resource. Genome Biol 2012; 13:R51. [PMID: 22951037 PMCID: PMC3491395 DOI: 10.1186/gb-2012-13-9-r51] [Citation(s) in RCA: 253] [Impact Index Per Article: 21.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2012] [Revised: 05/30/2012] [Accepted: 06/25/2012] [Indexed: 12/11/2022] Open
Abstract
Background: Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data. Results: As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes project, producing lists of pseudogenes potentially under selection. Conclusions: At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data of each pseudogene are stored in an associated resource, psiDR, which will be useful for the initial identification of potentially functional pseudogenes.
Affiliation(s)
- Baikang Pei
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
|
3249
|
Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, Garg K, Sandstrom R, Bates D, Canfield TK, Diegel M, Dunn D, Ebersol AK, Frum T, Giste E, Harding L, Johnson AK, Johnson EM, Kutyavin T, Lajoie B, Lee BK, Lee K, London D, Lotakis D, Neph S, Neri F, Nguyen ED, Reynolds AP, Roach V, Safi A, Sanchez ME, Sanyal A, Shafer A, Simon JM, Song L, Vong S, Weaver M, Zhang Z, Zhang Z, Lenhard B, Tewari M, Dorschner MO, Hansen RS, Navas PA, Stamatoyannopoulos G, Iyer VR, Lieb JD, Sunyaev SR, Akey JM, Sabo PJ, Kaul R, Furey TS, Dekker J, Crawford GE, Stamatoyannopoulos JA. The accessible chromatin landscape of the human genome. Nature 2012; 489:75-82. [PMID: 22955617 PMCID: PMC3721348 DOI: 10.1038/nature11232] [Citation(s) in RCA: 1917] [Impact Index Per Article: 159.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2011] [Accepted: 05/15/2012] [Indexed: 02/07/2023]
Abstract
DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.
Affiliation(s)
- Robert E. Thurman
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Eric Rynes
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Richard Humbert
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Jeff Vierstra
- Department of Genome Sciences, University of Washington, Seattle, WA
| | | | - Eric Haugen
- Department of Genome Sciences, University of Washington, Seattle, WA
| | | | | | - Hao Wang
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Kavita Garg
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Richard Sandstrom
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Daniel Bates
- Department of Genome Sciences, University of Washington, Seattle, WA
| | | | - Morgan Diegel
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Douglas Dunn
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Abigail K. Ebersol
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Tristan Frum
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Erika Giste
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Lisa Harding
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Audra K. Johnson
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Ericka M. Johnson
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Tanya Kutyavin
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Bryan Lajoie
- Program in Gene Function, University of Massachusetts Medical School, Worcester, MA
| | - Bum-Kyu Lee
- Institute for Cellular and Molecular Biology, University of Texas, Austin, TX
| | - Kristen Lee
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Darin London
- Institute for Genome Sciences and Policy, Duke University, Durham, NC
| | - Dimitra Lotakis
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Shane Neph
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Fidencio Neri
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Eric D. Nguyen
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Alex P. Reynolds
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Vaughn Roach
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Alexias Safi
- Institute for Genome Sciences and Policy, Duke University, Durham, NC
| | - Minerva E. Sanchez
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Amartya Sanyal
- Program in Gene Function, University of Massachusetts Medical School, Worcester, MA
| | - Anthony Shafer
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Jeremy M. Simon
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Lingyun Song
- Institute for Genome Sciences and Policy, Duke University, Durham, NC
| | - Shinny Vong
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Molly Weaver
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Zhancheng Zhang
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Zhuzhu Zhang
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Boris Lenhard
- Bergen Center for Computational Science, University of Bergen, Bergen, Norway
| | - Muneesh Tewari
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Michael O. Dorschner
- Dept. of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA
| | - R. Scott Hansen
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Patrick A. Navas
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | | | - Vishwanath R. Iyer
- Institute for Cellular and Molecular Biology, University of Texas, Austin, TX
| | - Jason D. Lieb
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Shamil R. Sunyaev
- Dept. of Medicine, Division of Genetics, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA
| | - Joshua M. Akey
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Peter J. Sabo
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Rajinder Kaul
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Terrence S. Furey
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Job Dekker
- Program in Gene Function, University of Massachusetts Medical School, Worcester, MA
| | | | - John A. Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Seattle, WA
- Department of Medicine, Division of Oncology, University of Washington, Seattle, WA
| |
|
3250
|
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi AM, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Röder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J, Derrien T, Drenkow J, Dumais E, Dumais J, Duttagupta R, Falconnet E, Fastuca M, Fejes-Toth K, Ferreira P, Foissac S, Fullwood MJ, Gao H, Gonzalez D, Gordon A, Gunawardena H, Howald C, Jha S, Johnson R, Kapranov P, King B, Kingswood C, Luo OJ, Park E, Persaud K, Preall JB, Ribeca P, Risk B, Robyr D, Sammeth M, Schaffer L, See LH, Shahab A, Skancke J, Suzuki AM, Takahashi H, Tilgner H, Trout D, Walters N, Wang H, Wrobel J, Yu Y, Ruan X, Hayashizaki Y, Harrow J, Gerstein M, Hubbard T, Reymond A, Antonarakis SE, Hannon G, Giddings MC, Ruan Y, Wold B, Carninci P, Guigó R, Gingeras TR. Landscape of transcription in human cells. Nature 2012; 489:101-8. [PMID: 22955620 PMCID: PMC3684276 DOI: 10.1038/nature11233] [Citation(s) in RCA: 3775] [Impact Index Per Article: 314.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2011] [Accepted: 05/15/2012] [Indexed: 02/07/2023]
Abstract
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.
Affiliation(s)
- Sarah Djebali
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Carrie A. Davis
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Angelika Merkel
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Alex Dobin
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Timo Lassmann
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Ali M. Mortazavi
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
- University of California Irvine, Dept. of Developmental and Cell Biology, 2300 Biological Sciences III, Irvine, CA USA 92697
| | - Andrea Tanzer
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Wei Lin
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Felix Schlesinger
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Chenghai Xue
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Georgi K. Marinov
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Jainab Khatun
- Boise State University, College of Arts & Sciences, 1910 University Dr. Boise, ID USA 83725
| | - Brian A. Williams
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Chris Zaleski
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Joel Rozowsky
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
| | - Maik Röder
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Felix Kokocinski
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire United Kingdom CB10 1SA
| | - Rehab F. Abdelhamid
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Tyler Alioto
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Igor Antoshechkin
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Michael T. Baer
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Nadav S. Bar
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - Philippe Batut
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Kimberly Bell
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Ian Bell
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Sudipto Chakrabortty
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Xian Chen
- University of North Carolina at Chapel Hill, Department of Biochemistry & Biophysics, 120 Mason Farm Rd., Chapel Hill, NC USA 27599
| | - Jacqueline Chrast
- University of Lausanne, Center for Integrative Genomics, Genopode building, Lausanne, Switzerland 1015
| | - Joao Curado
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Thomas Derrien
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Jorg Drenkow
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Erica Dumais
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Jacqueline Dumais
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Radha Duttagupta
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Emilie Falconnet
- University of Geneva Medical School, Department of Genetic Medicine and Development and iGE3 Institute of Genetics and Genomics of Geneva, 1 rue Michel-Servet, Geneva, Switzerland 1015
| | - Meagan Fastuca
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Kata Fejes-Toth
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Pedro Ferreira
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Sylvain Foissac
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Melissa J. Fullwood
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Hui Gao
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - David Gonzalez
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Assaf Gordon
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Harsha Gunawardena
- University of North Carolina at Chapel Hill, Department of Biochemistry & Biophysics, 120 Mason Farm Rd., Chapel Hill, NC USA 27599
| | - Cedric Howald
- University of Lausanne, Center for Integrative Genomics, Genopode building, Lausanne, Switzerland 1015
| | - Sonali Jha
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Rory Johnson
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Philipp Kapranov
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
- St. Laurent Institute, One Kendall Square, Cambridge, MA
| | - Brandon King
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Colin Kingswood
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Oscar J. Luo
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Eddie Park
- University of California Irvine, Dept. of Developmental and Cell Biology, 2300 Biological Sciences III, Irvine, CA USA 92697
| | - Kimberly Persaud
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Jonathan B. Preall
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Paolo Ribeca
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Brian Risk
- Boise State University, College of Arts & Sciences, 1910 University Dr. Boise, ID USA 83725
| | - Daniel Robyr
- University of Geneva Medical School, Department of Genetic Medicine and Development and iGE3 Institute of Genetics and Genomics of Geneva, 1 rue Michel-Servet, Geneva, Switzerland 1015
| | - Michael Sammeth
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Lorian Schaffer
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Lei-Hoon See
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Atif Shahab
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Jorgen Skancke
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - Ana Maria Suzuki
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Hazuki Takahashi
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Hagen Tilgner
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Diane Trout
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Nathalie Walters
- University of Lausanne, Center for Integrative Genomics, Genopode building, Lausanne, Switzerland 1015
| | - Huaien Wang
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - John Wrobel
- Boise State University, College of Arts & Sciences, 1910 University Dr. Boise, ID USA 83725
| | - Yanbao Yu
- University of North Carolina at Chapel Hill, Department of Biochemistry & Biophysics, 120 Mason Farm Rd., Chapel Hill, NC USA 27599
| | - Xiaoan Ruan
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Yoshihide Hayashizaki
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire United Kingdom CB10 1SA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
- Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
| | - Tim Hubbard
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire United Kingdom CB10 1SA
| | - Alexandre Reymond
- University of Lausanne, Center for Integrative Genomics, Genopode building, Lausanne, Switzerland 1015
| | - Stylianos E. Antonarakis
- University of Geneva Medical School, Department of Genetic Medicine and Development and iGE3 Institute of Genetics and Genomics of Geneva, 1 rue Michel-Servet, Geneva, Switzerland 1015
| | - Gregory Hannon
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Morgan C. Giddings
- Boise State University, College of Arts & Sciences, 1910 University Dr. Boise, ID USA 83725
- University of North Carolina at Chapel Hill, Department of Biochemistry & Biophysics, 120 Mason Farm Rd., Chapel Hill, NC USA 27599
| | - Yijun Ruan
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Barbara Wold
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Piero Carninci
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Thomas R. Gingeras
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| |
|