3101
|
Koboldt D, Larson D, Sullivan L, Bowne S, Steinberg K, Churchill J, Buhr A, Nutter N, Pierce E, Blanton S, Weinstock G, Wilson R, Daiger S. Exome-based mapping and variant prioritization for inherited Mendelian disorders. Am J Hum Genet 2014; 94:373-84. [PMID: 24560519 DOI: 10.1016/j.ajhg.2014.01.016] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 11/06/2013] [Accepted: 01/30/2014] [Indexed: 02/08/2023]
Abstract
Exome sequencing in families affected by rare genetic disorders has the potential to rapidly identify new disease genes (genes in which mutations cause disease), but the identification of a single causal mutation among thousands of variants remains a significant challenge. We developed a scoring algorithm to prioritize potential causal variants within a family according to segregation with the phenotype, population frequency, predicted effect, and gene expression in the tissue(s) of interest. To narrow the search space in families with multiple affected individuals, we also developed two complementary approaches to exome-based mapping of autosomal-dominant disorders. One approach identifies segments of maximum identity by descent among affected individuals; the other nominates regions on the basis of shared rare variants and the absence of homozygous differences between affected individuals. We showcase our methods by using exome sequence data from families affected by autosomal-dominant retinitis pigmentosa (adRP), a rare disorder characterized by night blindness and progressive vision loss. We performed exome capture and sequencing on 91 samples representing 24 families affected by probable adRP but lacking common disease-causing mutations. Eight of 24 families (33%) were revealed to harbor high-scoring, most likely pathogenic (by clinical assessment) mutations affecting known RP genes. Analysis of the remaining 16 families identified candidate variants in a number of interesting genes, some of which have withstood further segregation testing in extended pedigrees. To empower the search for Mendelian-disease genes in family-based sequencing studies, we implemented our methods in a cross-platform-compatible software package, MendelScan, which is freely available to the research community.
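The scoring idea in this abstract (segregation, population frequency, predicted effect, tissue expression) can be illustrated with a minimal sketch. This is not MendelScan's actual scoring scheme: the `Variant` fields, the effect weights, the 1% frequency cutoff and the multiplicative combination are all assumptions chosen only to show how four lines of evidence might be combined into one ranking.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    affected_carriers: int    # affected relatives who carry the variant
    affected_total: int       # affected relatives who were sequenced
    unaffected_carriers: int  # unaffected relatives who carry the variant
    population_freq: float    # allele frequency in a reference population
    effect: str               # predicted effect, e.g. "nonsense", "missense"
    expression_rank: float    # 0..1, 1 = highest expression in the tissue of interest


# Hypothetical weights for predicted effect (illustrative, not from the paper).
EFFECT_WEIGHT = {
    "nonsense": 1.0,
    "frameshift": 1.0,
    "splice_site": 0.9,
    "missense": 0.7,
    "synonymous": 0.05,
}


def segregation_score(v: Variant) -> float:
    """Fraction of affected relatives carrying the variant, halved per unaffected carrier."""
    if v.affected_total == 0:
        return 0.0
    return (v.affected_carriers / v.affected_total) * 0.5 ** v.unaffected_carriers


def frequency_score(v: Variant, max_freq: float = 0.01) -> float:
    """1.0 for a novel variant, falling linearly to 0.0 at the assumed cutoff."""
    if v.population_freq >= max_freq:
        return 0.0
    return 1.0 - v.population_freq / max_freq


def priority(v: Variant) -> float:
    """Combine the four components multiplicatively (an assumption of this sketch)."""
    effect = EFFECT_WEIGHT.get(v.effect, 0.1)
    return segregation_score(v) * frequency_score(v) * effect * v.expression_rank


def rank(variants: list[Variant]) -> list[Variant]:
    """Return variants ordered from highest to lowest priority."""
    return sorted(variants, key=priority, reverse=True)
```

A multiplicative score means a variant that fails any single criterion (for example, one that is common in the population) drops to the bottom of the list regardless of its other properties; an additive or log-scaled scheme would behave differently.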
|
3102
|
Bai W, Yang J, Yang G, Niu P, Tian L, Gao A. Long non-coding RNA NR_045623 and NR_028291 involved in benzene hematotoxicity in occupationally benzene-exposed workers. Exp Mol Pathol 2014; 96:354-60. [PMID: 24613687 DOI: 10.1016/j.yexmp.2014.02.016] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 02/26/2014] [Accepted: 02/28/2014] [Indexed: 01/08/2023]
Abstract
Benzene is an established human hematotoxicant and leukemogen. New insights into the pathogenesis of benzene hematotoxicity are urgently needed. Long non-coding RNAs (lncRNAs) participate widely in physiological and pathological processes and have been shown to play an important role in the tumorigenesis of hematologic malignancies. However, the expression and biological function of lncRNAs during the progression of benzene hematotoxicity remain largely unknown. An integrated microarray analysis of differentially expressed lncRNAs and mRNAs was performed to identify genes likely to be critical for benzene hematotoxicity. A dynamic gene network of the differentially expressed lncRNAs and mRNAs was constructed, which identified two main lncRNAs (NR_045623 and NR_028291) and two key lncRNA subnets involved in immune responses, hematopoiesis, the B cell receptor signaling pathway and chronic myeloid leukemia. These findings suggest that NR_045623 and NR_028291 might be key genes associated with benzene hematotoxicity.
Affiliation(s)
- Wenlin Bai
- Department of Occupational Health and Environmental Health, School of Public Health, Capital Medical University, Beijing 100069, China; Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing 100069, China
- Jing Yang
- Department of Occupational Health and Environmental Health, School of Public Health, Capital Medical University, Beijing 100069, China; Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing 100069, China
- Gengxia Yang
- Department of Occupational Health and Environmental Health, School of Public Health, Capital Medical University, Beijing 100069, China; Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing 100069, China
- Piye Niu
- Department of Occupational Health and Environmental Health, School of Public Health, Capital Medical University, Beijing 100069, China; Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing 100069, China
- Lin Tian
- Department of Occupational Health and Environmental Health, School of Public Health, Capital Medical University, Beijing 100069, China; Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing 100069, China
- Ai Gao
- Department of Occupational Health and Environmental Health, School of Public Health, Capital Medical University, Beijing 100069, China; Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing 100069, China.
|
3103
|
TF2LncRNA: identifying common transcription factors for a list of lncRNA genes from ChIP-Seq data. Biomed Res Int 2014; 2014:317642. [PMID: 24729968 PMCID: PMC3960524 DOI: 10.1155/2014/317642] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 10/23/2013] [Revised: 01/14/2014] [Accepted: 01/27/2014] [Indexed: 11/17/2022]
Abstract
High-throughput genomic technologies like lncRNA microarray and RNA-Seq often generate a set of lncRNAs of interest, yet little is known about the transcriptional regulation of the set of lncRNA genes. Here, based on ChIP-Seq peak lists of transcription factors (TFs) from ENCODE and annotated human lncRNAs from GENCODE, we developed a web-based interface titled “TF2LncRNA,” where TF peaks from each ChIP-Seq experiment are crossed with the genomic coordinates of a set of input lncRNAs, to identify which TFs present a statistically significant number of binding sites (peaks) within the regulatory region of the input lncRNA genes. The input can be a set of coexpressed lncRNA genes or any other cluster of lncRNA genes. Users can thus infer which TFs are likely to be common transcription regulators of the set of lncRNAs. In addition, users can retrieve all lncRNAs potentially regulated by a specific TF in a specific cell line of interest or retrieve all TFs that have one or more binding sites in the regulatory region of a given lncRNA in the specific cell line. TF2LncRNA is an efficient and easy-to-use web-based tool.
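The peak-crossing step described in this abstract can be sketched as an interval overlap followed by an enrichment test. The abstract does not say which statistic the tool uses, so the hypergeometric test below is an assumption, as are the function names and the tuple layout `(chromosome, start, end)` with half-open coordinates.

```python
from math import comb

Interval = tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(peak: Interval, region: Interval) -> bool:
    """True if a ChIP-Seq peak overlaps a regulatory region on the same chromosome."""
    p_chrom, p_start, p_end = peak
    r_chrom, r_start, r_end = region
    return p_chrom == r_chrom and p_start < r_end and r_start < p_end


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K are bound by the TF."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def tf_enrichment(
    peaks: list[Interval],
    regions: dict[str, Interval],
    input_ids: list[str],
) -> tuple[int, float]:
    """Count input lncRNAs with a peak in their regulatory region and test for enrichment.

    `regions` maps every annotated lncRNA to its regulatory region (the background);
    `input_ids` is the user's gene list, assumed to be a subset of `regions`.
    """
    bound = {gid for gid, reg in regions.items() if any(overlaps(p, reg) for p in peaks)}
    query = set(input_ids)
    k = len(bound & query)
    return k, hypergeom_sf(k, len(regions), len(bound), len(query))
```

Running `tf_enrichment` once per ChIP-Seq experiment and sorting by p-value would give a ranked list of candidate common regulators; a real analysis would also need multiple-testing correction across experiments.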
|
3104
|
Hackermüller J, Reiche K, Otto C, Hösler N, Blumert C, Brocke-Heidrich K, Böhlig L, Nitsche A, Kasack K, Ahnert P, Krupp W, Engeland K, Stadler PF, Horn F. Cell cycle, oncogenic and tumor suppressor pathways regulate numerous long and macro non-protein-coding RNAs. Genome Biol 2014; 15:R48. [PMID: 24594072 PMCID: PMC4054595 DOI: 10.1186/gb-2014-15-3-r48] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 08/16/2013] [Accepted: 03/04/2014] [Indexed: 12/16/2022]
Abstract
Background: The genome is pervasively transcribed but most transcripts do not code for proteins, constituting non-protein-coding RNAs. Despite increasing numbers of functional reports of individual long non-coding RNAs (lncRNAs), assessing the extent of functionality among the non-coding transcriptional output of mammalian cells remains intricate. In the protein-coding world, transcripts differentially expressed in the context of processes essential for the survival of multicellular organisms have been instrumental in the discovery of functionally relevant proteins and their deregulation is frequently associated with diseases. We therefore systematically identified lncRNAs expressed differentially in response to oncologically relevant processes and cell-cycle, p53 and STAT3 pathways, using tiling arrays. Results: We found that up to 80% of the pathway-triggered transcriptional responses are non-coding. Among these we identified very large macroRNAs with pathway-specific expression patterns and demonstrated that these are likely continuous transcripts. MacroRNAs contain elements conserved in mammals and sauropsids, which in part exhibit conserved RNA secondary structure. Comparing evolutionary rates of a macroRNA to adjacent protein-coding genes suggests a local action of the transcript. Finally, in different grades of astrocytoma, a tumor disease unrelated to the initially used cell lines, macroRNAs are differentially expressed. Conclusions: It has been shown previously that the majority of expressed non-ribosomal transcripts are non-coding. We now conclude that differential expression triggered by signaling pathways gives rise to a similar abundance of non-coding content. It is thus unlikely that the prevalence of non-coding transcripts in the cell is a trivial consequence of leaky or random transcription events.
|
3105
|
MARIS: method for analyzing RNA following intracellular sorting. PLoS One 2014; 9:e89459. [PMID: 24594682 PMCID: PMC3940959 DOI: 10.1371/journal.pone.0089459] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 11/18/2013] [Accepted: 01/22/2014] [Indexed: 11/21/2022]
Abstract
Transcriptional profiling is a key technique in the study of cell biology that is limited by the availability of reagents to uniquely identify specific cell types and isolate high quality RNA from them. We report a Method for Analyzing RNA following Intracellular Sorting (MARIS) that generates high quality RNA for transcriptome profiling following cellular fixation, intracellular immunofluorescent staining and FACS. MARIS can therefore be used to isolate high quality RNA from many otherwise inaccessible cell types simply based on immunofluorescent tagging of unique intracellular proteins. As proof of principle, we isolate RNA from sorted human embryonic stem cell-derived insulin-expressing cells as well as adult human β cells. MARIS is a basic molecular biology technique that could be used across several biological disciplines.
|
3106
|
Eskola PJ, Männikkö M, Samartzis D, Karppinen J. Genome-wide association studies of lumbar disc degeneration--are we there yet? Spine J 2014; 14:479-82. [PMID: 24210639 DOI: 10.1016/j.spinee.2013.07.437] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 04/25/2013] [Accepted: 07/14/2013] [Indexed: 02/03/2023]
Affiliation(s)
- Pasi J Eskola
- Department of Physical and Rehabilitation Medicine, Institute of Clinical Medicine, University of Oulu, and Medical Research Center Oulu, Box 5000, 90014 Oulu, Finland
- Minna Männikkö
- Institute of Health Sciences, Biocenter Oulu, University of Oulu, Box 5000, 90014 Oulu, Finland
- Dino Samartzis
- Department of Orthopaedics and Traumatology, University of Hong Kong, Professorial Block, 5th Floor, 102 Pokfulam Rd, Pokfulam, Hong Kong, SAR, China
- Jaro Karppinen
- Department of Physical and Rehabilitation Medicine, Institute of Clinical Medicine, University of Oulu, and Medical Research Center Oulu, Box 5000, 90014 Oulu, Finland.
|
3107
|
Ritchie GRS, Dunham I, Zeggini E, Flicek P. Functional annotation of noncoding sequence variants. Nat Methods 2014; 11:294-6. [PMID: 24487584 PMCID: PMC5015703 DOI: 10.1038/nmeth.2832] [Citation(s) in RCA: 388] [Impact Index Per Article: 38.8] [Received: 07/16/2013] [Accepted: 01/02/2014] [Indexed: 12/18/2022]
Abstract
Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants in protein-coding regions, our understanding of the genetic code and splicing allows us to identify likely candidates, but interpreting variants outside genic regions is more difficult. Here we present genome-wide annotation of variants (GWAVA), a tool that supports prioritization of noncoding variants by integrating various genomic and epigenomic annotations.
Affiliation(s)
- Graham R. S. Ritchie
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Ian Dunham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
|
3108
|
Kuleshov V, Xie D, Chen R, Pushkarev D, Ma Z, Blauwkamp T, Kertesz M, Snyder M. Whole-genome haplotyping using long reads and statistical methods. Nat Biotechnol 2014; 32:261-266. [PMID: 24561555 PMCID: PMC4073643 DOI: 10.1038/nbt.2833] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Received: 07/19/2013] [Accepted: 01/17/2014] [Indexed: 12/24/2022]
Abstract
The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2-1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.
Affiliation(s)
- Volodymyr Kuleshov
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Illumina, Inc., 5200 Illumina Way, San Diego, CA 92199, USA
- Dan Xie
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Rui Chen
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Zhihai Ma
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Tim Blauwkamp
- Illumina, Inc., 5200 Illumina Way, San Diego, CA 92199, USA
- Michael Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
|
3109
|
Comparative genomic analysis of eutherian Mas-related G protein-coupled receptor genes. Gene 2014; 540:16-9. [PMID: 24583173 DOI: 10.1016/j.gene.2014.02.049] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/28/2013] [Revised: 01/25/2014] [Accepted: 02/25/2014] [Indexed: 11/22/2022]
Abstract
The present study attempted to update comprehensive eutherian Mas-related G protein-coupled receptor gene data sets, using public eutherian genomic sequence data sets and new genomics and molecular evolution tests. Among 254 potential coding sequences, the most comprehensive gene data set of eutherian Mas-related G protein-coupled receptor genes included 119 complete coding sequences that described eight major gene clusters. The present analysis integrated gene annotations, phylogenetic analysis and protein molecular evolution analysis and, for the first time, explained differential gene expansion patterns of eutherian Mas-related G protein-coupled receptor genes. The updated classification and nomenclature of eutherian Mas-related G protein-coupled receptor genes are proposed as a new framework for future experiments.
|
3110
|
Scarpato M, Esposito R, Evangelista D, Aprile M, Ambrosio MR, Angelini C, Ciccodicola A, Costa V. AnaLysis of Expression on human chromosome 21, ALE-HSA21: a pilot integrated web resource. Database (Oxford) 2014; 2014:bau009. [PMID: 24573881 PMCID: PMC3935309 DOI: 10.1093/database/bau009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Abstract
Transcriptome studies have shown the pervasive nature of transcription, demonstrating that almost all genes undergo alternative splicing. Accurately annotating all transcripts of a gene is crucial. It is needed to understand the impact of mutations on phenotypes, to shed light on genetic and epigenetic regulation of mRNAs and more generally to widen our knowledge about cell functionality and tissue diversity. RNA-sequencing (RNA-Seq) and other applications of next-generation sequencing provide valuable data to improve annotation accuracy, while also creating issues related to the variety, complexity and size of the data produced. In this scenario, the lack of user-friendly resources that are easily accessible to researchers with limited bioinformatics skills makes it difficult to retrieve complete information about one or a few genes without browsing a jungle of databases. Likewise, the increasing amount of data from ‘omics’ technologies calls for integrated databases that merge different data formats from distinct but complementary sources. In light of these considerations, and given the wide interest in studying Down syndrome, a genetic condition due to trisomy of human chromosome 21 (HSA21), we developed an integrated relational database and a web interface, named ALE-HSA21 (AnaLysis of Expression on HSA21), accessible at http://bioinfo.na.iac.cnr.it/ALE-HSA21. This comprehensive and user-friendly web resource integrates, for all coding and noncoding transcripts of chromosome 21, existing gene annotations and transcripts identified de novo through RNA-Seq analysis with predictive computational analysis of regulatory sequences. Given the role of noncoding RNAs and untranslated regions of coding genes in key regulatory mechanisms, ALE-HSA21 is also an interesting web-based platform to investigate such processes.
The ‘transcript-centric’ and easily accessible nature of ALE-HSA21 makes this resource a valuable tool to rapidly retrieve data at the isoform level, rather than at the gene level, which is useful for investigating any disease, molecular pathway or cell process involving chromosome 21 genes. Database URL: http://bioinfo.na.iac.cnr.it/ALE-HSA21/
Affiliation(s)
- Margherita Scarpato
- Institute of Genetics and Biophysics 'Adriano Buzzati-Traverso', National Research Council, Naples, Italy, Department of Pharmaceutical Sciences, University of Salerno, National Research Council, Fisciano, Salerno, Italy, Istituto per le Applicazioni del Calcolo 'Mauro Picone', National Research Council, Naples, Italy and Department of Biochemistry and Biophysics, Second University of Naples (SUN), Naples, Italy
|
3111
|
Witasp A, Ekstrom TJ, Schalling M, Lindholm B, Stenvinkel P, Nordfors L. How can genetics and epigenetics help the nephrologist improve the diagnosis and treatment of chronic kidney disease patients? Nephrol Dial Transplant 2014; 29:972-80. [DOI: 10.1093/ndt/gfu021] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/22/2022]
|
3112
|
Bickhart DM, Liu GE. The challenges and importance of structural variation detection in livestock. Front Genet 2014; 5:37. [PMID: 24600474 PMCID: PMC3927395 DOI: 10.3389/fgene.2014.00037] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 11/30/2013] [Accepted: 01/31/2014] [Indexed: 01/25/2023]
Abstract
Recent studies in humans and other model organisms have demonstrated that structural variants (SVs) comprise a substantial proportion of variation among individuals of each species. Many of these variants have been linked to debilitating diseases in humans, thereby cementing the importance of refining methods for their detection. Despite progress in the field, reliable detection of SVs still remains a problem even for human subjects. Many of the underlying problems that make SVs difficult to detect in humans are amplified in livestock species, whose lower quality genome assemblies and incomplete gene annotation can often give rise to false positive SV discoveries. Regardless of the challenges, SV detection is just as important for livestock researchers as it is for human researchers, given that several productive traits and diseases have been linked to copy number variations (CNVs) in cattle, sheep, and pig. Already, there is evidence that many beneficial SVs have been artificially selected in livestock such as a duplication of the agouti signaling protein gene that causes white coat color in sheep. In this review, we will list current SV and CNV discoveries in livestock and discuss the problems that hinder routine discovery and tracking of these polymorphisms. We will also discuss the impacts of selective breeding on CNV and SV frequencies and mention how SV genotyping could be used in the future to improve genetic selection.
Affiliation(s)
- Derek M Bickhart
- Animal Improvement Programs Laboratory, United States Department of Agriculture-Agricultural Research Service Beltsville, MD, USA
- George E Liu
- Bovine Functional Genomics Laboratory, United States Department of Agriculture-Agricultural Research Service Beltsville, MD, USA
|
3113
|
Abstract
Human pluripotent stem cells (hPSCs) have the potential to generate any human cell type, and one widely recognized goal is to make pancreatic β cells. To this end, comparisons between differentiated cell types produced in vitro and their in vivo counterparts are essential to validate hPSC-derived cells. Genome-wide transcriptional analysis of sorted insulin-expressing (INS(+)) cells derived from three independent hPSC lines, human fetal pancreata, and adult human islets points to two major conclusions: (i) Different hPSC lines produce highly similar INS(+) cells and (ii) hPSC-derived INS(+) (hPSC-INS(+)) cells more closely resemble human fetal β cells than adult β cells. This study provides a direct comparison of transcriptional programs between pure hPSC-INS(+) cells and true β cells and provides a catalog of genes whose manipulation may convert hPSC-INS(+) cells into functional β cells.
|
3114
|
Ayub Q, Moutsianas L, Chen Y, Panoutsopoulou K, Colonna V, Pagani L, Prokopenko I, Ritchie GRS, Tyler-Smith C, McCarthy MI, Zeggini E, Xue Y. Revisiting the thrifty gene hypothesis via 65 loci associated with susceptibility to type 2 diabetes. Am J Hum Genet 2014; 94:176-85. [PMID: 24412096 DOI: 10.1016/j.ajhg.2013.12.010] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 06/19/2013] [Accepted: 12/10/2013] [Indexed: 12/27/2022]
Abstract
We have investigated the evidence for positive selection in samples of African, European, and East Asian ancestry at 65 loci associated with susceptibility to type 2 diabetes (T2D) previously identified through genome-wide association studies. Selection early in human evolutionary history is predicted to lead to ancestral risk alleles shared between populations, whereas late selection would result in population-specific signals at derived risk alleles. By using a wide variety of tests based on the site frequency spectrum, haplotype structure, and population differentiation, we found no global signal of enrichment for positive selection when we considered all T2D risk loci collectively. However, in a locus-by-locus analysis, we found nominal evidence for positive selection at 14 of the loci. Selection favored the protective and risk alleles in similar proportions, rather than the risk alleles specifically as predicted by the thrifty gene hypothesis, and may not be related to influence on diabetes. Overall, we conclude that past positive selection has not been a powerful influence driving the prevalence of T2D risk alleles.
Affiliation(s)
- Qasim Ayub
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK
- Loukas Moutsianas
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
- Yuan Chen
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK
- Vincenza Colonna
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK; Institute of Genetics and Biophysics, National Research Council (CNR), 80125 Naples, Italy
- Luca Pagani
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK
- Inga Prokopenko
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
- Graham R S Ritchie
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK; European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SH, UK
- Chris Tyler-Smith
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK
- Mark I McCarthy
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK; Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Old Road, Headington, Oxford OX3 7LJ, UK; Oxford NIHR Biomedical Research Centre, Churchill Hospital, Old Road, Headington, Oxford OX3 7LJ, UK
- Eleftheria Zeggini
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK
- Yali Xue
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1HH, UK.
|
3115
|
Filone CM, Caballero IS, Dower K, Mendillo ML, Cowley GS, Santagata S, Rozelle DK, Yen J, Rubins KH, Hacohen N, Root DE, Hensley LE, Connor J. The master regulator of the cellular stress response (HSF1) is critical for orthopoxvirus infection. PLoS Pathog 2014; 10:e1003904. [PMID: 24516381 PMCID: PMC3916389 DOI: 10.1371/journal.ppat.1003904] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 06/14/2013] [Accepted: 12/12/2013] [Indexed: 12/17/2022]
Abstract
The genus Orthopoxviridae contains a diverse group of human pathogens including monkeypox, smallpox and vaccinia. These viruses are presumed to be less dependent on host functions than other DNA viruses because they have large genomes and replicate in the cytoplasm, but a detailed understanding of the host factors required by orthopoxviruses is lacking. To address this topic, we performed an unbiased, genome-wide pooled RNAi screen targeting over 17,000 human genes to identify the host factors that support orthopoxvirus infection. We used secondary and tertiary assays to validate our screen results. One of the strongest hits was heat shock factor 1 (HSF1), the ancient master regulator of the cytoprotective heat-shock response. In investigating the behavior of HSF1 during vaccinia infection, we found that HSF1 was phosphorylated, translocated to the nucleus, and increased transcription of HSF1 target genes. Activation of HSF1 was supportive for virus replication, as RNAi knockdown and HSF1 small molecule inhibition prevented orthopoxvirus infection. Consistent with its role as a transcriptional activator, inhibition of several HSF1 targets also blocked vaccinia virus replication. These data show that orthopoxviruses co-opt host transcriptional responses for their own benefit, thereby effectively extending their functional genome to include genes residing within the host DNA. The dependence on HSF1 and its chaperone network offers multiple opportunities for antiviral drug development.
Affiliation(s)
- Claire Marie Filone
- Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- United States Army Medical Research Institute of Infectious Diseases, Virology Division, Fort Detrick, Maryland, United States of America
- Ignacio S. Caballero
- Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Ken Dower
- Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Marc L. Mendillo
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Glenn S. Cowley
- The Broad Institute, Cambridge Massachusetts, United States of America
- Sandro Santagata
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Daniel K. Rozelle
- Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Judy Yen
- Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Kathleen H. Rubins
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Nir Hacohen
- The Broad Institute, Cambridge Massachusetts, United States of America
- David E. Root
- The Broad Institute, Cambridge Massachusetts, United States of America
- Lisa E. Hensley
- United States Army Medical Research Institute of Infectious Diseases, Virology Division, Fort Detrick, Maryland, United States of America
- John Connor
- Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
|
3116
|
Klus P, Bolognesi B, Agostini F, Marchese D, Zanzoni A, Tartaglia GG. The cleverSuite approach for protein characterization: predictions of structural properties, solubility, chaperone requirements and RNA-binding abilities. Bioinformatics 2014; 30:1601-8. [PMID: 24493033 PMCID: PMC4029037 DOI: 10.1093/bioinformatics/btu074] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 12/11/2022]
Abstract
Motivation: The recent shift towards high-throughput screening is posing new challenges for the interpretation of experimental results. Here we propose the cleverSuite approach for large-scale characterization of protein groups. Description: The central part of the cleverSuite is the cleverMachine (CM), an algorithm that performs statistics on protein sequences by comparing their physico-chemical propensities. The second element is called cleverClassifier and builds on top of the models generated by the CM to allow classification of new datasets. Results: We applied the cleverSuite to predict secondary structure properties, solubility, chaperone requirements and RNA-binding abilities. Using cross-validation and independent datasets, the cleverSuite reproduces experimental findings with great accuracy and provides models that can be used for future investigations. Availability: The intuitive interface for dataset exploration, analysis and prediction is available at http://s.tartaglialab.com/clever_suite. Contact: gian.tartaglia@crg.es Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Petr Klus
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Benedetta Bolognesi
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Federico Agostini
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Domenica Marchese
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Andreas Zanzoni
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Gian Gaetano Tartaglia
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
|
3117
|
Brooks AN, Choi PS, de Waal L, Sharifnia T, Imielinski M, Saksena G, Pedamallu CS, Sivachenko A, Rosenberg M, Chmielecki J, Lawrence MS, DeLuca DS, Getz G, Meyerson M. A pan-cancer analysis of transcriptome changes associated with somatic mutations in U2AF1 reveals commonly altered splicing events. PLoS One 2014; 9:e87361. [PMID: 24498085 PMCID: PMC3909098 DOI: 10.1371/journal.pone.0087361] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Received: 06/28/2013] [Accepted: 12/20/2013] [Indexed: 01/23/2023] Open
Abstract
Although recurrent somatic mutations in the splicing factor U2AF1 (also known as U2AF35) have been identified in multiple cancer types, the effects of these mutations on the cancer transcriptome have yet to be fully elucidated. Here, we identified splicing alterations associated with U2AF1 mutations across distinct cancers using DNA and RNA sequencing data from The Cancer Genome Atlas (TCGA). Using RNA-Seq data from 182 lung adenocarcinomas and 167 acute myeloid leukemias (AML), in which U2AF1 is somatically mutated in 3-4% of cases, we identified 131 and 369 splicing alterations, respectively, that were significantly associated with U2AF1 mutation. Of these, 30 splicing alterations were statistically significant in both lung adenocarcinoma and AML, including three genes in the Cancer Gene Census, CTNNB1, CHCHD7, and PICALM. Cell line experiments expressing U2AF1 S34F in HeLa cells and in 293T cells provide further support that these altered splicing events are caused by U2AF1 mutation. Consistent with the function of U2AF1 in 3' splice site recognition, we found that S34F/Y mutations cause preferences for CAG over UAG 3' splice site sequences. This report demonstrates consistent effects of U2AF1 mutation on splicing in distinct cancer cell types.
Affiliation(s)
- Angela N. Brooks
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Peter S. Choi
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Luc de Waal
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Tanaz Sharifnia
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Marcin Imielinski
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Gordon Saksena
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Chandra Sekhar Pedamallu
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Andrey Sivachenko
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Mara Rosenberg
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Juliann Chmielecki
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Michael S. Lawrence
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - David S. DeLuca
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Gad Getz
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Matthew Meyerson
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Pathology, Harvard Medical School, Boston, Massachusetts, United States of America
|
3118
|
Abstract
BigWig files are a compressed, indexed, binary format for genome-wide signal data for calculations (e.g. GC percent) or experiments (e.g. ChIP-seq/RNA-seq read depth). bwtool is a tool designed to read bigWig files rapidly and efficiently, providing functionality for extracting data and summarizing it in several ways, globally or at specific regions. Additionally, the tool enables the conversion of the positions of signal data from one genome assembly to another, also known as ‘lifting’. We believe bwtool can be useful for the analyst frequently working with bigWig data, which is becoming a standard format to represent functional signals along genomes. The article includes supplementary examples of running the software. Availability and implementation: The C source code is freely available under the GNU public license v3 at http://cromatina.crg.eu/bwtool. Contact: andrew.pohl@crg.eu, andypohl@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Andy Pohl
- Department of Gene Regulation, Stem Cells, and Cancer, Centre for Genomic Regulation (CRG) and Department of Experimental and Health Sciences (CEXS), Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Miguel Beato
- Department of Gene Regulation, Stem Cells, and Cancer, Centre for Genomic Regulation (CRG) and Department of Experimental and Health Sciences (CEXS), Universitat Pompeu Fabra, 08003 Barcelona, Spain
|
3119
|
Cieślik M, Bekiranov S. Combinatorial epigenetic patterns as quantitative predictors of chromatin biology. BMC Genomics 2014; 15:76. [PMID: 24472558 PMCID: PMC3922690 DOI: 10.1186/1471-2164-15-76] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/02/2013] [Accepted: 01/15/2014] [Indexed: 01/01/2023] Open
Abstract
Background: Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) is the most widely used method for characterizing the epigenetic states of chromatin on a genomic scale. With the recent availability of large genome-wide data sets, often comprising several epigenetic marks, novel approaches are required to explore functionally relevant interactions between histone modifications. Computational discovery of "chromatin states" defined by such combinatorial interactions enabled descriptive annotations of genomes, but more quantitative approaches are needed to progress towards predictive models. Results: We propose non-negative matrix factorization (NMF) as a new unsupervised method to discover combinatorial patterns of epigenetic marks that frequently co-occur in subsets of genomic regions. We show that this small set of combinatorial "codes" can be effectively displayed and interpreted. NMF codes enable dimensionality reduction and have desirable statistical properties for regression and classification tasks. We demonstrate the utility of codes in the quantitative prediction of Pol2-binding and the discrimination between Pol2-bound promoters and enhancers. Finally, we show that specific codes can be linked to molecular pathways and targets of pluripotency genes during differentiation. Conclusions: We have introduced and evaluated a new computational approach to represent combinatorial patterns of epigenetic marks as quantitative variables suitable for predictive modeling and supervised machine learning. To foster widespread adoption of this method we make it available as an open-source software-package – epicode at https://github.com/mcieslik-mctp/epicode.
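To make the factorization idea in this abstract concrete, the following is a minimal sketch of NMF using the classic multiplicative update rules for the squared (Frobenius) error. It is not the epicode implementation: the function names, the toy matrix of regions by marks, and the choice of update rule are all assumptions made for illustration. Rows of `h` play the role of the "codes" (combinations of marks), and rows of `w` give how strongly each region uses each code.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (regions x marks) into W (regions x rank)
    and H (rank x marks) with multiplicative updates (illustrative only)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Hypothetical signal matrix: 5 genomic regions x 4 epigenetic marks.
    signal = [[5, 4, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 3, 4],
              [0, 0, 4, 3],
              [5, 5, 0, 0]]
    w, h = nmf(signal, rank=2)
    print("codes (rows of H):", h)
    print("reconstruction error:", frobenius_error(signal, w, h))
```

Because the updates only multiply by non-negative ratios, both factors stay non-negative, which is what makes the resulting codes readable as additive combinations of marks.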
Affiliation(s)
- Marcin Cieślik
- Department of Biochemistry and Molecular Genetics, University of Virginia Health System, Charlottesville, Virginia, USA.
|
3120
|
Sterne-Weiler T, Sanford JR. Exon identity crisis: disease-causing mutations that disrupt the splicing code. Genome Biol 2014; 15:201. [PMID: 24456648 PMCID: PMC4053859 DOI: 10.1186/gb4150] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.8] [Indexed: 12/14/2022] Open
Abstract
Cis-acting RNA elements control the accurate expression of human multi-exon protein coding genes. Single nucleotide variants altering the fidelity of this regulatory code and, consequently, pre-mRNA splicing are expected to contribute to the etiology of numerous human diseases.
|
3121
|
Hooper JE. A survey of software for genome-wide discovery of differential splicing in RNA-Seq data. Hum Genomics 2014; 8:3. [PMID: 24447644 PMCID: PMC3903050 DOI: 10.1186/1479-7364-8-3] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 09/02/2013] [Accepted: 12/26/2013] [Indexed: 01/10/2023] Open
Abstract
Alternative splicing is a major contributor to cellular diversity. Therefore the identification and quantification of differentially spliced transcripts in genome-wide transcript analysis is an important consideration. Here, I review the software available for analysis of RNA-Seq data for differential splicing and discuss intrinsic challenges for differential splicing analyses. Three approaches to differential splicing analysis are described, along with their associated software implementations, their strengths, limitations, and caveats. Suggestions for future work include more extensive experimental validation to assess accuracy of the software predictions and consensus formats for outputs that would facilitate visualizations, data exchange, and downstream analyses.
Affiliation(s)
- Joan E Hooper
- Department of Cell and Developmental Biology, University of Colorado Anschutz Medical Campus, 12801 17th Ave, rm 12103, MS 8108, PO Box 6511, Aurora, CO 80045, USA.
|
3122
|
Romero R, Tarca AL, Chaemsaithong P, Miranda J, Chaiworapongsa T, Jia H, Hassan SS, Kalita CA, Cai J, Yeo L, Lipovich L. Transcriptome interrogation of human myometrium identifies differentially expressed sense-antisense pairs of protein-coding and long non-coding RNA genes in spontaneous labor at term. J Matern Fetal Neonatal Med 2014; 27:1397-408. [PMID: 24168098 DOI: 10.3109/14767058.2013.860963] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 01/01/2023]
Abstract
OBJECTIVE To identify differentially expressed long non-coding RNA (lncRNA) genes in human myometrium in women with spontaneous labor at term. MATERIALS AND METHODS Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n = 19) and women in spontaneous labor at term (n = 20). RNA was extracted and profiled using an Illumina® microarray platform. We have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. RESULTS We identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an experimental method completely independent of the microarray analysis. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. CONCLUSIONS We provide, for the first time, evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term.
Affiliation(s)
- Roberto Romero
- Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD and Detroit, MI, USA
|
3123
|
Kechavarzi B, Janga SC. Dissecting the expression landscape of RNA-binding proteins in human cancers. Genome Biol 2014; 15:R14. [PMID: 24410894 PMCID: PMC4053825 DOI: 10.1186/gb-2014-15-1-r14] [Citation(s) in RCA: 174] [Impact Index Per Article: 17.4] [Received: 08/01/2013] [Accepted: 01/10/2014] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND RNA-binding proteins (RBPs) play important roles in cellular homeostasis by controlling gene expression at the post-transcriptional level. RESULTS We explore the expression of more than 800 RBPs in sixteen healthy human tissues and their patterns of dysregulation in cancer genomes from The Cancer Genome Atlas project. We show that genes encoding RBPs are consistently and significantly highly expressed compared with other classes of genes, including those encoding regulatory components such as transcription factors, miRNAs and long non-coding RNAs. We also demonstrate that a set of RBPs, numbering approximately 30, are strongly upregulated (SUR) across at least two-thirds of the nine cancers profiled in this study. Analysis of the protein-protein interaction network properties for the SUR and non-SUR groups of RBPs suggests that path length distributions between SUR RBPs are significantly lower than those observed for non-SUR RBPs. We further find that the mean path lengths between SUR RBPs increase in proportion to their contribution to prognostic impact. We also note that RBPs exhibiting higher variability in the extent of dysregulation across breast cancer patients have a higher number of protein-protein interactions. We propose that fluctuating RBP levels might result in an increase in non-specific protein interactions, potentially leading to changes in the functional consequences of RBP binding. Finally, we show that the expression variation of a gene within a patient group is inversely correlated with prognostic impact. CONCLUSIONS Overall, our results provide a roadmap for understanding the impact of RBPs on cancer pathogenesis.
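The network comparison described in this abstract rests on shortest path lengths between members of a protein group. As a rough illustration only, the sketch below computes pairwise shortest path lengths within a group using breadth-first search on an undirected interaction graph. The graph, the protein labels, and the function names are hypothetical and are not taken from the study, which may have used a different procedure.

```python
from collections import deque
from itertools import combinations


def shortest_path_length(graph, source, target):
    """Breadth-first search distance between two nodes; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def pairwise_path_lengths(graph, group):
    """Shortest path lengths for every connected pair of proteins in a group."""
    lengths = []
    for a, b in combinations(sorted(group), 2):
        dist = shortest_path_length(graph, a, b)
        if dist is not None:
            lengths.append(dist)
    return lengths


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical undirected interaction network (a simple chain A-B-C-D-E).
    ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
           "D": ["C", "E"], "E": ["D"]}
    tight_group = {"A", "B", "C"}   # stand-in for a closely connected group
    spread_group = {"A", "E"}       # stand-in for a dispersed group
    print(mean(pairwise_path_lengths(ppi, tight_group)))
    print(mean(pairwise_path_lengths(ppi, spread_group)))
```

Comparing the two resulting distributions (for example with a rank-based test) would be the natural next step; that choice is left open here because the abstract does not specify it.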
Affiliation(s)
- Bobak Kechavarzi
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University – Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN 46202, USA
| | - Sarath Chandra Janga
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University – Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), 410 West 10th Street, Indianapolis, IN 46202, USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Medical Research and Library Building, 975 West Walnut Street, Indianapolis, IN 46202, USA
|
3124
|
van Heesch S, van Iterson M, Jacobi J, Boymans S, Essers PB, de Bruijn E, Hao W, MacInnes AW, Cuppen E, Simonis M. Extensive localization of long noncoding RNAs to the cytosol and mono- and polyribosomal complexes. Genome Biol 2014; 15:R6. [PMID: 24393600 PMCID: PMC4053777 DOI: 10.1186/gb-2014-15-1-r6] [Citation(s) in RCA: 276] [Impact Index Per Article: 27.6] [Received: 10/28/2013] [Accepted: 01/07/2014] [Indexed: 02/04/2023] Open
Abstract
Background: Long noncoding RNAs (lncRNAs) form an abundant class of transcripts, but the function of the majority of them remains elusive. While it has been shown that some lncRNAs are bound by ribosomes, it has also been convincingly demonstrated that these transcripts do not code for proteins. To obtain a comprehensive understanding of the extent to which lncRNAs bind ribosomes, we performed systematic RNA sequencing on ribosome-associated RNA pools obtained through ribosomal fractionation and compared the RNA content with nuclear and (non-ribosome bound) cytosolic RNA pools. Results: The RNA composition of the subcellular fractions differs significantly from each other, but lncRNAs are found in all locations. A subset of specific lncRNAs is enriched in the nucleus but surprisingly the majority is enriched in the cytosol and in ribosomal fractions. The ribosomal enriched lncRNAs include H19 and TUG1. Conclusions: Most studies on lncRNAs have focused on the regulatory function of these transcripts in the nucleus. We demonstrate that only a minority of all lncRNAs are nuclear enriched. Our findings suggest that many lncRNAs may have a function in cytoplasmic processes, and in particular in ribosome complexes.
|
3125
|
Klein HU, Schäfer M, Porse BT, Hasemann MS, Ickstadt K, Dugas M. Integrative analysis of histone ChIP-seq and transcription data using Bayesian mixture models. Bioinformatics 2014; 30:1154-1162. [PMID: 24403540 DOI: 10.1093/bioinformatics/btu003] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 08/20/2013] [Accepted: 12/30/2013] [Indexed: 01/08/2023]
Abstract
MOTIVATION Histone modifications are a key epigenetic mechanism to activate or repress the transcription of genes. Datasets of matched transcription data and histone modification data obtained by ChIP-seq exist, but methods for integrative analysis of both data types are still rare. Here, we present a novel bioinformatics approach to detect genes that show different transcript abundances between two conditions putatively caused by alterations in histone modification. RESULTS We introduce a correlation measure for integrative analysis of ChIP-seq and gene transcription data measured by RNA sequencing or microarrays and demonstrate that a proper normalization of ChIP-seq data is crucial. We suggest applying Bayesian mixture models of different types of distributions to further study the distribution of the correlation measure. The implicit classification of the mixture models is used to detect genes with differences between two conditions in both gene transcription and histone modification. The method is applied to different datasets, and its superiority to a naive separate analysis of both data types is demonstrated. AVAILABILITY AND IMPLEMENTATION R/Bioconductor package epigenomix. CONTACT h.klein@uni-muenster.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hans-Ulrich Klein
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Martin Schäfer
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Bo T Porse
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Marie S Hasemann
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Katja Ickstadt
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Martin Dugas
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
|
3126
|
Sun K, Zhao Y, Wang H, Sun H. Sebnif: an integrated bioinformatics pipeline for the identification of novel large intergenic noncoding RNAs (lincRNAs)--application in human skeletal muscle cells. PLoS One 2014; 9:e84500. [PMID: 24400097 PMCID: PMC3882232 DOI: 10.1371/journal.pone.0084500] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/05/2013] [Accepted: 11/15/2013] [Indexed: 11/19/2022] Open
Abstract
Ab initio assembly of transcriptome sequencing data has been widely used to identify large intergenic non-coding RNAs (lincRNAs), a novel class of gene regulators involved in many biological processes. To differentiate real lincRNA transcripts from thousands of assembly artifacts, a series of filtering steps such as filters of transcript length, expression level and coding potential, need to be applied. However, an easy-to-use and publicly available bioinformatics pipeline that integrates these filters is not yet available. Hence, we implemented sebnif, an integrative bioinformatics pipeline to facilitate the discovery of bona fide novel lincRNAs that are suitable for further functional characterization. Specifically, sebnif is the only pipeline that implements an algorithm for identifying high-quality single-exonic lincRNAs that were often omitted in many studies. To demonstrate the usage of sebnif, we applied it on a real biological RNA-seq dataset from Human Skeletal Muscle Cells (HSkMC) and built a novel lincRNA catalog containing 917 highly reliable lincRNAs. Sebnif is available at http://sunlab.lihs.cuhk.edu.hk/sebnif/.
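The filtering steps named in this abstract (transcript length, expression level, coding potential) can be pictured as a chain of simple predicates. The sketch below is only an illustration of that idea and is not sebnif's code: the `Transcript` fields, the threshold values, and the convention that a higher `coding_score` means more protein-coding potential are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    length: int          # transcript length in nucleotides
    expression: float    # abundance estimate, e.g. FPKM (assumed unit)
    coding_score: float  # hypothetical score; higher means more likely coding


def filter_lincrna_candidates(transcripts, min_length=200,
                              min_expression=1.0, max_coding_score=0.0):
    """Keep transcripts that pass all three illustrative filters."""
    kept = []
    for t in transcripts:
        if t.length < min_length:
            continue  # too short to be treated as a long noncoding RNA
        if t.expression < min_expression:
            continue  # too weakly expressed, likely an assembly artifact
        if t.coding_score > max_coding_score:
            continue  # looks protein-coding under the assumed score
        kept.append(t)
    return kept


if __name__ == "__main__":
    assembled = [
        Transcript("t1", 1500, 4.2, -1.1),
        Transcript("t2", 150, 9.0, -0.8),   # fails the length filter
        Transcript("t3", 900, 0.1, -0.5),   # fails the expression filter
        Transcript("t4", 2200, 6.0, 2.3),   # fails the coding-potential filter
    ]
    for t in filter_lincrna_candidates(assembled):
        print(t.name)
```

Keeping each filter as a separate check makes it easy to report how many transcripts each step removes, which is often useful when tuning cutoffs for a new dataset.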
Affiliation(s)
- Kun Sun
- Department of Chemical Pathology, Li Ka Shing Institute of Health Sciences, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Yu Zhao
- Department of Obstetrics and Gynaecology, Li Ka Shing Institute of Health Sciences, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Huating Wang
- Department of Obstetrics and Gynaecology, Li Ka Shing Institute of Health Sciences, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Hao Sun
- Department of Chemical Pathology, Li Ka Shing Institute of Health Sciences, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
|
3127
|
Sims D, Ilott NE, Sansom SN, Sudbery IM, Johnson JS, Fawcett KA, Berlanga-Taylor AJ, Luna-Valero S, Ponting CP, Heger A. CGAT: computational genomics analysis toolkit. Bioinformatics 2014; 30:1290-1. [PMID: 24395753 PMCID: PMC3998125 DOI: 10.1093/bioinformatics/btt756] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 11/14/2022] Open
Abstract
Computational genomics seeks to draw biological inferences from genomic datasets, often by integrating and contextualizing next-generation sequencing data. CGAT provides an extensive suite of tools designed to assist in the analysis of genome scale data from a range of standard file formats. The toolkit enables filtering, comparison, conversion, summarization and annotation of genomic intervals, gene sets and sequences. The tools can both be run from the Unix command line and installed into visual workflow builders, such as Galaxy.
Affiliation(s)
- David Sims
- CGAT, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, Parks Road, Oxford OX1 3PT, UK
|
3128
|
Cirillo D, Marchese D, Agostini F, Livi CM, Botta-Orfila T, Tartaglia GG. Constitutive patterns of gene expression regulated by RNA-binding proteins. Genome Biol 2014; 15:R13. [PMID: 24401680 PMCID: PMC4054784 DOI: 10.1186/gb-2014-15-1-r13] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 08/06/2013] [Accepted: 01/02/2014] [Indexed: 02/04/2023] Open
Abstract
Background: RNA-binding proteins regulate a number of cellular processes, including synthesis, folding, translocation, assembly and clearance of RNAs. Recent studies have reported that an unexpectedly large number of proteins are able to interact with RNA, but the partners of many RNA-binding proteins are still uncharacterized. Results: We combined prediction of ribonucleoprotein interactions, based on catRAPID calculations, with analysis of protein and RNA expression profiles from human tissues. We found strong interaction propensities for both positively and negatively correlated expression patterns. Our integration of in silico and ex vivo data unraveled two major types of protein–RNA interactions, with positively correlated patterns related to cell cycle control and negatively correlated patterns related to survival, growth and differentiation. To facilitate the investigation of protein–RNA interactions and expression networks, we developed the catRAPID express web server. Conclusions: Our analysis sheds light on the role of RNA-binding proteins in regulating proliferation and differentiation processes, and we provide a data exploration tool to aid future experimental studies.
|
3129
|
Zheng CL, Kawane S, Bottomly D, Wilmot B. Analysis considerations for utilizing RNA-Seq to characterize the brain transcriptome. International Review of Neurobiology 2014; 116:21-54. [PMID: 25172470 DOI: 10.1016/b978-0-12-801105-8.00002-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
RNA-Seq allows one to examine not only gene expression but also expression of noncoding RNAs, alternative splicing, and allele-specific expression. With this increased sensitivity and dynamic range come computational and statistical considerations, which depend heavily on the biological question being asked. We highlight these considerations to give an overview of their importance and of the impact they can have on downstream interpretation of the brain transcriptome.
Affiliation(s)
- Christina L Zheng
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA; Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, USA
| | - Sunita Kawane
- Clinical & Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
| | - Daniel Bottomly
- Clinical & Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
| | - Beth Wilmot
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA; Clinical & Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
|
3130
|
Zhang J, Zhang P, Wang L, Piao HL, Ma L. Long non-coding RNA HOTAIR in carcinogenesis and metastasis. Acta Biochim Biophys Sin (Shanghai) 2014; 46:1-5. [PMID: 24165275 DOI: 10.1093/abbs/gmt117] [Citation(s) in RCA: 139] [Impact Index Per Article: 13.9] [Indexed: 12/21/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) have gained massive attention in recent years as a potentially new and crucial layer of gene regulation. LncRNAs are prevalently transcribed in the genome, but their roles in gene regulation and disease development are largely unknown. HOX antisense intergenic RNA (HOTAIR), a lncRNA located in the HOXC locus, has been shown to repress HOXD gene expression and promote breast cancer metastasis. Mechanistically, HOTAIR interacts with and recruits polycomb repressive complex 2 (PRC2) and regulates chromosome occupancy by EZH2 (a subunit of PRC2), which leads to histone H3 lysine 27 trimethylation of the HOXD locus. Moreover, HOTAIR is pervasively overexpressed in most human cancers compared with noncancerous adjacent tissues. This review summarizes the studies on the HOTAIR lncRNA over the past 6 years.
Affiliation(s)
- Jinsong Zhang
- Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
|
3131
|
Genomics of alternative splicing: evolution, development and pathophysiology. Hum Genet 2014; 133:679-87. [PMID: 24378600 DOI: 10.1007/s00439-013-1411-3] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 09/10/2013] [Accepted: 12/15/2013] [Indexed: 12/11/2022]
Abstract
Alternative splicing is a major cellular mechanism in metazoans for generating proteomic diversity. A large proportion of protein-coding genes in multicellular organisms undergo alternative splicing, and in humans it has been estimated that nearly 90% of protein-coding genes, a much larger share than expected, are subject to alternative splicing. Genomic analyses of alternative splicing have illuminated its universal role in shaping the evolution of genomes, in the control of developmental processes, and in the dynamic regulation of the transcriptome to influence phenotype. Disruption of the splicing machinery has been found to drive pathophysiology, and reprogramming of aberrant splicing can provide novel approaches to the development of molecular therapy. This review focuses on recent progress in our understanding of alternative splicing brought about by the unprecedented growth of genomic data and highlights the relevance of human splicing variation to disease and therapy.
|
3132
|
Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hinrichs AS, Learned K, Lee BT, Li CH, Raney BJ, Rhead B, Rosenbloom KR, Sloan CA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2014 update. Nucleic Acids Res 2014; 42:D764-70. [PMID: 24270787 PMCID: PMC3964947 DOI: 10.1093/nar/gkt1168] [Citation(s) in RCA: 550] [Impact Index Per Article: 55.0] [Received: 09/18/2013] [Revised: 10/30/2013] [Accepted: 10/30/2013] [Indexed: 12/17/2022] Open
Abstract
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.
Affiliation(s)
- Donna Karolchik, Galt P. Barber, Jonathan Casper, Hiram Clawson, Melissa S. Cline, Mark Diekhans, Timothy R. Dreszer, Pauline A. Fujita, Luvina Guruvadoo, Maximilian Haeussler, Rachel A. Harte, Steve Heitner, Angie S. Hinrichs, Katrina Learned, Brian T. Lee, Chin H. Li, Brian J. Raney, Brooke Rhead, Kate R. Rosenbloom, Cricket A. Sloan, Matthew L. Speir, Ann S. Zweig, David Haussler, Robert M. Kuhn, W. James Kent
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
|
3133
|
Ucciferri N, Rocchiccioli S. Proteomics techniques for the detection of translated pseudogenes. Methods Mol Biol 2014; 1167:187-95. [PMID: 24823778 DOI: 10.1007/978-1-4939-0835-6_12] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/24/2022]
Abstract
Increasing evidence indicates that pseudogenes can reach the translational process. Translated pseudogene products have in fact been found in various organisms, confuting the original definition of pseudogenes as genes without any coding potential. Proteomics is the main technology allowing the study of proteins and, when integrated with genomics, is defined as proteogenomics. In proteogenomics, the peptide-genome alignment drives the identification and annotation of gene products and allows for a better understanding of their function. In this chapter, we give a brief overview of the proteomic techniques applied to pseudogenes. In particular, we discuss peptide spectrum acquisition, mass data analysis, and genome database matching.
Affiliation(s)
- Nadia Ucciferri
- CNR, Institute of Clinical Physiology, Via Moruzzi 1, 56124, Pisa, Italy
|
3134
|
Cirillo D, Livi CM, Agostini F, Tartaglia GG. Discovery of protein–RNA networks. Mol Biosyst 2014; 10:1632-42. [DOI: 10.1039/c4mb00099d] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/28/2023]
Abstract
We review the latest advances and future challenges in experimental and computational investigation of protein–RNA networks.
Affiliation(s)
- Davide Cirillo
- Gene Function and Evolution
- Centre for Genomic Regulation (CRG)
- 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF)
- 08003 Barcelona, Spain
| | - Carmen Maria Livi
- Gene Function and Evolution
- Centre for Genomic Regulation (CRG)
- 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF)
- 08003 Barcelona, Spain
| | - Federico Agostini
- Gene Function and Evolution
- Centre for Genomic Regulation (CRG)
- 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF)
- 08003 Barcelona, Spain
| | - Gian Gaetano Tartaglia
- Gene Function and Evolution
- Centre for Genomic Regulation (CRG)
- 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF)
- 08003 Barcelona, Spain
|
3135
|
Zaghlool A, Ameur A, Cavelier L, Feuk L. Splicing in the human brain. International Review of Neurobiology 2014; 116:95-125. [PMID: 25172473 DOI: 10.1016/b978-0-12-801105-8.00005-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
It has become increasingly clear over the past decade that RNA has important functions in human cells beyond its role as an intermediate translator of DNA to protein. It is now known that RNA plays highly specific roles in pathways involved in regulatory, structural, and catalytic functions. The complexity of RNA production and regulation has become evident with the advent of high-throughput methods to study the transcriptome. Deep sequencing has revealed an enormous diversity of RNA types and transcript isoforms in human cells. The transcriptome of the human brain is particularly interesting because it contains more expressed genes than other tissues and also displays an extreme diversity of transcript isoforms, indicating that highly complex regulatory pathways are present in the brain. Several of the regulatory proteins involved have now been identified, including RNA-binding proteins that are neuron specific. RNA-binding proteins also play important roles in regulating the splicing process and the temporal and spatial production of isoforms. While significant progress has been made in understanding the human transcriptome, many questions remain regarding the basic mechanisms of splicing and subcellular localization of RNA. A long-standing question is to what extent the splicing of pre-mRNA is cotranscriptional rather than posttranscriptional. Recent data, including studies of the human brain, indicate that splicing is primarily cotranscriptional in human cells. This chapter describes the current understanding of splicing and splicing regulation in the human brain and discusses the recent global sequence-based analyses of transcription and splicing.
Affiliation(s)
- Ammar Zaghlool
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Adam Ameur
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Lucia Cavelier
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; Uppsala University Hospital, Uppsala, Sweden
| | - Lars Feuk
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden; Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
| |
Collapse
|
3136
|
Eteleeb AM, Flight RM, Harrison BJ, Petruska JC, Rouchka EC. An Island-Based Approach for Differential Expression Analysis. 2013 ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics (ACM-BCB 2013), Washington, D.C., U.S.A., September 22-25, 2013. 2013; 2013:419-429. [PMID: 25632406 DOI: 10.1145/2506583.2506589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
High-throughput mRNA sequencing (also known as RNA-Seq) promises to be the technique of choice for studying transcriptome profiles. This technique provides the ability to develop precise methodologies for transcript and gene expression quantification, novel transcript and exon discovery, and splice variant detection. One of the limitations of current RNA-Seq methods is the dependency on annotated biological features (e.g. exons, transcripts, genes) to detect expression differences across samples. This restricts the identification of expression levels and the detection of significant changes to known genomic regions, so any significant changes that occur in unannotated regions will not be captured. To overcome this limitation, we developed a novel segmentation approach, Island-Based (IB), for analyzing differential expression in RNA-Seq and targeted sequencing (exome capture) data without specific knowledge of an isoform. The IB segmentation determines individual islands of expression based on windowed read counts that can be compared across experimental conditions to determine differential island expression. To detect differentially expressed genes, the p-values of a gene's islands are combined using Fisher's method. We tested and evaluated the performance of our approach by comparing it to the existing differentially expressed gene (DEG) methods CuffDiff, DESeq, and edgeR using two benchmark MAQC RNA-Seq datasets. The IB algorithm outperforms all three methods in both datasets, as illustrated by an increased auROC.
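The combination step mentioned in this abstract can be illustrated with a minimal sketch of Fisher's method. This is not the authors' IB implementation: the function name `fisher_combined_pvalue` and the example island p-values are hypothetical, and the sketch assumes the island p-values for a gene are independent. Under that assumption, the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine k independent p-values with Fisher's method.

    Hypothetical helper for illustration; not taken from the IB software.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # half_stat is X / 2, where X = -2 * sum(ln p_i) is Fisher's statistic.
    half_stat = -sum(math.log(p) for p in pvalues)

    # Chi-square upper tail with 2k degrees of freedom:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    # Hypothetical p-values for three islands assigned to one gene.
    island_pvalues = [0.04, 0.20, 0.01]
    print(fisher_combined_pvalue(island_pvalues))
```

With a single island the function returns that island's p-value unchanged, which is a quick sanity check on the tail formula.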
Affiliation(s)
- Abdallah M Eteleeb
- Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY, USA
| | - Robert M Flight
- Department of Chemistry, University of Louisville, Louisville, KY, USA
| | - Benjamin J Harrison
- Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, USA
| | - Jeffrey C Petruska
- Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, USA
| | - Eric C Rouchka
- Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY, USA
|
3137
|
Small silencing non-coding RNAs: cancer connections and significance. Mol Oncol 2013. [DOI: 10.1017/cbo9781139046947.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022] Open
|
3138
|
Comparative genomic analysis of eutherian ribonuclease A genes. Mol Genet Genomics 2013; 289:161-7. [DOI: 10.1007/s00438-013-0801-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 08/30/2013] [Accepted: 12/04/2013] [Indexed: 01/25/2023]
|
3139
|
Kheradpour P, Kellis M. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res 2013; 42:2976-87. [PMID: 24335146 PMCID: PMC3950668 DOI: 10.1093/nar/gkt1249] [Citation(s) in RCA: 313] [Impact Index Per Article: 28.5] [Indexed: 01/17/2023] Open
Abstract
Recent advances in technology have led to a dramatic increase in the number of available transcription factor ChIP-seq and ChIP-chip data sets. Understanding the motif content of these data sets is an important step in understanding the underlying mechanisms of regulation. Here we provide a systematic motif analysis for 427 human ChIP-seq data sets using motifs curated from the literature and also discovered de novo using five established motif discovery tools. We use a systematic pipeline for calculating motif enrichment in each data set, providing a principled way for choosing between motif variants found in the literature and for flagging potentially problematic data sets. Our analysis confirms the known specificity of 41 of the 56 analyzed factor groups and reveals motifs of potential cofactors. We also use cell type-specific binding to find factors active in specific conditions. The resource we provide is accessible both for browsing a small number of factors and for performing large-scale systematic analyses. We provide motif matrices, instances and enrichments in each of the ENCODE data sets. The motifs discovered here have been used in parallel studies to validate the specificity of antibodies, understand cooperativity between data sets and measure the variation of motif binding across individuals and species.
Affiliation(s)
- Pouya Kheradpour
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA and Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02139, USA
|
3140
|
Marchese FP, Huarte M. Long non-coding RNAs and chromatin modifiers: their place in the epigenetic code. Epigenetics 2013; 9:21-6. [PMID: 24335342 DOI: 10.4161/epi.27472] [Citation(s) in RCA: 142] [Impact Index Per Article: 12.9] [Indexed: 12/21/2022] Open
Abstract
The emergence of long non-coding RNAs (lncRNAs) has shaken up our conception of gene expression regulation, as lncRNAs take prominent positions as components of cellular networks. Several cellular processes involve lncRNAs, and a significant number of them have been shown to function in cooperation with chromatin modifying enzymes to promote epigenetic activation or silencing of gene expression. Different model mechanisms have been proposed to explain how lncRNAs achieve regulation of gene expression by interacting with the epigenetic machinery. Here we describe these models in light of the current knowledge of lncRNAs, such as Xist and HOTAIR, and discuss recent literature on the role of the three-dimensional structure of the genome in the mechanism of action of lncRNAs and chromatin modifiers.
Affiliation(s)
- Maite Huarte
- Center for Applied Medical Research; University of Navarra; Pamplona, Spain
|
3141
|
Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods 2013; 10:723-9. [PMID: 23900255 DOI: 10.1038/nmeth.2562] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Received: 01/25/2013] [Accepted: 06/07/2013] [Indexed: 12/13/2022]
Abstract
The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor but only a minority of these drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
|
3142
|
Harrow JL, Steward CA, Frankish A, Gilbert JG, Gonzalez JM, Loveland JE, Mudge J, Sheppard D, Thomas M, Trevanion S, Wilming LG. The Vertebrate Genome Annotation browser 10 years on. Nucleic Acids Res 2013; 42:D771-9. [PMID: 24316575 PMCID: PMC3964964 DOI: 10.1093/nar/gkt1241] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 01/07/2023] Open
Abstract
The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).
Affiliation(s)
- Jennifer L Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, UK
|
3143
|
Popadin K, Gutierrez-Arcelus M, Dermitzakis ET, Antonarakis SE. Genetic and epigenetic regulation of human lincRNA gene expression. Am J Hum Genet 2013; 93:1015-26. [PMID: 24268656 DOI: 10.1016/j.ajhg.2013.10.022] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Received: 08/23/2013] [Revised: 10/11/2013] [Accepted: 10/21/2013] [Indexed: 02/06/2023] Open
Abstract
Large intergenic noncoding RNAs (lincRNAs) are still poorly functionally characterized. We analyzed the genetic and epigenetic regulation of human lincRNA expression in the GenCord collection by using three cell types from 195 unrelated European individuals. We detected a considerable number of cis expression quantitative trait loci (cis-eQTLs) and demonstrated that the genetic regulation of lincRNA expression is independent of the regulation of neighboring protein-coding genes. lincRNAs have relatively more cis-eQTLs than do equally expressed protein-coding genes with the same exon number. lincRNA cis-eQTLs are located closer to transcription start sites (TSSs) and their effect sizes are higher than cis-eQTLs found for protein-coding genes, suggesting that lincRNA expression levels are less constrained than that of protein-coding genes. Additionally, lincRNA cis-eQTLs can influence the expression level of nearby protein-coding genes and thus could be considered as QTLs for enhancer activity. Enrichment of expressed lincRNA promoters in enhancer marks provides an additional argument for the involvement of lincRNAs in the regulation of transcription in cis. By investigating the epigenetic regulation of lincRNAs, we observed both positive and negative correlations between DNA methylation and gene expression (expression quantitative trait methylation [eQTMs]), as expected, and found that the landscapes of passive and active roles of DNA methylation in gene regulation are similar to protein-coding genes. However, lincRNA eQTMs are located closer to TSSs than are protein-coding gene eQTMs. These similarities and differences in genetic and epigenetic regulation between lincRNAs and protein-coding genes contribute to the elucidation of potential functions of lincRNAs.
Affiliation(s)
- Konstantin Popadin
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1 rue Michel-Servet, 1211 Geneva, Switzerland; Institute of Genetics and Genomics in Geneva (iGE3), 1211 Geneva, Switzerland; Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow 127994, Russia
|
3144
|
Marinov GK, Williams BA, McCue K, Schroth GP, Gertz J, Myers RM, Wold BJ. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res 2013; 24:496-510. [PMID: 24299736 PMCID: PMC3941114 DOI: 10.1101/gr.161034.113] [Citation(s) in RCA: 393] [Impact Index Per Article: 35.7] [Indexed: 12/15/2022]
Abstract
Single-cell RNA-seq mammalian transcriptome studies are at an early stage in uncovering cell-to-cell variation in gene expression, transcript processing and editing, and regulatory module activity. Despite great progress recently, substantial challenges remain, including discriminating biological variation from technical noise. Here we apply the SMART-seq single-cell RNA-seq protocol to study the reference lymphoblastoid cell line GM12878. By using spike-in quantification standards, we estimate the absolute number of RNA molecules per cell for each gene and find significant variation in total mRNA content: between 50,000 and 300,000 transcripts per cell. We directly measure technical stochasticity by a pool/split design and find that there are significant differences in expression between individual cells, over and above technical variation. Specific gene coexpression modules were preferentially expressed in subsets of individual cells, including one enriched for mRNA processing and splicing factors. We assess cell-to-cell variation in alternative splicing and allelic bias and report evidence of significant differences in splice site usage that exceed splice variation in the pool/split comparison. Finally, we show that transcriptomes from small pools of 30–100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material. Together, our results define an experimental and computational path forward for analyzing gene expression in rare cell types and cell states.
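The abstract above mentions using spike-in quantification standards to estimate absolute RNA molecule counts per cell. The idea can be illustrated with a minimal sketch, assuming a simple proportional relationship between measured signal and molecule number; the function names, the through-origin fit, and the toy numbers are all hypothetical and are not taken from the paper.

```python
def fit_spike_in_scale(known_molecules, measured_signal):
    """Least-squares slope through the origin: molecules ~= scale * signal.

    known_molecules: spike-in molecule counts added to the sample (assumed known)
    measured_signal: expression signal observed for the same spike-ins
    """
    numerator = sum(m * s for m, s in zip(known_molecules, measured_signal))
    denominator = sum(s * s for s in measured_signal)
    if denominator == 0:
        raise ValueError("spike-in signals are all zero; cannot calibrate")
    return numerator / denominator


def estimate_molecules(signal, scale):
    """Convert a gene's measured signal into an estimated molecule count."""
    return scale * signal


# Toy calibration data (invented for illustration only).
spike_known = [10.0, 20.0, 40.0]
spike_signal = [1.0, 2.0, 4.0]
scale = fit_spike_in_scale(spike_known, spike_signal)
```

A real calibration would need to account for capture efficiency, detection limits and non-linearity at low abundance, none of which this sketch attempts.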
Affiliation(s)
- Georgi K Marinov
- Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
|
3145
|
Tzeng DTW, Tseng YT, Ung M, Liao IE, Liu CC, Cheng C. DPRP: a database of phenotype-specific regulatory programs derived from transcription factor binding data. Nucleic Acids Res 2013; 42:D178-83. [PMID: 24302579 PMCID: PMC3965116 DOI: 10.1093/nar/gkt1254] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/01/2023] Open
Abstract
Gene expression profiling has been extensively used in the past decades, resulting in an enormous amount of expression data available in public databases. These data sets are informative in elucidating transcriptional regulation of genes underlying various biological and clinical conditions. However, it is usually difficult to identify transcription factors (TFs) responsible for gene expression changes directly from their own expression, as TF activity is often regulated at the posttranscriptional level. In recent years, technical advances have made it possible to systematically determine the target genes of TFs by ChIP-seq experiments. To identify the regulatory programs underlying gene expression profiles, we constructed a database of phenotype-specific regulatory programs (DPRP, http://syslab.nchu.edu.tw/DPRP/) derived from the integrative analysis of TF binding data and gene expression data. DPRP provides three methods: the Fisher’s Exact Test, the Kolmogorov–Smirnov test and the BASE algorithm to facilitate the application of gene expression data for generating new hypotheses on transcriptional regulatory programs in biological and clinical studies.
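Of the three methods named in the abstract above, Fisher's exact test is the simplest to illustrate. The following is a minimal sketch of a one-sided enrichment test asking whether a transcription factor's target genes are over-represented in a selected gene set; the function name, argument names, and toy numbers are hypothetical, and this is not DPRP's implementation.

```python
from math import comb


def enrichment_p_value(total_genes, tf_targets, selected_genes, overlap):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    total_genes:    number of genes in the background
    tf_targets:     number of background genes bound by the TF
    selected_genes: number of genes in the set of interest (e.g. up-regulated)
    overlap:        number of selected genes that are also TF targets
    Returns P(overlap or more shared genes) under random selection.
    """
    upper = min(tf_targets, selected_genes)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap is outside the possible range")
    # Count the ways to draw at least `overlap` targets, as an exact integer.
    favourable = sum(
        comb(tf_targets, i) * comb(total_genes - tf_targets, selected_genes - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, selected_genes)


# Toy example: 10 genes, 5 TF targets, 5 selected genes, all 5 are targets.
p_all_shared = enrichment_p_value(10, 5, 5, 5)
```

In this toy case only one of the 252 possible selections contains all five targets, so the p-value is 1/252. For genome-scale analyses an established statistics library would be the more practical choice.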
Affiliation(s)
- David T W Tzeng
- Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung 402, Taiwan, Department of Computer Science and Engineering, National Chung Hsing University, Taichung 402, Taiwan, Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH, USA, Agricultural Biotechnology Center, National Chung Hsing University, Taichung 402, Taiwan, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA and Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
|
3146
|
Li JH, Liu S, Zhou H, Qu LH, Yang JH. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 2013; 42:D92-7. [PMID: 24297251 PMCID: PMC3964941 DOI: 10.1093/nar/gkt1248] [Citation(s) in RCA: 3663] [Impact Index Per Article: 333.0] [Indexed: 12/11/2022] Open
Abstract
Although microRNAs (miRNAs), other non-coding RNAs (ncRNAs) (e.g. lncRNAs, pseudogenes and circRNAs) and competing endogenous RNAs (ceRNAs) have been implicated in cell-fate determination and in various human diseases, surprisingly little is known about the regulatory interaction networks among the multiple classes of RNAs. In this study, we developed starBase v2.0 (http://starbase.sysu.edu.cn/) to systematically identify the RNA–RNA and protein–RNA interaction networks from 108 CLIP-Seq (PAR-CLIP, HITS-CLIP, iCLIP, CLASH) data sets generated by 37 independent studies. By analyzing millions of RNA-binding protein binding sites, we identified ∼9000 miRNA-circRNA, 16 000 miRNA-pseudogene and 285 000 protein–RNA regulatory relationships. Moreover, starBase v2.0 has been updated to provide the most comprehensive CLIP-Seq experimentally supported miRNA-mRNA and miRNA-lncRNA interaction networks to date. We identified ∼10 000 ceRNA pairs from CLIP-supported miRNA target sites. By combining 13 functional genomic annotations, we developed miRFunction and ceRNAFunction web servers to predict the function of miRNAs and other ncRNAs from the miRNA-mediated regulatory networks. Finally, we developed interactive web implementations to provide visualization, analysis and downloading of the aforementioned large-scale data sets. This study will greatly expand our understanding of ncRNA functions and their coordinated regulatory networks.
Affiliation(s)
- Jun-Hao Li
- RNA Information Center, Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou 510275, PR China
|
3147
|
Steijger T, Abril JF, Engström PG, Kokocinski F, Hubbard TJ, Guigó R, Harrow J, Bertone P. Assessment of transcript reconstruction methods for RNA-seq. Nat Methods 2013; 10:1177-84. [PMID: 24185837 PMCID: PMC3851240 DOI: 10.1038/nmeth.2714] [Citation(s) in RCA: 454] [Impact Index Per Article: 41.3] [Received: 03/31/2013] [Accepted: 09/23/2013] [Indexed: 11/09/2022]
Abstract
We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.
Affiliation(s)
- Tamara Steijger
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Josep F Abril
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Pär G Engström
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Roderic Guigó
- Center for Genomic Regulation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Paul Bertone
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
|
3148
|
Jefferson OA, Köllhofer D, Ehrich TH, Jefferson RA. Transparency tools in gene patenting for informing policy and practice. Nat Biotechnol 2013; 31:1086-93. [PMID: 24316644 PMCID: PMC7416664 DOI: 10.1038/nbt.2755] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
Abstract
The Supreme Court's decision in Myriad highlights the need for tools enabling nuanced and precise analysis of gene patents at the global level.
Affiliation(s)
- Osmat A Jefferson
- Cambia, Canberra, Australia, and Queensland University of Technology, Brisbane, Australia
- Deniz Köllhofer
- Cambia, Canberra, Australia, and Queensland University of Technology, Brisbane, Australia
- Thomas H Ehrich
- Cambia, Canberra, Australia, and Queensland University of Technology, Brisbane, Australia
- Richard A Jefferson
- Cambia, Canberra, Australia, and Queensland University of Technology, Brisbane, Australia
|
3149
|
Marques AC, Hughes J, Graham B, Kowalczyk MS, Higgs DR, Ponting CP. Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs. Genome Biol 2013; 14:R131. [PMID: 24289259 PMCID: PMC4054604 DOI: 10.1186/gb-2013-14-11-r131] [Citation(s) in RCA: 145] [Impact Index Per Article: 13.2] [Received: 07/17/2013] [Accepted: 11/29/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Mammalian transcriptomes contain thousands of long noncoding RNAs (lncRNAs). Some lncRNAs originate from intragenic enhancers which, when active, behave as alternative promoters producing transcripts that are processed using the canonical signals of their host gene. We have followed up this observation by analyzing intergenic lncRNAs to determine the extent to which they might also originate from intergenic enhancers.
RESULTS: We integrated high-resolution maps of transcriptional initiation and transcription to annotate a conservative set of intergenic lncRNAs expressed in mouse erythroblasts. We subclassified intergenic lncRNAs according to chromatin status at transcriptional initiation regions, defined by relative levels of histone H3K4 mono- and trimethylation. These transcripts are almost evenly divided between those arising from enhancer-associated (elncRNA) or promoter-associated (plncRNA) elements. These two classes of 5' capped and polyadenylated RNA transcripts are indistinguishable with regard to their length, number of exons or transcriptional orientation relative to their closest neighboring gene. Nevertheless, elncRNAs are more tissue-restricted, less highly expressed and less well conserved during evolution. Of considerable interest, we found that expression of elncRNAs, but not plncRNAs, is associated with enhanced expression of neighboring protein-coding genes during erythropoiesis.
CONCLUSIONS: We have determined globally the sites of initiation of intergenic lncRNAs in erythroid cells, allowing us to distinguish two similarly abundant classes of transcripts. Different correlations between the levels of elncRNAs, plncRNAs and expression of neighboring genes suggest that functional lncRNAs from the two classes may play contrasting roles in regulating the transcript abundance of local or distal loci.
|
3150
|
MacArthur JAL, Morales J, Tully RE, Astashyn A, Gil L, Bruford EA, Larsson P, Flicek P, Dalgleish R, Maglott DR, Cunningham F. Locus Reference Genomic: reference sequences for the reporting of clinically relevant sequence variants. Nucleic Acids Res 2013; 42:D873-8. [PMID: 24285302 PMCID: PMC3965024 DOI: 10.1093/nar/gkt1198] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 02/06/2023] Open
Abstract
Locus Reference Genomic (LRG; http://www.lrg-sequence.org/) records contain internationally recognized stable reference sequences designed specifically for reporting clinically relevant sequence variants. Each LRG is contained within a single file consisting of a stable ‘fixed’ section and a regularly updated ‘updatable’ section. The fixed section contains stable genomic DNA sequence for a genomic region, essential transcripts and proteins for variant reporting and an exon numbering system. The updatable section contains mapping information, annotation of all transcripts and overlapping genes in the region and legacy exon and amino acid numbering systems. LRGs provide a stable framework that is vital for reporting variants, according to Human Genome Variation Society (HGVS) conventions, in genomic DNA, transcript or protein coordinates. To enable translation of information between LRG and genomic coordinates, LRGs include mapping to the human genome assembly. LRGs are compiled and maintained by the National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). LRG reference sequences are selected in collaboration with the diagnostic and research communities, locus-specific database curators and mutation consortia. Currently >700 LRGs have been created, of which >400 are publicly available. The aim is to create an LRG for every locus with clinical implications.
Affiliation(s)
- Jacqueline A L MacArthur
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, National Center for Biotechnology Information, Bethesda, MD 20894, USA, and Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
|