1
Tung KF, Pan CY, Lin WC. Housekeeping protein-coding genes interrogated with tissue and individual variations. Sci Rep 2024; 14:12454. [PMID: 38816574 PMCID: PMC11139953 DOI: 10.1038/s41598-024-63269-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 05/27/2024] [Indexed: 06/01/2024] Open
Abstract
Housekeeping protein-coding genes are stably expressed genes in cells and tissues that are thought to be engaged in fundamental cellular biological functions. They are often utilized as normalization references in molecular biology research and are especially important in integrated bioinformatic investigations. Prior studies have examined human housekeeping protein-coding genes by analyzing various gene expression datasets. The inclusion of different tissue types significantly impacted the discovery of housekeeping genes. In this report, we investigated particularly individual human subject expression differences in protein-coding genes across different tissue types. We used GTEx V8 gene expression datasets obtained from more than 16,000 human normal tissue samples. Furthermore, the Gini index is utilized to investigate the expression variations of protein-coding genes between tissue and individual donor subjects. Housekeeping protein-coding genes found using Gini index profiles may vary depending on the tissue subtypes investigated, particularly given the diverse sample size collections across the GTEx tissue subtypes. We subsequently selected major tissues and identified subsets of housekeeping genes with stable expression levels among human donors within those tissues. In this work, we provide alternative sets of housekeeping protein-coding genes that show more consistent expression patterns in human subjects across major solid organs. Weblink: https://hpsv.ibms.sinica.edu.tw .
Affiliation(s)
- Kuo-Feng Tung
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan, R.O.C.
- Chao-Yu Pan
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan, R.O.C.
- Wen-Chang Lin
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan, R.O.C.
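The abstract of entry 1 uses the Gini index to measure how unevenly a gene is expressed across tissues or donors. As a rough illustration only, the sketch below computes a Gini index with the standard rank-weighted formula. The function name, the toy expression values, and the absence of any filtering or log transformation are assumptions made for this example; the abstract does not describe the authors' preprocessing.

```python
def gini_index(values):
    """Gini index of non-negative values: 0 means perfectly even expression."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form over ascending values x_1..x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical expression values (e.g. TPM) for two genes across four tissues.
stable_gene = [5.0, 5.0, 5.0, 5.0]
tissue_specific_gene = [0.0, 0.0, 0.0, 10.0]

print(gini_index(stable_gene))
print(gini_index(tissue_specific_gene))
```

A gene expressed at the same level everywhere scores 0, while a gene expressed in only one of n samples reaches the maximum of (n - 1) / n. A housekeeping candidate would be expected to sit near the low end of this range.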
2
Cao H, Kapranov P. Methods to Analyze the Non-Coding RNA Interactome—Recent Advances and Challenges. Front Genet 2022; 13:857759. [PMID: 35368711 PMCID: PMC8969105 DOI: 10.3389/fgene.2022.857759] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/19/2022] [Accepted: 02/15/2022] [Indexed: 12/03/2022] Open
Abstract
Most of the human genome is transcribed to generate a multitude of non-coding RNAs. However, while these transcripts have generated an immense amount of scientific interest, their biological function remains a subject of an intense debate. Understanding mechanisms of action of non-coding RNAs is a key to addressing the issue of biological relevance of these transcripts. Based on some well-understood non-coding RNAs that function inside the cell by interacting with other molecules, it is generally believed many other non-coding transcripts could also function in a similar fashion. Therefore, development of methods that can map RNA interactome is the key to understanding functionality of the extensive cellular non-coding transcriptome. Here, we review the vast progress that has been made in the past decade in technologies that can map RNA interactions with different sites in DNA, proteins or other RNA molecules; the general approaches used to validate the existence of novel interactions; and the challenges posed by interpreting the data obtained using the interactome mapping methods.
3
Wang T, Li Z, Yan L, Yan F, Shen H, Tian X. Long Non-Coding RNA Neighbor of BRCA1 Gene 2: A Crucial Regulator in Cancer Biology. Front Oncol 2021; 11:783526. [PMID: 34926299 PMCID: PMC8674783 DOI: 10.3389/fonc.2021.783526] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/26/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are involved in fundamental biochemical and cellular processes. The neighbor of BRCA1 gene 2 (NBR2) is a long intergenic non-coding RNA (lincRNA) whose gene locus is adjacent to the tumor suppressor gene breast cancer susceptibility gene 1 (BRCA1). In human cancers, NBR2 expression is dysregulated and correlates with clinical outcomes. Moreover, NBR2 is crucial for glucose metabolism and affects the proliferation, survival, metastasis, and therapeutic resistance in different types of cancer. Here, we review the precise molecular mechanisms underlying NBR2-induced changes in cancer. In addition, the potential application of NBR2 in the diagnosis and treatment of cancer is also discussed, as well as the challenges of exploiting NBR2 for cancer intervention.
Affiliation(s)
- Ting Wang
  - Department of Laboratory Medicine, Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research & The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Zhaosheng Li
  - Department of Laboratory Medicine, Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research & The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Liujia Yan
  - Department of Laboratory Medicine, Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research & The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Feng Yan
  - Department of Laboratory Medicine, Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research & The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Han Shen
  - Department of Laboratory Medicine, Nanjing Drum Tower Hospital, Nanjing University Medical School, Nanjing, China
- Xinyu Tian
  - Department of Laboratory Medicine, Nanjing Drum Tower Hospital, Nanjing University Medical School, Nanjing, China
4
Karousis ED, Gypas F, Zavolan M, Mühlemann O. Nanopore sequencing reveals endogenous NMD-targeted isoforms in human cells. Genome Biol 2021; 22:223. [PMID: 34389041 PMCID: PMC8361881 DOI: 10.1186/s13059-021-02439-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 04/30/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Nonsense-mediated mRNA decay (NMD) is a eukaryotic, translation-dependent degradation pathway that targets mRNAs with premature termination codons and also regulates the expression of some mRNAs that encode full-length proteins. Although many genes express NMD-sensitive transcripts, identifying them based on short-read sequencing data remains a challenge. RESULTS: To identify and analyze endogenous targets of NMD, we apply cDNA Nanopore sequencing and short-read sequencing to human cells with varying expression levels of NMD factors. Our approach detects full-length NMD substrates that are highly unstable and increase in levels or even only appear when NMD is inhibited. Among the many new NMD-targeted isoforms that our analysis identifies, most derive from alternative exon usage. The isoform-aware analysis reveals many genes with significant changes in splicing but no significant changes in overall expression levels upon NMD knockdown. NMD-sensitive mRNAs have more exons in the 3′UTR and, for those mRNAs with a termination codon in the last exon, the length of the 3′UTR per se does not correlate with NMD sensitivity. Analysis of splicing signals reveals isoforms where NMD has been co-opted in the regulation of gene expression, though the main function of NMD seems to be ridding the transcriptome of isoforms resulting from spurious splicing events. CONCLUSIONS: Long-read sequencing enables the identification of many novel NMD-sensitive mRNAs and reveals both known and unexpected features concerning their biogenesis and their biological role. Our data provide a highly valuable resource of human NMD transcript targets for future genomic and transcriptomic applications.
Affiliation(s)
- Evangelos D Karousis
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Foivos Gypas
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058, Basel, Switzerland
- Mihaela Zavolan
  - Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, 4056, Basel, Switzerland
- Oliver Mühlemann
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
5
Xin X, Li Q, Fang J, Zhao T. LncRNA HOTAIR: A Potential Prognostic Factor and Therapeutic Target in Human Cancers. Front Oncol 2021; 11:679244. [PMID: 34367966 PMCID: PMC8340021 DOI: 10.3389/fonc.2021.679244] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 03/11/2021] [Accepted: 07/08/2021] [Indexed: 02/06/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are emerging as crucial regulators of gene expression and physiological processes. LncRNAs are a class of ncRNAs of more than 200 nucleotides in length. HOX transcript antisense RNA (HOTAIR), a trans-acting lncRNA with regulatory function on transcription, can repress gene expression by recruiting chromatin modifiers. HOTAIR is an oncogenic lncRNA, and numerous studies have determined that HOTAIR is highly upregulated in a wide variety of human cancers. In this review, we briefly summarize the impact of lncRNA HOTAIR expression and functions on different human solid tumors, and emphasize the potential of HOTAIR on tumor prognosis and therapy. Here, we review the recent studies that highlight the prognostic potential of HOTAIR in drug resistance and survival, and the progress of therapies developed to target HOTAIR to date. Furthermore, targeting HOTAIR results in the suppression of HOTAIR expression or function. Thus, HOTAIR knockdown exhibits great therapeutic potential in various cancers, indicating that targeting lncRNA HOTAIR may serve as a promising strategy for cancer therapy. We also propose that preclinical studies involving HOTAIR are required to provide a better understanding of the exact molecular mechanisms underlying the dysregulation of its expression and function in different human cancers and to explore effective methods of targeting HOTAIR and engineering efficient and targeted drug delivery methods in vivo.
Affiliation(s)
- Xiaoru Xin
  - College of Chemistry and Life Sciences, Zhejiang Normal University, Jinhua, China
- Qianan Li
  - College of Chemistry and Life Sciences, Zhejiang Normal University, Jinhua, China
- Jinyong Fang
  - Department of Science and Education, Jinhua Guangfu Oncology Hospital, Jinhua, China
- Tiejun Zhao
  - College of Chemistry and Life Sciences, Zhejiang Normal University, Jinhua, China
6
Dragomir MP, Manyam GC, Ott LF, Berland L, Knutsen E, Ivan C, Lipovich L, Broom BM, Calin GA. FuncPEP: A Database of Functional Peptides Encoded by Non-Coding RNAs. Noncoding RNA 2020; 6:E41. [PMID: 32977531 PMCID: PMC7712257 DOI: 10.3390/ncrna6040041] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 08/30/2020] [Revised: 09/15/2020] [Accepted: 09/18/2020] [Indexed: 02/06/2023] Open
Abstract
Non-coding RNAs (ncRNAs) are essential players in many cellular processes, from normal development to oncogenic transformation. Initially, ncRNAs were defined as transcripts that lacked an open reading frame (ORF). However, multiple lines of evidence suggest that certain ncRNAs encode small peptides of less than 100 amino acids. The sequences encoding these peptides are known as small open reading frames (smORFs), many initiating with the traditional AUG start codon but terminating with atypical stop codons, suggesting a different biogenesis. The ncRNA-encoded peptides (ncPEPs) are gradually becoming appreciated as a new class of functional molecules that contribute to diverse cellular processes, and are deregulated in different diseases contributing to pathogenesis. As multiple publications have identified unique ncPEPs, we appreciated the need for assembling a new web resource that could gather information about these functional ncPEPs. We developed FuncPEP, a new database of functional ncRNA encoded peptides, containing all experimentally validated and functionally characterized ncPEPs. Currently, FuncPEP includes a comprehensive annotation of 112 functional ncPEPs and specific details regarding the ncRNA transcripts that encode these peptides. We believe that FuncPEP will serve as a platform for further deciphering the biologic significance and medical use of ncPEPs.
Affiliation(s)
- Mihnea P. Dragomir
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Surgery, Fundeni Clinical Hospital, Carol Davila University of Medicine and Pharmacy, 022328 Bucharest, Romania
- Ganiraju C. Manyam
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Leonie Florence Ott
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Institute of Tumor Biology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Léa Berland
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Erik Knutsen
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Medical Biology, Faculty of Health Sciences, UiT—The Arctic University of Norway, N-9037 Tromsø, Norway
- Cristina Ivan
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Center for RNA Interference and Non-Coding RNAs, The University of Texas MD Anderson Cancer Center, Houston, TX 77054, USA
- Leonard Lipovich
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48201, USA
- Bradley M. Broom
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- George A. Calin
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Center for RNA Interference and Non-Coding RNAs, The University of Texas MD Anderson Cancer Center, Houston, TX 77054, USA
7
Yao RW, Liu CX, Chen LL. Linking RNA Processing and Function. Cold Spring Harbor Symposia on Quantitative Biology 2020; 84:67-82. [PMID: 32019863 DOI: 10.1101/sqb.2019.84.039495] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022]
Abstract
RNA processing is critical for eukaryotic mRNA maturation and function. It appears there is no exception for other types of RNAs. Long noncoding RNAs (lncRNAs) represent a subclass of noncoding RNAs, have sizes of >200 nucleotides (nt), and participate in various aspects of gene regulation. Although many lncRNAs are capped, polyadenylated, and spliced just like mRNAs, others are derived from primary transcripts of RNA polymerase II and stabilized by forming circular structures or by ending with small nucleolar RNA-protein complexes. Here we summarize the recent progress in linking the processing and function of these unconventionally processed lncRNAs; we also discuss how directional RNA movement is achieved using the radial flux movement of nascent precursor ribosomal RNA (pre-rRNA) in the human nucleolus as an example.
Affiliation(s)
- Run-Wen Yao
  - State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Chu-Xiao Liu
  - State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Ling-Ling Chen
  - State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
8
A bioinformatics workflow for the evaluation of RT-qPCR primer specificity: Application for the assessment of gene expression data reliability in toxicological studies. Regul Toxicol Pharmacol 2020; 111:104575. [PMID: 31945455 DOI: 10.1016/j.yrtph.2020.104575] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/19/2019] [Accepted: 01/02/2020] [Indexed: 12/11/2022]
Abstract
The reliability of Reverse Transcription quantitative real-time PCR (RT-qPCR) gene expression data depends on proper primer design and RNA quality controls. Despite freely available genomic databases and bioinformatics tools, primer design deficiencies can be found across life science publications. In order to assess the prevalence of such deficiencies in the toxicological literature, 504 primer sets extracted from a random selection of 70 recent rat toxicological studies were evaluated. The specificity of each primer set was systematically analysed using a bioinformatics workflow developed from publicly available resources (NCBI Primer BLAST, in silico PCR in UCSC genome browser, Ensembl DNA database). Potential mismatches (9%), cross-matches (13.5%), co-amplification of multiple gene splice variants (9%) and sub-optimal amplicon sizes (25%) were identified for a significant proportion of the primer sets assessed in silico. Quality controls for gDNA contamination of RNA samples were infrequently reported in the surveyed manuscripts. Hence, the impacts of gDNA contamination on RT-qPCR data were further investigated, revealing that lowly expressed genes presented higher susceptibility to contaminating gDNA. In addition to the retrospective identification of potential primer design issues presented in this study, the described bioinformatics workflow can also be used prospectively to select candidate primer sets for experimental validation.
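The abstract above describes checking primer sets in silico for cross-matches, co-amplified variants, and amplicon size. The sketch below is only a toy stand-in for that idea: it looks for exact primer matches on a single template strand and reports the size of every product it finds. It does not call NCBI Primer-BLAST, UCSC in silico PCR, or Ensembl, and the function names, template, and primers are all hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def amplicon_sizes(template, forward, reverse):
    """Sizes of all products with exact primer matches on the given strand."""
    template = template.upper()
    fwd = forward.upper()
    # The reverse primer anneals to the opposite strand, so on this strand
    # its binding site appears as the reverse complement of the primer.
    rev_site = reverse_complement(reverse)
    sizes = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        while end != -1:
            sizes.append(end + len(rev_site) - start)
            end = template.find(rev_site, end + 1)
        start = template.find(fwd, start + 1)
    return sizes


# Hypothetical 26-nt template and primer pair, for illustration only.
template = "TTACGTTGCATCATCATCGGTACAAA"
print(amplicon_sizes(template, "ACGTTG", "TGTACC"))
```

An empty result would suggest the pair does not amplify the template at all, and more than one size would hint at multiple products. A real check would also need to tolerate mismatches and search both strands across the whole transcriptome, which this sketch deliberately leaves out.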
9
Mudge JM, Jungreis I, Hunt T, Gonzalez JM, Wright JC, Kay M, Davidson C, Fitzgerald S, Seal R, Tweedie S, He L, Waterhouse RM, Li Y, Bruford E, Choudhary JS, Frankish A, Kellis M. Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci. Genome Res 2019; 29:2073-2087. [PMID: 31537640 PMCID: PMC6886504 DOI: 10.1101/gr.246462.118] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 11/15/2018] [Accepted: 09/09/2019] [Indexed: 12/15/2022]
Abstract
The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.
Affiliation(s)
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Irwin Jungreis
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Toby Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Jose Manuel Gonzalez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- James C Wright
  - Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, London SW7 3RP, United Kingdom
- Mike Kay
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Claire Davidson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Stephen Fitzgerald
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom
- Ruth Seal
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
  - Department of Haematology, University of Cambridge, Cambridge CB2 0PT, United Kingdom
- Susan Tweedie
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Liang He
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Robert M Waterhouse
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Yue Li
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Elspeth Bruford
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
  - Department of Haematology, University of Cambridge, Cambridge CB2 0PT, United Kingdom
- Jyoti S Choudhary
  - Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, London SW7 3RP, United Kingdom
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Manolis Kellis
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
10
Akter S, Xu D, Nagel SC, Bromfield JJ, Pelch K, Wilshire GB, Joshi T. Machine Learning Classifiers for Endometriosis Using Transcriptomics and Methylomics Data. Front Genet 2019; 10:766. [PMID: 31552087 PMCID: PMC6737999 DOI: 10.3389/fgene.2019.00766] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/18/2019] [Accepted: 07/19/2019] [Indexed: 12/29/2022] Open
Abstract
Endometriosis is a complex and common gynecological disorder yet a poorly understood disease affecting about 176 million women worldwide and causing significant impact on their quality of life and economic burden. Neither a definitive clinical symptom nor a minimally invasive diagnostic method is available, thus leading to an average of 4 to 11 years of diagnostic latency. Discovery of relevant biological patterns from microarray expression or next generation sequencing (NGS) data has been advanced over the last several decades by applying various machine learning tools. We performed machine learning analysis using 38 RNA-seq and 80 enrichment-based DNA methylation (MBD-seq) datasets. We experimented how well various supervised machine learning methods such as decision tree, partial least squares discriminant analysis (PLSDA), support vector machine, and random forest perform in classifying endometriosis from the control samples trained on both transcriptomics and methylomics data. The assessment was done from two different perspectives for improving classification performances: a) implication of three different normalization techniques and b) implication of differential analysis using the generalized linear model (GLM). Several candidate biomarker genes were identified by multiple machine learning experiments including NOTCH3, SNAPC2, B4GALNT1, SMAP2, DDB2, GTF3C5, and PTOV1 from the transcriptomics data analysis and TRPM6, RASSF2, TNIP2, RP3-522J7.6, FGD3, and MFSD14B from the methylomics data analysis. We concluded that an appropriate machine learning diagnostic pipeline for endometriosis should use TMM normalization for transcriptomics data, and quantile or voom normalization for methylomics data, GLM for feature space reduction and classification performance maximization.
Affiliation(s)
- Sadia Akter
  - Informatics Institute, University of Missouri, Columbia, MO, United States
- Dong Xu
  - Informatics Institute, University of Missouri, Columbia, MO, United States
  - Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Susan C. Nagel
  - OB/GYN and Women’s Health, University of Missouri School of Medicine, Columbia, MO, United States
- John J. Bromfield
  - OB/GYN and Women’s Health, University of Missouri School of Medicine, Columbia, MO, United States
- Katherine Pelch
  - OB/GYN and Women’s Health, University of Missouri School of Medicine, Columbia, MO, United States
- Trupti Joshi
  - Informatics Institute, University of Missouri, Columbia, MO, United States
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
  - Health Management and Informatics, University of Missouri, Columbia, MO, United States
11
HOX transcript antisense RNA (HOTAIR) in cancer. Cancer Lett 2019; 454:90-97. [DOI: 10.1016/j.canlet.2019.04.016] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 01/23/2019] [Revised: 04/06/2019] [Accepted: 04/08/2019] [Indexed: 01/17/2023]
12
Pervouchine D, Popov Y, Berry A, Borsari B, Frankish A, Guigó R. Integrative transcriptomic analysis suggests new autoregulatory splicing events coupled with nonsense-mediated mRNA decay. Nucleic Acids Res 2019; 47:5293-5306. [PMID: 30916337 PMCID: PMC6547761 DOI: 10.1093/nar/gkz193] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 02/01/2019] [Accepted: 03/12/2019] [Indexed: 11/12/2022] Open
Abstract
Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTC). Many RNA-binding proteins (RBP) regulate their expression levels by a negative feedback loop, in which RBP binds its own pre-mRNA and causes alternative splicing to introduce a PTC. We present a bioinformatic analysis integrating three data sources, eCLIP assays for a large RBP panel, shRNA inactivation of NMD pathway, and shRNA-depletion of RBPs followed by RNA-seq, to identify novel such autoregulatory feedback loops. We show that RBPs frequently bind their own pre-mRNAs, their exons respond prominently to NMD pathway disruption, and that the responding exons are enriched with nearby eCLIP peaks. We confirm previously proposed models of autoregulation in SRSF7 and U2AF1 genes and present two novel models, in which (i) SFPQ binds its mRNA and promotes switching to an alternative distal 3'-UTR that is targeted by NMD, and (ii) RPS3 binding activates a poison 5'-splice site in its pre-mRNA that leads to a frame shift and degradation by NMD. We also suggest specific splicing events that could be implicated in autoregulatory feedback loops in RBM39, HNRNPM, and U2AF2 genes. The results are available through a UCSC Genome Browser track hub.
Affiliation(s)
- Dmitri Pervouchine
  - Skolkovo Institute of Science and Technology, Ulitsa Nobelya 3, Moscow 121205, Russia
  - Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskiye Gory 1-73, 119234 Moscow, Russia
- Yaroslav Popov
  - Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskiye Gory 1-73, 119234 Moscow, Russia
- Andy Berry
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SA Hinxton, Cambridge, UK
- Beatrice Borsari
  - Center for Genomic Regulation, The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SA Hinxton, Cambridge, UK
- Roderic Guigó
  - Center for Genomic Regulation, The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
13
|
Johnson NT, Dhroso A, Hughes KJ, Korkin D. Biological classification with RNA-seq data: Can alternatively spliced transcript expression enhance machine learning classifiers? RNA (New York, N.Y.) 2018; 24:1119-1132. [PMID: 29941426 PMCID: PMC6097660 DOI: 10.1261/rna.062802.117] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 06/30/2017] [Accepted: 06/03/2018] [Indexed: 05/09/2023]
Abstract
RNA sequencing (RNA-seq) is becoming a prevalent approach to quantify gene expression and is expected to gain better insights into a number of biological and biomedical questions compared to DNA microarrays. Most importantly, RNA-seq allows us to quantify expression at the gene or transcript levels. However, leveraging the RNA-seq data requires development of new data mining and analytics methods. Supervised learning methods are commonly used approaches for biological data analysis that have recently gained attention for their applications to RNA-seq data. Here, we assess the utility of supervised learning methods trained on RNA-seq data for a diverse range of biological classification tasks. We hypothesize that the transcript-level expression data are more informative for biological classification tasks than the gene-level expression data. Our large-scale assessment utilizes multiple data sets, organisms, lab groups, and RNA-seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-seq data sets and include over 2000 samples that come from multiple organisms, lab groups, and RNA-seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and pathological tumor stages for the samples from the cancerous tissue. For each problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the transcript-based classifiers outperform or are comparable with gene expression-based methods. The top-performing techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-seq based data analysis.
Affiliation(s)
- Nathan T Johnson
  - Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
- Andi Dhroso
  - Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
- Katelyn J Hughes
  - Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
- Dmitry Korkin
  - Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
  - Worcester Polytechnic Institute, Department of Computer Science, Worcester, Massachusetts 01609, USA
14
De Bortoli M, Miano V, Coscujuela Tarrero L. The new world of RNA biomarkers and explorers’ prudence rules. Int J Biol Markers 2018; 33:239-243. [DOI: 10.1177/1724600818764071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Affiliation(s)
- Michele De Bortoli
  - Center for Molecular Systems Biology and Department of Clinical and Biological Sciences, University of Turin, Orbassano, Turin, Italy
- Valentina Miano
  - Center for Molecular Systems Biology and Department of Clinical and Biological Sciences, University of Turin, Orbassano, Turin, Italy
- Lucia Coscujuela Tarrero
  - Center for Molecular Systems Biology and Department of Clinical and Biological Sciences, University of Turin, Orbassano, Turin, Italy
15
Cheah HL, Raabe CA, Lee LP, Rozhdestvensky TS, Citartan M, Ahmed SA, Tang TH. Bacterial regulatory RNAs: complexity, function, and putative drug targeting. Crit Rev Biochem Mol Biol 2018; 53:335-355. [PMID: 29793351 DOI: 10.1080/10409238.2018.1473330] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
Over the past decade, RNA-deep sequencing has uncovered copious non-protein coding RNAs (npcRNAs) in bacteria. Many of them are key players in the regulation of gene expression, taking part in various regulatory circuits, such as metabolic responses to different environmental stresses, virulence, antibiotic resistance, and host-pathogen interactions. This has contributed to the high adaptability of bacteria to changing or even hostile environments. Their mechanisms include the regulation of transcriptional termination, modulation of translation, and alteration of messenger RNA (mRNA) stability, as well as protein sequestration. Here, the mechanisms of gene expression by regulatory bacterial npcRNAs are comprehensively reviewed and supplemented with well-characterized examples. This class of molecules and their mechanisms of action might be useful targets for the development of novel antibiotics.
Affiliation(s)
- Hong-Leong Cheah
  - Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Kepala Batas, Malaysia
- Carsten A Raabe
  - Institute of Experimental Pathology, Centre for Molecular Biology of Inflammation, University of Münster, Münster, Germany
  - Brandenburg Medical School (MHB), Neuruppin, Germany
  - Institute of Medical Biochemistry, Centre for Molecular Biology of Inflammation, University of Münster, Münster, Germany
- Li-Pin Lee
  - Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Kepala Batas, Malaysia
- Timofey S Rozhdestvensky
  - Medical Faculty, Transgenic Mouse and Genome Engineering Model Core Facility (TRAM), University of Münster, Münster, Germany
- Marimuthu Citartan
  - Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Kepala Batas, Malaysia
- Siti Aminah Ahmed
  - Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Kepala Batas, Malaysia
- Thean-Hock Tang
  - Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Kepala Batas, Malaysia
16
Inherited, not acquired, Gitelman syndrome in a patient with Sjögren's syndrome: importance of genetic testing to distinguish the two forms. CEN Case Rep 2017; 6:180-184. [PMID: 28819721 PMCID: PMC5694408 DOI: 10.1007/s13730-017-0271-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/11/2017] [Accepted: 07/30/2017] [Indexed: 12/13/2022] Open
Abstract
Gitelman syndrome (GS) is an autosomal recessive, salt-losing renal tubulopathy caused by mutations in the SLC12A3 gene; however, it can also be acquired in patients with autoimmune disease, especially in those with Sjögren’s syndrome. Differentiating between the inherited and acquired forms of GS is clinically difficult. We report a case of inherited, not acquired, GS in a patient with Sjögren’s syndrome. A 41-year-old woman, who had been diagnosed with Sjögren’s syndrome at 27-years-old, had shown chronic hypokalemia (2.5–3.5 mmol/L). Laboratory tests showed hypokalemic alkalosis, hypomagnesemia, and hypocalciuria, corresponding to GS. Although acquired GS associated with Sjögren’s syndrome was initially suspected, a genetic test identified a novel homozygous mutation of c.1336-2A > T in the SLC12A3 gene, which resulted in aberrant splicing in the SLC12A3 transcript with the exclusion of exons 11 and 12. Thus, the GS was diagnosed as not the acquired but the inherited form. In the diagnosis of GS in patients with autoimmune disease, genetic testing of SLC12A3 is essential for differentiating the two forms.
17
Abstract
There is growing evidence that transcription and nuclear organization are tightly linked. Yet, whether transcription of thousands of long noncoding RNAs (lncRNAs) could play a role in this packaging process remains elusive. Although some lncRNAs have been found to have clear roles in nuclear architecture (e.g., FIRRE, NEAT1, XIST, and others), the vast majority remain poorly understood. In this Perspective, we highlight how the act of transcription can affect nuclear architecture. We synthesize several recent findings into a proposed model where the transcription of lncRNAs can serve as guide-posts for shaping genome organization. This model is similar to the game "cat's cradle," where the shape of a string is successively changed by opening up new sites for finger placement. Analogously, transcription of lncRNAs could serve as "grip holds" for nuclear proteins to pull the genome into new positions. This model could explain general lncRNA properties such as low abundance and tissue specificity. Overall, we propose a general framework for how the act of lncRNA transcription could play a role in organizing the 3D genome.
Affiliation(s)
- Marta Melé
  - Harvard Stem Cell and Regenerative Biology Department, Harvard University, Cambridge, MA 02138, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, MA 02138, USA
- John L Rinn
  - Harvard Stem Cell and Regenerative Biology Department, Harvard University, Cambridge, MA 02138, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, MA 02138, USA
  - Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
18
Steward CA, Parker APJ, Minassian BA, Sisodiya SM, Frankish A, Harrow J. Genome annotation for clinical genomic diagnostics: strengths and weaknesses. Genome Med 2017; 9:49. [PMID: 28558813 PMCID: PMC5448149 DOI: 10.1186/s13073-017-0441-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 02/08/2023] Open
Abstract
The Human Genome Project and advances in DNA sequencing technologies have revolutionized the identification of genetic disorders through the use of clinical exome sequencing. However, in a considerable number of patients, the genetic basis remains unclear. As clinicians begin to consider whole-genome sequencing, an understanding of the processes and tools involved and the factors to consider in the annotation of the structure and function of genomic elements that might influence variant identification is crucial. Here, we discuss and illustrate the strengths and weaknesses of approaches for the annotation and classification of important elements of protein-coding genes, other genomic elements such as pseudogenes and the non-coding genome, comparative-genomic approaches for inferring gene function, and new technologies for aiding genome annotation, as a practical guide for clinicians when considering pathogenic sequence variation. Complete and accurate annotation of structure and function of genome features has the potential to reduce both false-negative (from missing annotation) and false-positive (from incorrect annotation) errors in causal variant identification in exome and genome sequences. Re-analysis of unsolved cases will be necessary as newer technology improves genome annotation, potentially improving the rate of diagnosis.
Affiliation(s)
- Charles A Steward
  - Congenica Ltd, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1DR, UK
  - The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Berge A Minassian
  - Department of Pediatrics (Neurology), University of Texas Southwestern, Dallas, TX, USA
  - Program in Genetics and Genome Biology and Department of Paediatrics (Neurology), The Hospital for Sick Children and University of Toronto, Toronto, Canada
- Sanjay M Sisodiya
  - Department of Clinical and Experimental Epilepsy, UCL Institute of Neurology, London, WC1N 3BG, UK
  - Chalfont Centre for Epilepsy, Chesham Lane, Chalfont St Peter, Buckinghamshire, SL9 0RJ, UK
- Adam Frankish
  - The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jennifer Harrow
  - The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Illumina Inc, Great Chesterford, Essex, CB10 1XL, UK
19
Yang RC. Genome-wide estimation of heritability and its functional components for flowering, defense, ionomics, and developmental traits in a geographically diverse population of Arabidopsis thaliana. Genome 2017; 60:572-580. [PMID: 28314113 DOI: 10.1139/gen-2016-0213] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]
Abstract
Narrow-sense heritability (portion of the total phenotypic variation attributable to additive genetic effect, h²) is a critical parameter in plant breeding and genetics, but its estimation is difficult for populations with unknown pedigree information. This study applied a marker-based linear mixed model (LMM) analysis to estimate narrow-sense heritability and its seven functional components corresponding to SNPs in coding and noncoding regions for each of 107 flowering, defense, ionomics, and developmental traits in an Arabidopsis (Arabidopsis thaliana) population of 199 inbred lines with unknown genetic relatedness. Genetic relationship matrix (GRM) based on 214 051 SNPs and component GRMs based on seven subsets of SNPs were computed for LMM estimation of h² and functional components contributing to h², respectively. The h² estimates for flowering traits were higher than those for defense, ionomics, and developmental traits, supporting a general view that the fitness-related traits have lower heritabilities than other traits. The function component owing to SNPs in coding (exon) regions was the least contributor to h². Our LMM analysis provides an opportunity to gain a comprehensive view on heritability and its functional components for populations with unknown structure but with genome-wide DNA markers.
Affiliation(s)
- Rong-Cai Yang
  - Alberta Agriculture and Forestry, #307, 7000-113 Street, Edmonton, AB T6H 5T6, Canada
  - Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2P5, Canada
20
Jiang C, Li Y, Zhao Z, Lu J, Chen H, Ding N, Wang G, Xu J, Li X. Identifying and functionally characterizing tissue-specific and ubiquitously expressed human lncRNAs. Oncotarget 2016; 7:7120-33. [PMID: 26760768 PMCID: PMC4872773 DOI: 10.18632/oncotarget.6859] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 08/25/2015] [Accepted: 12/26/2015] [Indexed: 01/12/2023] Open
Abstract
Recent advances in transcriptome sequencing have made it possible to distinguish ubiquitously expressed long non-coding RNAs (UE lncRNAs) from tissue-specific lncRNAs (TS lncRNAs), thereby providing clues to their cellular functions. Here, we assembled and functionally characterized a consensus lncRNA transcriptome by curating hundreds of RNA-seq datasets across normal human tissues from 16 independent studies. In total, 1,184 UE and 2,583 TS lncRNAs were identified. These different lncRNA populations had several distinct features. Specifically, UE lncRNAs were associated with genomic compaction and highly conserved exons and promoter regions. We found that UE lncRNAs are regulated at the transcriptional level (with especially strong regulation of enhancers) and are associated with epigenetic modifications and post-transcriptional regulation. Based on these observations we propose a novel way to predict the functions of UE and TS lncRNAs through analysis of their genomic location and similarities in epigenetic modifications. Our characterization of UE and TS lncRNAs may provide a foundation for lncRNA genomics and the delineation of complex disease mechanisms.
Affiliation(s)
- Chunjie Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yongsheng Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zheng Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jianping Lu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hong Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Na Ding
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Guangjuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Juan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xia Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
21
Abstract
Every ribonucleic acid begins its cellular life as a transcript. If the transcript or its processing product has a function it should be regarded an RNA. Nonfunctional transcripts, by-products from processing, degradation intermediates, even those originating from (functional) RNAs, and non-functional products of transcriptional gene regulation accomplished via the act of transcription, as well as stochastic (co)transcripts could simply be addressed as transcripts (class 0). The copious functional RNAs (class I), often maturing after one or more processing steps, already are systematized into ever expanding sub-classifications ranging from micro RNAs to rRNAs. Established sub-classifications addressing a wide functional diversity remain unaffected. mRNAs (class II) are distinct from any other RNA by virtue of their potential to be translated into (poly)peptide(s) on ribosomes. We are not proposing a novel RNA classification, but wish to add a basic concept with existing terminology (transcript, RNA, and mRNA) that should serve as an additional framework for carefully delineating RNA function from an avalanche of RNA sequencing data. At the same time, this top level hierarchical model should illuminate important principles of RNA evolution and biology thus heightening our awareness that in biology boundaries and categorizations are typically fuzzy.
Affiliation(s)
- Jürgen Brosius
  - Institute of Experimental Pathology, ZMBE, University of Münster, Von-Esmarch-Str. 56, 48149 Münster, Germany
  - Institute of Evolutionary and Medical Genomics, Brandenburg Medical School (MHB), Fehrbelliner Str. 38, 16816, Germany
- Carsten A Raabe
  - Institute of Experimental Pathology, ZMBE, University of Münster, Von-Esmarch-Str. 56, 48149 Münster, Germany
  - Institute of Evolutionary and Medical Genomics, Brandenburg Medical School (MHB), Fehrbelliner Str. 38, 16816, Germany
22
Abstract
A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe - or 'annotate' - genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists - from clinicians to evolutionary biologists - need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.
Affiliation(s)
- Jonathan M Mudge
  - Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
- Jennifer Harrow
  - Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
  - Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Saffron Walden CB10 1XL, UK
23
Beyond the survival and death of the deltamethrin-threatened pollen beetle Meligethes aeneus: An in-depth proteomic study employing a transcriptome database. J Proteomics 2016; 150:281-289. [PMID: 27705816 DOI: 10.1016/j.jprot.2016.09.016] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 06/02/2016] [Revised: 09/10/2016] [Accepted: 09/28/2016] [Indexed: 12/18/2022]
Abstract
Insecticide resistance is an increasingly global problem that hampers pest control. We sought the mechanism responsible for survival following pyrethroid treatment and the factors connected to paralysis/death of the pollen beetle Meligethes aeneus through a proteome-level analysis using nanoLC coupled with Orbitrap Fusion™ Tribrid™ mass spectrometry. A tolerant field population of beetles was treated with deltamethrin, and the ensuing proteome changes were observed in the survivors (resistant), dead (paralyzed) and control-treated beetles. The protein database consisted of the translated transcriptome, and the resulting changes were manually annotated via BLASTP. We identified a number of high-abundance changes in which there were several dominant proteins, e.g., the electron carrier cytochrome b5, ribosomal proteins 60S RPL28, 40S RPS23 and RPS26, eIF4E-transporter, anoxia up-regulated protein, 2 isoforms of vitellogenin and pathogenesis-related protein 5. Deltamethrin detoxification was influenced by different cytochromes P450, which were likely boosted by increased cytochrome b5, but glutathione-S-transferase ε and UDP-glucuronosyltransferases also contributed. Moreover, we observed changes in proteins related to RNA interference, RNA binding and epigenetic modifications. The high changes in ribosomal proteins and associated factors suggest specific control of translation. Overall, we showed modulation of expression processes by epigenetic markers, alternative splicing and translation. Future functional studies will benefit.
BIOLOGICAL SIGNIFICANCE
Insects develop pesticide resistance, which has become one of the key issues in plant protection. This growing resistance increases the demand for pesticide applications and the development of new substances. Knowledge in the field regarding the resistance mechanism and its responses to pesticide treatment provides us the opportunity to propose a solution for this issue. Although the pollen beetle Meligethes aeneus was effectively controlled with pyrethroids for many years, there have been reports of increasing resistance. We show protein changes including production of isoforms in response to deltamethrin at the protein level. These results illustrate the insect's survival state as a resistant beetle and in its paralyzed state (evaluated as dead) relative to resistant individuals.
24
Signal B, Gloss BS, Dinger ME. Computational Approaches for Functional Prediction and Characterisation of Long Noncoding RNAs. Trends Genet 2016; 32:620-637. [PMID: 27592414 DOI: 10.1016/j.tig.2016.08.004] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 07/04/2016] [Revised: 08/03/2016] [Accepted: 08/04/2016] [Indexed: 02/09/2023]
Abstract
Although a considerable portion of eukaryotic genomes is transcribed as long noncoding RNAs (lncRNAs), the vast majority are functionally uncharacterised. The rapidly expanding catalogue of mechanistically investigated lncRNAs has provided evidence for distinct functional subclasses, which are now ripe for exploitation as a general model to predict functions for uncharacterised lncRNAs. By utilising publicly-available genome-wide datasets and computational methods, we present several developed and emerging in silico approaches to characterise and predict the functions of lncRNAs. We propose that the application of these techniques provides valuable functional and mechanistic insight into lncRNAs, and is a crucial step for informing subsequent functional studies.
Affiliation(s)
- Bethany Signal
  - Garvan Institute of Medical Research, Sydney, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney, Australia
- Brian S Gloss
  - Garvan Institute of Medical Research, Sydney, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney, Australia
- Marcel E Dinger
  - Garvan Institute of Medical Research, Sydney, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney, Australia
25
Lagarde J, Uszczynska-Ratajczak B, Santoyo-Lopez J, Gonzalez JM, Tapanari E, Mudge JM, Steward CA, Wilming L, Tanzer A, Howald C, Chrast J, Vela-Boza A, Rueda A, Lopez-Domingo FJ, Dopazo J, Reymond A, Guigó R, Harrow J. Extension of human lncRNA transcripts by RACE coupled with long-read high-throughput sequencing (RACE-Seq). Nat Commun 2016; 7:12339. [PMID: 27531712 PMCID: PMC4992054 DOI: 10.1038/ncomms12339] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 04/20/2016] [Accepted: 06/23/2016] [Indexed: 12/22/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended in either 5′ or 3′, often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques. Long non-coding RNAs are increasingly recognised to be important factors in regulating cellular processes and comprise a large fraction of the transcriptome, however most are uncharacterised. Here the authors present RACE-Seq, a tool to improve and extend the annotation of low-expression transcripts.
Affiliation(s)
- Julien Lagarde
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Dr Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Barbara Uszczynska-Ratajczak
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Dr Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Electra Tapanari
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
- Jonathan M Mudge
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
- Charles A Steward
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
- Laurens Wilming
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
- Andrea Tanzer
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Dr Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Cédric Howald
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Jacqueline Chrast
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Alicia Vela-Boza
  - Genomics and Bioinformatics Platform of Andalusia (GBPA), 41092 Seville, Spain
  - Roche Diagnostics, 08174 Sant Cugat Del Vallès, Barcelona, Spain
- Antonio Rueda
  - Genomics and Bioinformatics Platform of Andalusia (GBPA), 41092 Seville, Spain
- Joaquin Dopazo
  - Genomics and Bioinformatics Platform of Andalusia (GBPA), 41092 Seville, Spain
  - Computational Genomics Department, Centro de Investigación Príncipe Felipe, 46012 Valencia, Spain
  - Functional Genomics Node (INB), Centro de Investigación Príncipe Felipe, 46012 Valencia, Spain
- Alexandre Reymond
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Dr Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jennifer Harrow
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
26
St Laurent G, Vyatkin Y, Antonets D, Ri M, Qi Y, Saik O, Shtokalo D, de Hoon MJL, Kawaji H, Itoh M, Lassmann T, Arner E, Forrest ARR, Nicolas E, McCaffrey TA, Carninci P, Hayashizaki Y, Wahlestedt C, Kapranov P. Functional annotation of the vlinc class of non-coding RNAs using systems biology approach. Nucleic Acids Res 2016; 44:3233-52. [PMID: 27001520 PMCID: PMC4838384 DOI: 10.1093/nar/gkw162] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 07/24/2015] [Accepted: 03/02/2016] [Indexed: 12/20/2022] Open
Abstract
Functionality of the non-coding transcripts encoded by the human genome is the coveted goal of the modern genomics research. While commonly relied on the classical methods of forward genetics, integration of different genomics datasets in a global Systems Biology fashion presents a more productive avenue of achieving this very complex aim. Here we report application of a Systems Biology-based approach to dissect functionality of a newly identified vast class of very long intergenic non-coding (vlinc) RNAs. Using highly quantitative FANTOM5 CAGE dataset, we show that these RNAs could be grouped into 1542 novel human genes based on analysis of insulators that we show here indeed function as genomic barrier elements. We show that vlinc RNA genes likely function in cis to activate nearby genes. This effect while most pronounced in closely spaced vlinc RNA-gene pairs can be detected over relatively large genomic distances. Furthermore, we identified 101 vlinc RNA genes likely involved in early embryogenesis based on patterns of their expression and regulation. We also found another 109 such genes potentially involved in cellular functions also happening at early stages of development such as proliferation, migration and apoptosis. Overall, we show that Systems Biology-based methods have great promise for functional annotation of non-coding RNAs.
Collapse
Affiliation(s)
- Georges St Laurent
- St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801, USA; Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, USA
- Yuri Vyatkin
- St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801, USA; AcademGene Ltd., 6, Acad. Lavrentjev ave., Novosibirsk 630090, Russia
- Denis Antonets
- AcademGene Ltd., 6, Acad. Lavrentjev ave., Novosibirsk 630090, Russia; State Research Center of Virology and Biotechnology 'Vector', Novosibirsk, Russia; A. P. Ershov Institute of Informatics Systems SB RAS, 6, Acad. Lavrentjev ave., Novosibirsk 630090, Russia
- Maxim Ri
- St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801, USA; AcademGene Ltd., 6, Acad. Lavrentjev ave., Novosibirsk 630090, Russia
- Yao Qi
- Institute of Genomics, School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen 361021, China
- Olga Saik
- St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801, USA; AcademGene Ltd., 6, Acad. Lavrentjev ave., Novosibirsk 630090, Russia; Federal Research Center Institute of Cytology and Genetics SB RAS, 10, Acad. Lavrentjev ave., Novosibirsk 630090, Russia
- Dmitry Shtokalo
- St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801, USA; AcademGene Ltd., 6, Acad. Lavrentjev ave., Novosibirsk 630090, Russia; A. P. Ershov Institute of Informatics Systems SB RAS, 6, Acad. Lavrentjev ave., Novosibirsk 630090, Russia
- Michiel J L de Hoon
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Hideya Kawaji
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan; RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
- Masayoshi Itoh
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan; RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
- Timo Lassmann
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan; Telethon Kids Institute, The University of Western Australia, 100 Roberts Road, Subiaco, Subiaco, 6008, Western Australia, Australia
- Erik Arner
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Alistair R R Forrest
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Estelle Nicolas
- LBCMCP, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, France
- Timothy A McCaffrey
- The George Washington University Medical Center, Department of Medicine, Division of Genomic Medicine, 2300 I St. NW, Washington, DC, USA
- Piero Carninci
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Yoshihide Hayashizaki
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
- Claes Wahlestedt
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1501 NW 10th Ave., Miami, FL 33136, USA
- Philipp Kapranov
- Institute of Genomics, School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen 361021, China; St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801, USA
27
Long non-coding RNAs display higher natural expression variation than protein-coding genes in healthy humans. Genome Biol 2016; 17:14. [PMID: 26821746 PMCID: PMC4731934 DOI: 10.1186/s13059-016-0873-8] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 10/27/2015] [Accepted: 01/06/2016] [Indexed: 02/06/2023] Open
Abstract
Background: Long non-coding RNAs (lncRNAs) are increasingly implicated as gene regulators and may ultimately be more numerous than protein-coding genes in the human genome. Despite large numbers of reported lncRNAs, reference annotations are likely incomplete due to their lower and tighter tissue-specific expression compared to mRNAs. An unexplored factor potentially confounding lncRNA identification is inter-individual expression variability. Here, we characterize lncRNA natural expression variability in human primary granulocytes. Results: We annotate granulocyte lncRNAs and mRNAs in RNA-seq data from 10 healthy individuals, identifying multiple lncRNAs absent from reference annotations, and use this to investigate three known features (higher tissue-specificity, lower expression, and reduced splicing efficiency) of lncRNAs relative to mRNAs. Expression variability was examined in seven individuals sampled three times at 1- or more than 1-month intervals. We show that lncRNAs display significantly more inter-individual expression variability compared to mRNAs. We confirm this finding in two independent human datasets by analyzing multiple tissues from the GTEx project and lymphoblastoid cell lines from the GEUVADIS project. Using the latter dataset we also show that including more human donors into the transcriptome annotation pipeline allows identification of an increasing number of lncRNAs, but minimally affects mRNA gene number. Conclusions: A comprehensive annotation of lncRNAs is known to require an approach that is sensitive to low and tight tissue-specific expression. Here we show that increased inter-individual expression variability is an additional general lncRNA feature to consider when creating a comprehensive annotation of human lncRNAs or proposing their use as prognostic or disease markers.
28
Liu T, Lin K. The distribution pattern of genetic variation in the transcript isoforms of the alternatively spliced protein-coding genes in the human genome. Mol Biosyst 2016; 11:1378-88. [PMID: 25820936 DOI: 10.1039/c5mb00132c] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
Abstract
By enabling the transcription of multiple isoforms from the same gene locus, alternative-splicing mechanisms greatly expand the diversity of the human transcriptome and proteome. Currently, the alternatively spliced transcripts from each protein-coding gene locus in the human genome can be classified as either principal or non-principal isoforms, providing that they differ with respect to cross-species conservation or biological features. By mapping the variants from the 1000 Genomes Project onto the coding region of each isoform, an interesting pattern of the genetic variation distributions of the coding regions for these two types of transcript isoforms was revealed on a whole-genome scale: compared with the principal isoform-specific coding regions, the non-principal isoform-specific coding regions are significantly enriched in amino acid-changing variants, particularly those that have a strong impact on protein function and have higher derived allele frequencies, suggesting that non-principal isoform-specific substitutions are less likely to be related to phenotype changes or disease. The results herein can help us better understand the potential consequences of alternatively spliced products from a population perspective.
Affiliation(s)
- Ting Liu
- College of Life Sciences, Beijing Normal University, No. 19, Xinjiekouwai Street, Haidian District, Beijing, 100875, P. R. China.
29
Kanduri K, Tripathi S, Larjo A, Mannerström H, Ullah U, Lund R, Hawkins RD, Ren B, Lähdesmäki H, Lahesmaa R. Identification of global regulators of T-helper cell lineage specification. Genome Med 2015; 7:122. [PMID: 26589177 PMCID: PMC4654807 DOI: 10.1186/s13073-015-0237-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 08/01/2015] [Accepted: 11/02/2015] [Indexed: 11/15/2022] Open
Abstract
Background: Activation and differentiation of T-helper (Th) cells into Th1 and Th2 types is a complex process orchestrated by distinct gene activation programs engaging a number of genes. This process is crucial for a robust immune response and an imbalance might lead to disease states such as autoimmune diseases or allergy. Therefore, identification of genes involved in this process is paramount to further understand the pathogenesis of, and design interventions for, immune-mediated diseases. Methods: We aimed at identifying protein-coding genes and long non-coding RNAs (lncRNAs) involved in early differentiation of T-helper cells by transcriptome analysis of cord blood-derived naïve precursor, primary and polarized cells. Results: Here, we identified lineage-specific genes involved in early differentiation of Th1 and Th2 subsets by integrating transcriptional profiling data from multiple platforms. We have obtained a high confidence list of genes as well as a list of novel genes by employing more than one profiling platform. We show that the density of lineage-specific epigenetic marks is higher around lineage-specific genes than anywhere else in the genome. Based on next-generation sequencing data we identified lineage-specific lncRNAs involved in early Th1 and Th2 differentiation and predicted their expected functions through Gene Ontology analysis. We show that there is a positive trend in the expression of the closest lineage-specific lncRNA and gene pairs. We also found out that there is an enrichment of disease SNPs around a number of lncRNAs identified, suggesting that these lncRNAs might play a role in the etiology of autoimmune diseases. Conclusion: The results presented here show the involvement of several new actors in the early differentiation of T-helper cells and will be a valuable resource for better understanding of autoimmune processes.
Affiliation(s)
- Kartiek Kanduri
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland; Department of Computer Science, Aalto University School of Science, Espoo, Finland.
- Subhash Tripathi
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland.
- Antti Larjo
- Department of Computer Science, Aalto University School of Science, Espoo, Finland.
- Henrik Mannerström
- Department of Computer Science, Aalto University School of Science, Espoo, Finland.
- Ubaid Ullah
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland.
- Riikka Lund
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland.
- R David Hawkins
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland; Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, 98195, USA; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
- Bing Ren
- Ludwig Institute for Cancer Research, La Jolla, CA, 92093, USA; Department of Cellular and Molecular Medicine, Institute of Genomic Medicine and Moores Cancer Center, University of California, San Diego, La Jolla, CA, 92093, USA.
- Harri Lähdesmäki
- Department of Computer Science, Aalto University School of Science, Espoo, Finland.
- Riitta Lahesmaa
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland.
30
Mudge JM, Harrow J. Creating reference gene annotation for the mouse C57BL6/J genome assembly. Mamm Genome 2015; 26:366-78. [PMID: 26187010 PMCID: PMC4602055 DOI: 10.1007/s00335-015-9583-x] [Citation(s) in RCA: 168] [Impact Index Per Article: 18.7] [Received: 04/02/2015] [Accepted: 06/18/2015] [Indexed: 12/14/2022]
Abstract
Annotation on the reference genome of the C57BL6/J mouse has been an ongoing project ever since the draft genome was first published. Initially, the principal focus was on the identification of all protein-coding genes, although today the importance of describing long non-coding RNAs, small RNAs, and pseudogenes is recognized. Here, we describe the progress of the GENCODE mouse annotation project, which combines manual annotation from the HAVANA group with Ensembl computational annotation, alongside experimental and in silico validation pipelines from other members of the consortium. We discuss the more recent incorporation of next-generation sequencing datasets into this workflow, including the usage of mass-spectrometry data to potentially identify novel protein-coding genes. Finally, we will outline how the C57BL6/J genebuild can be used to gain insights into the variant sites that distinguish different mouse strains and species.
31
Remington DL. Alleles versus mutations: Understanding the evolution of genetic architecture requires a molecular perspective on allelic origins. Evolution 2015; 69:3025-38. [DOI: 10.1111/evo.12775] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 10/13/2014] [Revised: 07/06/2015] [Accepted: 09/08/2015] [Indexed: 01/02/2023]
Affiliation(s)
- David L. Remington
- Department of Biology; University of North Carolina at Greensboro; Greensboro North Carolina 27402
32
Hu Z, Scott HS, Qin G, Zheng G, Chu X, Xie L, Adelson DL, Oftedal BE, Venugopal P, Babic M, Hahn CN, Zhang B, Wang X, Li N, Wei C. Revealing Missing Human Protein Isoforms Based on Ab Initio Prediction, RNA-seq and Proteomics. Sci Rep 2015; 5:10940. [PMID: 26156868 PMCID: PMC4496727 DOI: 10.1038/srep10940] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 01/21/2015] [Accepted: 05/05/2015] [Indexed: 01/02/2023] Open
Abstract
Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues/cell lines. PCR followed by MiSeq sequencing showed that at least 84.1% of these predicted novel splice sites could be validated. In contrast to known transcripts, the expression of these novel transcripts was highly tissue-specific. Based on these novel transcripts, at least 36 novel proteins were detected from shotgun proteomics data of 41 breast samples. We also showed that L1 retrotransposons have a more significant impact on the origin of new transcripts/genes than previously thought. Furthermore, we found that alternative splicing is extraordinarily widespread for genes involved in specific biological functions like protein binding, nucleoside binding, neuron projection, membrane organization and cell adhesion. In the end, the total number of human transcripts with protein-coding potential was estimated to be at least 204,950.
Affiliation(s)
- Zhiqiang Hu
- [1] School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China [2] Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Pudong District, Shanghai 201203, China
- Hamish S Scott
- [1] Department of Genetics and Molecular Pathology, Centre for Cancer Biology, Frome Road, Adelaide, SA 5000 Australia [2] School of Biological Sciences, University of Adelaide, SA 5005, Australia [3] School of Medicine, University of Adelaide, North Terrace, Adelaide, SA 5000, Australia [4] School of Pharmacy and Medical Sciences, Division of Health Sciences, University of South Australia, SA, Australia [5] ACRF Cancer Genomics Facility, Centre for Cancer Biology, SA Pathology, Frome Road, Adelaide, SA 5000, Australia
- Guangrong Qin
- Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Pudong District, Shanghai 201203, China
- Guangyong Zheng
- [1] Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Pudong District, Shanghai 201203, China [2] CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China
- Xixia Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Lu Xie
- Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Pudong District, Shanghai 201203, China
- David L Adelson
- School of Biological Sciences, University of Adelaide, SA 5005, Australia
- Bergithe E Oftedal
- [1] Department of Genetics and Molecular Pathology, Centre for Cancer Biology, Frome Road, Adelaide, SA 5000 Australia [2] Department of Biomedical Informatics (DBMI), Vanderbilt University Medical Center (VUMC), 2525 West End Ave, Suite 800, Nashville, TN 37203, USA
- Parvathy Venugopal
- [1] Department of Genetics and Molecular Pathology, Centre for Cancer Biology, Frome Road, Adelaide, SA 5000 Australia [2] School of Biological Sciences, University of Adelaide, SA 5005, Australia
- Milena Babic
- Department of Genetics and Molecular Pathology, Centre for Cancer Biology, Frome Road, Adelaide, SA 5000 Australia
- Christopher N Hahn
- [1] Department of Genetics and Molecular Pathology, Centre for Cancer Biology, Frome Road, Adelaide, SA 5000 Australia [2] School of Biological Sciences, University of Adelaide, SA 5005, Australia [3] School of Medicine, University of Adelaide, North Terrace, Adelaide, SA 5000, Australia
- Bing Zhang
- Department of Biomedical Informatics (DBMI), Vanderbilt University Medical Center (VUMC), 2525 West End Ave, Suite 800, Nashville, TN 37203, USA
- Xiaojing Wang
- Department of Biomedical Informatics (DBMI), Vanderbilt University Medical Center (VUMC), 2525 West End Ave, Suite 800, Nashville, TN 37203, USA
- Nan Li
- Institute of Immunology, Second Military Medical University, 800 Xiangyin Road, Shanghai 200433, China
- Chaochun Wei
- [1] School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China [2] Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Pudong District, Shanghai 201203, China
33
St Laurent G, Wahlestedt C, Kapranov P. The Landscape of long noncoding RNA classification. Trends Genet 2015; 31:239-51. [PMID: 25869999 DOI: 10.1016/j.tig.2015.03.007] [Citation(s) in RCA: 810] [Impact Index Per Article: 90.0] [Received: 01/12/2015] [Revised: 03/09/2015] [Accepted: 03/12/2015] [Indexed: 12/12/2022]
Abstract
Advances in the depth and quality of transcriptome sequencing have revealed many new classes of long noncoding RNAs (lncRNAs). lncRNA classification has mushroomed to accommodate these new findings, even though the real dimensions and complexity of the noncoding transcriptome remain unknown. Although evidence of functionality of specific lncRNAs continues to accumulate, conflicting, confusing, and overlapping terminology has fostered ambiguity and lack of clarity in the field in general. The lack of fundamental conceptual unambiguous classification framework results in a number of challenges in the annotation and interpretation of noncoding transcriptome data. It also might undermine integration of the new genomic methods and datasets in an effort to unravel the function of lncRNA. Here, we review existing lncRNA classifications, nomenclature, and terminology. Then, we describe the conceptual guidelines that have emerged for their classification and functional annotation based on expanding and more comprehensive use of large systems biology-based datasets.
Affiliation(s)
- Georges St Laurent
- St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801, USA; Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, 185 Meeting Street, Providence, RI 02912, USA
- Claes Wahlestedt
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1501 NW 10th Ave, Miami, FL 33136, USA.
- Philipp Kapranov
- Institute of Genomics, School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen, China 361021; St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801, USA.
34
Andersson L, Archibald AL, Bottema CD, Brauning R, Burgess SC, Burt DW, Casas E, Cheng HH, Clarke L, Couldrey C, Dalrymple BP, Elsik CG, Foissac S, Giuffra E, Groenen MA, Hayes BJ, Huang LS, Khatib H, Kijas JW, Kim H, Lunney JK, McCarthy FM, McEwan JC, Moore S, Nanduri B, Notredame C, Palti Y, Plastow GS, Reecy JM, Rohrer GA, Sarropoulou E, Schmidt CJ, Silverstein J, Tellam RL, Tixier-Boichard M, Tosser-Klopp G, Tuggle CK, Vilkki J, White SN, Zhao S, Zhou H. Coordinated international action to accelerate genome-to-phenome with FAANG, the Functional Annotation of Animal Genomes project. Genome Biol 2015; 16:57. [PMID: 25854118 PMCID: PMC4373242 DOI: 10.1186/s13059-015-0622-4] [Citation(s) in RCA: 203] [Impact Index Per Article: 22.6] [Indexed: 12/24/2022] Open
Abstract
We describe the organization of a nascent international effort, the Functional Annotation of Animal Genomes (FAANG) project, whose aim is to produce comprehensive maps of functional elements in the genomes of domesticated animal species.
35
Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods 2015; 11:1114-25. [PMID: 25357241 DOI: 10.1038/nmeth.3144] [Citation(s) in RCA: 505] [Impact Index Per Article: 56.1] [Received: 05/21/2014] [Accepted: 09/22/2014] [Indexed: 12/19/2022]
Abstract
Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry-based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of new sequencing technologies such as RNA-seq and dramatic improvements in the depth and throughput of mass spectrometry-based proteomics, the pace of proteogenomic research has greatly accelerated. Here I review the current state of proteogenomic methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false positive identifications in proteogenomics and provide guidelines for analyzing the data and reporting the results of proteogenomic studies.
Affiliation(s)
- Alexey I Nesvizhskii
- [1] Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA. [2] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
36
Abstract
Systems cell biology melds high-throughput experimentation with quantitative analysis and modeling to understand many critical processes that contribute to cellular organization and dynamics. Recently, there have been several advances in technology and in the application of modeling approaches that enable the exploration of the dynamic properties of cells. Merging technology and computation offers an opportunity to objectively address unsolved cellular mechanisms, and has revealed emergent properties and helped to gain a more comprehensive and fundamental understanding of cell biology.
Affiliation(s)
- Fred D Mast
- Seattle Biomedical Research Institute, Seattle, WA 98109; Institute for Systems Biology, Seattle, WA 98109
- Alexander V Ratushny
- Seattle Biomedical Research Institute, Seattle, WA 98109; Institute for Systems Biology, Seattle, WA 98109
- John D Aitchison
- Seattle Biomedical Research Institute, Seattle, WA 98109; Institute for Systems Biology, Seattle, WA 98109
37
Gusev A, Lee S, Trynka G, Finucane H, Vilhjálmsson B, Xu H, Zang C, Ripke S, Bulik-Sullivan B, Stahl E, Kähler AK, Hultman CM, Purcell SM, McCarroll SA, Daly M, Pasaniuc B, Sullivan PF, Neale BM, Wray NR, Raychaudhuri S, Price AL, Ripke S, Neale B, Corvin A, Walters J, Farh KH, Holmans P, Lee P, Bulik-Sullivan B, Collier D, Huang H, Pers T, Agartz I, Agerbo E, Albus M, Alexander M, Amin F, Bacanu S, Begemann M, Belliveau R, Bene J, Bergen S, Bevilacqua E, Bigdeli T, Black D, Børglum A, Bruggeman R, Buccola N, Buckner R, Byerley W, Cahn W, Cai G, Campion D, Cantor R, Carr V, Carrera N, Catts S, Chambert K, Chan R, Chen R, Chen E, Cheng W, Cheung E, Chong S, Cloninger C, Cohen D, Cohen N, Cormican P, Craddock N, Crowley J, Curtis D, Davidson M, Davis K, Degenhardt F, Del Favero J, DeLisi L, Demontis D, Dikeos D, Dinan T, Djurovic S, Donohoe G, Drapeau E, Duan J, Dudbridge F, Durmishi N, Eichhammer P, Eriksson J, Escott-Price V, Essioux L, Fanous A, Farrell M, Frank J, Franke L, Freedman R, Freimer N, Friedl M, Friedman J, Fromer M, Genovese G, Georgieva L, Gershon E, Giegling I, Giusti-Rodríguez P, Godard S, Goldstein J, Golimbet V, Gopal S, Gratten J, Grove J, de Haan L, Hammer C, Hamshere M, Hansen M, Hansen T, Haroutunian V, Hartmann A, Henskens F, Herms S, Hirschhorn J, Hoffmann P, Hofman A, Hollegaard M, Hougaard D, Ikeda M, Joa I, Julià A, Kahn R, Kalaydjieva L, Karachanak-Yankova S, Karjalainen J, Kavanagh D, Keller M, Kelly B, Kennedy J, Khrunin A, Kim Y, Klovins J, Knowles J, Konte B, Kucinskas V, Kucinskiene Z, Kuzelova-Ptackova H, Kähler A, Laurent C, Keong J, Lee S, Legge S, Lerer B, Li M, Li T, Liang KY, Lieberman J, Limborska S, Loughland C, Lubinski J, Lönnqvist J, Macek M, Magnusson P, Maher B, Maier W, Mallet J, Marsal S, Mattheisen M, Mattingsdal M, McCarley R, McDonald C, McIntosh A, Meier S, Meijer C, Melegh B, Melle I, Mesholam-Gately R, Metspalu A, Michie P, Milani L, Milanova V, Mokrab Y, Morris D, Mors O, Mortensen P, Murphy K, Murray R, Myin-Germeys I, Müller-Myhsok B, Nelis M, Nenadic I, Nertney D, Nestadt G, Nicodemus K, Nikitina-Zake L, Nisenbaum L, Nordin A, O’Callaghan E, O’Dushlaine C, O’Neill F, Oh SY, Olincy A, Olsen L, Van Os J, Pantelis C, Papadimitriou G, Papiol S, Parkhomenko E, Pato M, Paunio T, Pejovic-Milovancevic M, Perkins D, Pietiläinen O, Pimm J, Pocklington A, Powell J, Price A, Pulver A, Purcell S, Quested D, Rasmussen H, Reichenberg A, Reimers M, Richards A, Roffman J, Roussos P, Ruderfer D, Salomaa V, Sanders A, Schall U, Schubert C, Schulze T, Schwab S, Scolnick E, Scott R, Seidman L, Shi J, Sigurdsson E, Silagadze T, Silverman J, Sim K, Slominsky P, Smoller J, So HC, Spencer C, Stahl E, Stefansson H, Steinberg S, Stogmann E, Straub R, Strengman E, Strohmaier J, Stroup T, Subramaniam M, Suvisaari J, Svrakic D, Szatkiewicz J, Söderman E, Thirumalai S, Toncheva D, Tooney P, Tosato S, Veijola J, Waddington J, Walsh D, Wang D, Wang Q, Webb B, Weiser M, Wildenauer D, Williams N, Williams S, Witt S, Wolen A, Wong E, Wormley B, Wu J, Xi H, Zai C, Zheng X, Zimprich F, Wray N, Stefansson K, Visscher P, Adolfsson R, Andreassen O, Blackwood D, Bramon E, Buxbaum J, Børglum A, Cichon S, Darvasi A, Domenici E, Ehrenreich H, Esko T, Gejman P, Gill M, Gurling H, Hultman C, Iwata N, Jablensky A, Jönsson E, Kendler K, Kirov G, Knight J, Lencz T, Levinson D, Li Q, Liu J, Malhotra A, McCarroll S, McQuillin A, Moran J, Mortensen P, Mowry B, Nöthen M, Ophoff R, Owen M, Palotie A, Pato C, Petryshen T, Posthuma D, Rietschel M, Riley B, Rujescu D, Sham P, Sklar P, St. Clair D, Weinberger D, Wendland J, Werge T, Daly M, Sullivan P, O’Donovan M, Ripke S, O’Dushlaine C, Chambert K, Moran JL, Kähler AK, Akterin S, Bergen S, Magnusson PK, Neale BM, Ruderfer D, Scolnick E, Purcell S, McCarroll S, Sklar P, Hultman CM, Sullivan PF. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet 2014; 95:535-52. [PMID: 25439723 PMCID: PMC4225595 DOI: 10.1016/j.ajhg.2014.10.004] [Citation(s) in RCA: 411] [Impact Index Per Article: 41.1] [Received: 07/28/2014] [Accepted: 10/02/2014] [Indexed: 10/25/2022] Open
Abstract
Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs (h_g^2) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of h_g^2 from imputed SNPs (5.1× enrichment; p = 3.7 × 10^-17) and 38% (SE = 4%) of h_g^2 from genotyped SNPs (1.6× enrichment, p = 1.0 × 10^-4). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of h_g^2 despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease.
|
38
|
Harrisson KA, Pavlova A, Telonis-Scott M, Sunnucks P. Using genomics to characterize evolutionary potential for conservation of wild populations. Evol Appl 2014; 7:1008-25. [PMID: 25553064 PMCID: PMC4231592 DOI: 10.1111/eva.12149] [Citation(s) in RCA: 157] [Impact Index Per Article: 15.7] [Received: 08/29/2013] [Accepted: 02/10/2014] [Indexed: 12/16/2022] Open
Abstract
Genomics promises exciting advances towards the important conservation goal of maximizing evolutionary potential, notwithstanding associated challenges. Here, we explore some of the complexity of adaptation genetics and discuss the strengths and limitations of genomics as a tool for characterizing evolutionary potential in the context of conservation management. Many traits are polygenic and can be strongly influenced by minor differences in regulatory networks and by epigenetic variation not visible in DNA sequence. Much of this critical complexity is difficult to detect using methods commonly used to identify adaptive variation, and this needs appropriate consideration when planning genomic screens, and when basing management decisions on genomic data. When the genomic basis of adaptation and future threats are well understood, it may be appropriate to focus management on particular adaptive traits. For more typical conservation scenarios, we argue that screening genome-wide variation should be a sensible approach that may provide a generalized measure of evolutionary potential that accounts for the contributions of small-effect loci and cryptic variation and is robust to uncertainty about future change and required adaptive response(s). The best conservation outcomes should be achieved when genomic estimates of evolutionary potential are used within an adaptive management framework.
Affiliation(s)
- Alexandra Pavlova
- School of Biological Sciences, Monash University, Melbourne, Vic., Australia
- Paul Sunnucks
- School of Biological Sciences, Monash University, Melbourne, Vic., Australia
|
39
|
Abstract
Identifying sequence variants that play a mechanistic role in human disease and other phenotypes is a fundamental goal in human genetics and will be important in translating the results of variation studies. Experimental validation to confirm that a variant causes the biochemical changes responsible for a given disease or phenotype is considered the gold standard, but this cannot currently be applied to the 3 million or so variants expected in an individual genome. This has prompted the development of a wide variety of computational approaches that use several different sources of information to identify functional variation. Here, we review and assess the limitations of computational techniques for categorizing variants according to functional classes, prioritizing variants for experimental follow-up and generating hypotheses about the possible molecular mechanisms to inform downstream experiments. We discuss the main current bioinformatics approaches to identifying functional variation, including widely used algorithms for coding variation such as SIFT and PolyPhen and also novel techniques for interpreting variation across the genome.
Affiliation(s)
- Graham RS Ritchie
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA UK
|
40
|
Brosius J. The persistent contributions of RNA to eukaryotic gen(om)e architecture and cellular function. Cold Spring Harb Perspect Biol 2014; 6:a016089. [PMID: 25081515 DOI: 10.1101/cshperspect.a016089] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
Abstract
Currently, the best scenario for earliest forms of life is based on RNA molecules as they have the proven ability to catalyze enzymatic reactions and harbor genetic information. Evolutionary principles valid today become apparent in such models already. Furthermore, many features of eukaryotic genome architecture might have their origins in an RNA or RNA/protein (RNP) world, including the onset of a further transition, when DNA replaced RNA as the genetic bookkeeper of the cell. Chromosome maintenance, splicing, and regulatory function via RNA may be deeply rooted in the RNA/RNP worlds. Mostly in eukaryotes, conversion from RNA to DNA is still ongoing, which greatly impacts the plasticity of extant genomes. Raw material for novel genes encoding protein or RNA, or parts of genes including regulatory elements that selection can act on, continues to enter the evolutionary lottery.
Affiliation(s)
- Jürgen Brosius
- Institute of Experimental Pathology (ZMBE), University of Münster, D-48149 Münster, Germany
|
41
|
Cole C, Kroboth K, Schurch NJ, Sandilands A, Sherstnev A, O'Regan GM, Watson RM, McLean WHI, Barton GJ, Irvine AD, Brown SJ. Filaggrin-stratified transcriptomic analysis of pediatric skin identifies mechanistic pathways in patients with atopic dermatitis. J Allergy Clin Immunol 2014; 134:82-91. [PMID: 24880632 PMCID: PMC4090750 DOI: 10.1016/j.jaci.2014.04.021] [Citation(s) in RCA: 100] [Impact Index Per Article: 10.0] [Received: 03/05/2014] [Revised: 03/28/2014] [Accepted: 04/24/2014] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Atopic dermatitis (AD; eczema) is characterized by a widespread abnormality in cutaneous barrier function and propensity to inflammation. Filaggrin is a multifunctional protein and plays a key role in skin barrier formation. Loss-of-function mutations in the gene encoding filaggrin (FLG) are a highly significant risk factor for atopic disease, but the molecular mechanisms leading to dermatitis remain unclear. OBJECTIVE: We sought to interrogate tissue-specific variations in the expressed genome in the skin of children with AD and to investigate underlying pathomechanisms in atopic skin. METHODS: We applied single-molecule direct RNA sequencing to analyze the whole transcriptome using minimal tissue samples. Uninvolved skin biopsy specimens from 26 pediatric patients with AD were compared with site-matched samples from 10 nonatopic teenage control subjects. Cases and control subjects were screened for FLG genotype to stratify the data set. RESULTS: Two thousand four hundred thirty differentially expressed genes (false discovery rate, P < .05) were identified, of which 211 were significantly upregulated and 490 downregulated by greater than 2-fold. Gene ontology terms for "extracellular space" and "defense response" were enriched, whereas "lipid metabolic processes" were downregulated. The subset of FLG wild-type cases showed dysregulation of genes involved with lipid metabolism, whereas filaggrin haploinsufficiency affected global gene expression and was characterized by a type 1 interferon-mediated stress response. CONCLUSION: These analyses demonstrate the importance of extracellular space and lipid metabolism in atopic skin pathology independent of FLG genotype, whereas an aberrant defense response is seen in subjects with FLG mutations. Genotype stratification of the large data set has facilitated functional interpretation and might guide future therapy development.
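The selection criteria mentioned in this abstract (a false discovery rate below .05 combined with a change greater than 2-fold) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `GeneResult` record, the `classify` function, the gene labels, and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str           # hypothetical gene label
    fold_change: float  # case/control expression ratio, must be > 0
    fdr: float          # false-discovery-rate-adjusted P value


def classify(results, fdr_cutoff=0.05, min_fold=2.0):
    """Split significant genes into up- and downregulated lists.

    A gene is kept only if its FDR is below the cutoff and its change
    exceeds min_fold in either direction (compared on the log2 scale).
    """
    limit = math.log2(min_fold)
    up, down = [], []
    for r in results:
        if r.fdr >= fdr_cutoff:
            continue  # not significant after FDR correction
        log2_fc = math.log2(r.fold_change)
        if log2_fc > limit:
            up.append(r.gene)
        elif log2_fc < -limit:
            down.append(r.gene)
    return up, down


# Made-up values for illustration only.
results = [
    GeneResult("GENE_A", 3.5, 0.01),    # significant, more than 2-fold up
    GeneResult("GENE_B", 0.25, 0.001),  # significant, 4-fold down
    GeneResult("GENE_C", 1.4, 0.02),    # significant, but less than 2-fold
    GeneResult("GENE_D", 5.0, 0.30),    # large change, not significant
]
up, down = classify(results)
print("up:", up, "down:", down)
```

Comparing on the log2 scale treats a 2-fold increase and a 2-fold decrease symmetrically, which is why a ratio of 0.5 or lower counts as the downregulated boundary.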
Affiliation(s)
- Christian Cole
- Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, United Kingdom
- Karin Kroboth
- Centre for Dermatology and Genetic Medicine, Division of Molecular Medicine, Colleges of Life Sciences and Medicine, Dentistry & Nursing, University of Dundee, Dundee, United Kingdom
- Nicholas J Schurch
- Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, United Kingdom
- Aileen Sandilands
- Centre for Dermatology and Genetic Medicine, Division of Molecular Medicine, Colleges of Life Sciences and Medicine, Dentistry & Nursing, University of Dundee, Dundee, United Kingdom
- Alexander Sherstnev
- Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, United Kingdom
- Grainne M O'Regan
- Department of Dermatology, Our Lady's Children's Hospital, Crumlin, Dublin, Ireland
- Rosemarie M Watson
- Department of Dermatology, Our Lady's Children's Hospital, Crumlin, Dublin, Ireland
- W H Irwin McLean
- Centre for Dermatology and Genetic Medicine, Division of Molecular Medicine, Colleges of Life Sciences and Medicine, Dentistry & Nursing, University of Dundee, Dundee, United Kingdom
- Geoffrey J Barton
- Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, United Kingdom
- Alan D Irvine
- Department of Dermatology, Our Lady's Children's Hospital, Crumlin, Dublin, Ireland; National Children's Research Centre, Our Lady's Children's Hospital, Crumlin, Dublin, Ireland; Clinical Medicine, Trinity College Dublin, Dublin, Ireland
- Sara J Brown
- Centre for Dermatology and Genetic Medicine, Division of Molecular Medicine, Colleges of Life Sciences and Medicine, Dentistry & Nursing, University of Dundee, Dundee, United Kingdom; National Children's Research Centre, Our Lady's Children's Hospital, Crumlin, Dublin, Ireland
|
42
|
St Laurent G, Vyatkin Y, Kapranov P. Dark matter RNA illuminates the puzzle of genome-wide association studies. BMC Med 2014; 12:97. [PMID: 24924000 PMCID: PMC4054906 DOI: 10.1186/1741-7015-12-97] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 03/07/2014] [Accepted: 05/22/2014] [Indexed: 12/12/2022] Open
Abstract
In the past decade, numerous studies have made connections between sequence variants in human genomes and predisposition to complex diseases. However, most of these variants lie outside of the charted regions of the human genome whose function we understand; that is, the sequences that encode proteins. Consequently, the general concept of a mechanism that translates these variants into predisposition to diseases has been lacking, potentially calling into question the validity of these studies. Here we make a connection between the growing class of apparently functional RNAs that do not encode proteins and whose function we do not yet understand (the so-called 'dark matter' RNAs) and the disease-associated variants. We review advances made in a different genomic mapping effort - unbiased profiling of all RNA transcribed from the human genome - and provide arguments that the disease-associated variants exert their effects via perturbation of regulatory properties of non-coding RNAs existing in mammalian cells.
Affiliation(s)
- Philipp Kapranov
- St. Laurent Institute, 317 New Boston St, Suite 201, Woburn, MA 01801, USA
|
43
|
Elliott DJ. Illuminating the Transcriptome through the Genome. Genes (Basel) 2014; 5:235-53. [PMID: 24705295 PMCID: PMC3978521 DOI: 10.3390/genes5010235] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/17/2014] [Revised: 03/03/2014] [Accepted: 03/05/2014] [Indexed: 02/01/2023] Open
Abstract
Sequencing the human genome was a huge milestone in genetic research that revealed almost the total DNA sequence required to create a human being. However, in order to function, the DNA genome needs to be expressed as an RNA transcriptome. This article reviews how knowledge of genome sequence information has led to fundamental discoveries in how the transcriptome is processed, with a focus on new system-wide insights into how pre-mRNAs that are encoded by split genes in the genome are rearranged by splicing into functional mRNAs. These advances have been made possible by the development of new post-genome technologies to probe splicing patterns. Transcriptome-wide approaches have characterised a "splicing code" that is embedded within and has a significant role in deciphering the genome, and is deciphered by RNA binding proteins. These analyses have also found that most human genes encode multiple mRNA isoforms, and in some cases proteins, leading in turn to a re-assessment of what exactly a gene is. Analysis of the transcriptome has given insights into how the genome is packaged and transcribed, and is helping to explain important aspects of genome evolution.
Affiliation(s)
- David J Elliott
- Institute of Genetic Medicine, Newcastle University, Newcastle, NE1 3BZ, UK.
|