1
Rodriguez JM, Abascal F, Cerdán-Vélez D, Gómez LM, Vázquez J, Tress ML. Evidence for widespread translation of 5' untranslated regions. Nucleic Acids Res 2024:gkae571. [PMID: 38953162 DOI: 10.1093/nar/gkae571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 06/07/2024] [Accepted: 06/19/2024] [Indexed: 07/03/2024] Open
Abstract
Ribosome profiling experiments support the translation of a range of novel human open reading frames. By contrast, most peptides from large-scale proteomics experiments derive from just one source, 5' untranslated regions. Across the human genome we find evidence for 192 translated upstream regions, most of which would produce protein isoforms with extended N-terminal ends. Almost all of these N-terminal extensions are from highly abundant genes, which suggests that the novel regions we detect are just the tip of the iceberg. These upstream regions have characteristics that are not typical of coding exons. Their GC-content is remarkably high, even higher than 5' regions in other genes, and a large majority have non-canonical start codons. Although some novel upstream regions have cross-species conservation - five have orthologues in invertebrates for example - the reading frames of two thirds are not conserved beyond simians. These non-conserved regions also have no evidence of purifying selection, which suggests that much of this translation is not functional. In addition, non-conserved upstream regions have significantly more peptides in cancer cell lines than would be expected, a strong indication that an aberrant or noisy translation initiation process may play an important role in translation from upstream regions.
Affiliation(s)
- Jose Manuel Rodriguez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
  - CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Federico Abascal
  - Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Daniel Cerdán-Vélez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Laura Martínez Gómez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Jesús Vázquez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
  - CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Michael L Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
2
Carrion SA, Michal JJ, Jiang Z. Alternative Transcripts Diversify Genome Function for Phenome Relevance to Health and Diseases. Genes (Basel) 2023; 14:2051. [PMID: 38002994 PMCID: PMC10671453 DOI: 10.3390/genes14112051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/26/2023] Open
Abstract
Manipulation using alternative exon splicing (AES), alternative transcription start (ATS), and alternative polyadenylation (APA) sites are key to transcript diversity underlying health and disease. All three are pervasive in organisms, present in at least 50% of human protein-coding genes. In fact, ATS and APA site use has the highest impact on protein identity, with their ability to alter which first and last exons are utilized as well as impacting stability and translation efficiency. These RNA variants have been shown to be highly specific, both in tissue type and stage, with demonstrated importance to cell proliferation, differentiation and the transition from fetal to adult cells. While alternative exon splicing has a limited effect on protein identity, its ubiquity highlights the importance of these minor alterations, which can alter other features such as localization. The three processes are also highly interwoven, with overlapping, complementary, and competing factors, RNA polymerase II and its CTD (C-terminal domain) chief among them. Their role in development means dysregulation leads to a wide variety of disorders and cancers, with some forms of disease disproportionately affected by specific mechanisms (AES, ATS, or APA). Challenges associated with the genome-wide profiling of RNA variants and their potential solutions are also discussed in this review.
Affiliation(s)
- Zhihua Jiang
  - Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA 99164-7620, USA
3
Barbagallo C, Stella M, Ferrara C, Caponnetto A, Battaglia R, Barbagallo D, Di Pietro C, Ragusa M. RNA-RNA competitive interactions: a molecular civil war ruling cell physiology and diseases. Exploration of Medicine 2023:504-540. [DOI: 10.37349/emed.2023.00159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 06/02/2023] [Indexed: 09/02/2023] Open
Abstract
The idea that proteins are the main determining factors in the functioning of cells and organisms, and their dysfunctions are the first cause of pathologies, has been predominant in biology and biomedicine until recently. This protein-centered view was too simplistic and failed to explain the physiological and pathological complexity of the cell. About 80% of the human genome is dynamically and pervasively transcribed, mostly as non-protein-coding RNAs (ncRNAs), which competitively interact with each other and with coding RNAs generating a complex RNA network regulating RNA processing, stability, and translation and, accordingly, fine-tuning the gene expression of the cells. Qualitative and quantitative dysregulations of RNA-RNA interaction networks are strongly involved in the onset and progression of many pathologies, including cancers and degenerative diseases. This review will summarize the RNA species involved in the competitive endogenous RNA network, their mechanisms of action, and involvement in pathological phenotypes. Moreover, it will give an overview of the most advanced experimental and computational methods to dissect and rebuild RNA networks.
Affiliation(s)
- Cristina Barbagallo
  - Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Michele Stella
  - Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Angela Caponnetto
  - Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Rosalia Battaglia
  - Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Davide Barbagallo
  - Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Cinzia Di Pietro
  - Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Marco Ragusa
  - Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
4
Abstract
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses challenges in finalizing annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes: the Earth's code of life.
Affiliation(s)
- Roderic Guigó
  - Bioinformatics and Genomics, Center for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Catalonia
  - Universitat Pompeu Fabra (UPF), Barcelona, Catalonia
5
Longo G. From information to physics to biology. Prog Biophys Mol Biol 2023; 177:202-206. [PMID: 36572284 DOI: 10.1016/j.pbiomolbio.2022.12.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/10/2021] [Revised: 12/15/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022]
Abstract
Commentary on "The gene: An appraisal" by Keith Baverstock. PBMB, Volume 164, September 2021, Pages 46-62. NOTE: this short and informal commentary constructively criticizes the very interesting approach in the paper through a brief survey of the work that a few of us have developed over several years. I will first recall the very pertinent critique of the Modern Synthesis and the genocentric approach presented in the paper, then suggest a methodological (and theoretical) critique of the approach by K. Baverstock and hint at alternative paths that are compatible with, but "extend", the physics for biology presented by the author. The purposes and the space allowed force a limited number of references and technical details. These may be found in the references contained in the few papers quoted below, which are neither the most nor the only representative contributions to that work, but are inserted as a source of references or as synthetic presentations of our views.
6
Zhou Z, Cao Q, Diao Y, Wang Y, Long L, Wang S, Li P. Non-coding RNA-related antitumor mechanisms of marine-derived agents. Front Pharmacol 2022; 13:1053556. [PMID: 36532760 PMCID: PMC9752855 DOI: 10.3389/fphar.2022.1053556] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/25/2022] [Accepted: 11/21/2022] [Indexed: 09/26/2023] Open
Abstract
In the last two decades, natural active substances have attracted great attention in developing new antitumor drugs, especially in the marine environment. A series of marine-derived compounds or derivatives with potential antitumor effects have been discovered and developed, but their mechanisms of action are not well understood. Emerging studies have found that several tumor-related signaling pathways and molecules are involved in the antitumor mechanisms of marine-derived agents, including noncoding RNAs (ncRNAs). In this review, we provide an update on the regulation of marine-derived agents associated with ncRNAs on tumor cell proliferation, apoptosis, cell cycle, invasion, migration, drug sensitivity and resistance. Herein, we also describe recent advances in marine food-derived ncRNAs as antitumor agents that modulate cross-species gene expression. A better understanding of the antitumor mechanisms of marine-derived agents mediated, regulated, or sourced by ncRNAs will provide new biomarkers or targets for potential antitumor drugs from preclinical discovery and development to clinical application.
Affiliation(s)
- Zhixia Zhou
  - Institute for Translational Medicine, The Affiliated Hospital of Qingdao University, College of Medicine, Qingdao University, Qingdao, China
- Qianqian Cao
  - Qingdao Central Hospital, Central Hospital Affiliated to Qingdao University, Qingdao, China
- Yujing Diao
  - Qingdao Central Hospital, Central Hospital Affiliated to Qingdao University, Qingdao, China
- Yin Wang
  - Institute for Translational Medicine, The Affiliated Hospital of Qingdao University, College of Medicine, Qingdao University, Qingdao, China
- Linhai Long
  - Institute for Translational Medicine, The Affiliated Hospital of Qingdao University, College of Medicine, Qingdao University, Qingdao, China
- Shoushi Wang
  - Qingdao Central Hospital, Central Hospital Affiliated to Qingdao University, Qingdao, China
- Peifeng Li
  - Institute for Translational Medicine, The Affiliated Hospital of Qingdao University, College of Medicine, Qingdao University, Qingdao, China
7
Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:ijms232012272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022] Open
Abstract
Medical discoveries mainly depend on the capability to process and analyze biological datasets, which inundate the scientific community and are still expanding as the cost of next-generation sequencing technologies is decreasing. Deep learning (DL) is a viable method to exploit this massive data stream since it has advanced quickly with there being successive innovations. However, an obstacle to scientific progress emerges: the difficulty of applying DL to biology, and this because both fields are evolving at a breakneck pace, thus making it hard for an individual to occupy the front lines of both of them. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. Alternatively, this study could be also useful to researchers in biology to understand and utilize the power of DL to gain better insights into and extract important information from the omics data.
8
The Road Traveled and Journey Ahead for the Genetics and Genomics of Tinnitus. Mol Diagn Ther 2022; 26:129-136. [PMID: 35167110 PMCID: PMC8942952 DOI: 10.1007/s40291-022-00578-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/16/2022] [Indexed: 10/29/2022]
Abstract
The feasibility to unravel genetic and genomic signatures for disorders affecting the auditory system has accelerated since arriving in the post-genomics era roughly 20 years ago. Newly emerging studies have provided initial landmarks signaling heritability and thus, a genetic link, to severe tinnitus. Tinnitus, the phantom perception of ringing in the ears, is experienced by at least 15% of the adult population and can be extremely disabling. Despite its ubiquity, there is no cure for tinnitus and modalities offering relief are often of limited success. Because tinnitus is frequently reported in patients with acquired conductive or sensorineural hearing impairment, it has been widely accepted that tinnitus is secondary to and a symptom arising from hearing impairment. However, tinnitus has also been identified in the absence of auditory dysfunction and in young individuals, resulting in a debate about its origins. Genetics studies have identified severe tinnitus as a complex disorder arising from gene and environment interactions, refining its classification as a neurological disorder and, in at least a subset of patients, it appears not as a symptom of another health issue. This current opinion summarizes several recent studies that have challenged a long-accepted dogma and postulates how this information could eventually be used in the future to help patients. It is with great hope that this knowledge opens translational paths to provide relief for the many who suffer from the burden of tinnitus on a daily basis.
9
Abstract
In Eukarya, immature mRNA transcripts (pre-mRNA) often contain coding sequences, or exons, interleaved by non-coding sequences, or introns. Introns are removed upon splicing, and further regulation of the retained exons leads to alternatively spliced mRNA. The splicing reaction requires the stepwise assembly of the spliceosome, a macromolecular machine composed of small nuclear ribonucleoproteins (snRNPs). This review focuses on the early stage of spliceosome assembly, when U1 snRNP defines each intron 5’-splice site (5ʹss) in the pre-mRNA. We first introduce the splicing reaction and the impact of alternative splicing on gene expression regulation. Thereafter, we extensively discuss splicing descriptors that influence the 5ʹss selection by U1 snRNP, such as sequence determinants, and interactions mediated by U1-specific proteins or U1 small nuclear RNA (U1 snRNA). We also include examples of diseases that affect the 5ʹss selection by U1 snRNP, and discuss recent therapeutic advances that manipulate U1 snRNP 5ʹss selectivity with antisense oligonucleotides and small-molecule splicing switches.
Affiliation(s)
- Florian Malard
  - Inserm U1212, CNRS UMR5320, ARNA Laboratory, University of Bordeaux, Bordeaux Cedex, France
- Cameron D Mackereth
  - Inserm U1212, CNRS UMR5320, ARNA Laboratory, University of Bordeaux, Bordeaux Cedex, France
- Sébastien Campagne
  - Inserm U1212, CNRS UMR5320, ARNA Laboratory, University of Bordeaux, Bordeaux Cedex, France
10
Prensner JR, Enache OM, Luria V, Krug K, Clauser KR, Dempster JM, Karger A, Wang L, Stumbraite K, Wang VM, Botta G, Lyons NJ, Goodale A, Kalani Z, Fritchman B, Brown A, Alan D, Green T, Yang X, Jaffe JD, Roth JA, Piccioni F, Kirschner MW, Ji Z, Root DE, Golub TR. Noncanonical open reading frames encode functional proteins essential for cancer cell survival. Nat Biotechnol 2021; 39:697-704. [PMID: 33510483 PMCID: PMC8195866 DOI: 10.1038/s41587-020-00806-2] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Received: 02/18/2020] [Accepted: 12/16/2020] [Indexed: 01/30/2023]
Abstract
Although genomic analyses predict many noncanonical open reading frames (ORFs) in the human genome, it is unclear whether they encode biologically active proteins. Here we experimentally interrogated 553 candidates selected from noncanonical ORF datasets. Of these, 57 induced viability defects when knocked out in human cancer cell lines. Following ectopic expression, 257 showed evidence of protein expression and 401 induced gene expression changes. Clustered regularly interspaced short palindromic repeat (CRISPR) tiling and start codon mutagenesis indicated that their biological effects required translation as opposed to RNA-mediated effects. We found that one of these ORFs, G029442, renamed glycine-rich extracellular protein-1 (GREP1), encodes a secreted protein highly expressed in breast cancer, and its knockout in 263 cancer cell lines showed preferential essentiality in breast cancer-derived lines. The secretome of GREP1-expressing cells has an increased abundance of the oncogenic cytokine GDF15, and GDF15 supplementation mitigated the growth-inhibitory effect of GREP1 knockout. Our experiments suggest that noncanonical ORFs can express biologically active proteins that are potential therapeutic targets.
Affiliation(s)
- John R. Prensner
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215
  - Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115
- Oana M. Enache
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Victor Luria
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Karsten Krug
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Karl R. Clauser
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Amir Karger
  - IT-Research Computing, Harvard Medical School, Boston, MA, 02115, USA
- Li Wang
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Vickie M. Wang
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Ginevra Botta
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Amy Goodale
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Zohra Kalani
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Adam Brown
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Douglas Alan
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Thomas Green
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Xiaoping Yang
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Jacob D. Jaffe
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
  - Present address: Inzen Therapeutics, Cambridge, MA, 02139, USA
- Federica Piccioni
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
  - Present address: Merck Research Laboratories, Boston, MA, 02115, USA
- Marc W. Kirschner
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Zhe Ji
  - Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611
  - Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60628
- David E. Root
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Todd R. Golub
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215
  - Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115
  - Corresponding author: Todd R. Golub, MD, Chief Scientific Officer, Broad Institute of Harvard and MIT, Room 4013, 415 Main Street, Cambridge, MA, 02142; Phone: 617-714-7050
11
Brain Cytoplasmic RNAs in Neurons: From Biosynthesis to Function. Biomolecules 2020; 10:biom10020313. [PMID: 32079202 PMCID: PMC7072442 DOI: 10.3390/biom10020313] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/16/2019] [Revised: 02/13/2020] [Accepted: 02/13/2020] [Indexed: 01/10/2023] Open
Abstract
Flexibility in signal transmission is essential for high-level brain function. This flexibility is achieved through strict spatial and temporal control of gene expression in neurons. Given the key regulatory roles of a variety of noncoding RNAs (ncRNAs) in neurons, studying neuron-specific ncRNAs provides an important basis for understanding molecular principles of brain function. This approach will have wide use in understanding the pathogenesis of brain diseases and in the development of therapeutic agents in the future. Brain cytoplasmic RNAs (BC RNAs) are a leading paradigm for research on neuronal ncRNAs. Since the first confirmation of brain-specific expression of BC RNAs in 1982, their investigation has been an area of active research. In this review, we summarize key studies on the characteristics and functions of BC RNAs in neurons.
12
Hatje K, Mühlhausen S, Simm D, Kollmar M. The Protein-Coding Human Genome: Annotating High-Hanging Fruits. Bioessays 2019; 41:e1900066. [PMID: 31544971 DOI: 10.1002/bies.201900066] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/15/2019] [Revised: 08/07/2019] [Indexed: 12/19/2022]
Abstract
The major transcript variants of human protein-coding genes are annotated to a certain degree of accuracy combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes (they can be protein-coding, noncoding, or pseudogenes) and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have partially been overcome by dedicated algorithms to detect alternative spliced micro-exons and wobble splice variants. Recently, knowledge about splice mechanism and protein structure are incorporated into an algorithm to predict neighboring homologous exons, often spliced in a mutually exclusive manner. Predicted exons are evaluated by transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re-assignments. The emerging human pan-genome is necessitating distinctive annotations incorporating differences between individuals and between populations.
Affiliation(s)
- Klas Hatje
  - Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstr. 124, 4070, Basel, Switzerland
- Stefanie Mühlhausen
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
- Dominic Simm
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
  - Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
- Martin Kollmar
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
13
Qadir MI, Bukhat S, Rasul S, Manzoor H, Manzoor M. RNA therapeutics: Identification of novel targets leading to drug discovery. J Cell Biochem 2019; 121:898-929. [DOI: 10.1002/jcb.29364] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/14/2019] [Accepted: 08/20/2019] [Indexed: 12/23/2022]
Affiliation(s)
- Muhammad Imran Qadir
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Sherien Bukhat
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Sumaira Rasul
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Hamid Manzoor
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Majid Manzoor
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
14
Abascal F, Juan D, Jungreis I, Kellis M, Martinez L, Rigau M, Rodriguez JM, Vazquez J, Tress ML. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res 2018; 46:7070-7084. [PMID: 29982784 PMCID: PMC6101605 DOI: 10.1093/nar/gky587] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 05/15/2018] [Accepted: 06/18/2018] [Indexed: 12/16/2022] Open
Abstract
Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation on the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggests that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.
Affiliation(s)
- Federico Abascal
  - Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
- David Juan
  - Comparative Genomics Lab, Instituto de Biologica Evolutiva, Universitat Pompeu Fabra, Barcelona, Spain
- Irwin Jungreis
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laura Martinez
  - Bioinformatics Unit, Spanish National Cancer Research Centre, Madrid, Spain
- Maria Rigau
  - Computational Biology Life Sciences Group, Barcelona Supercomputing Center, Barcelona, Spain
- Jose Manuel Rodriguez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
- Jesus Vazquez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
- Michael L Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre, Madrid, Spain
15
Mai H, Zhou B, Liu L, Yang F, Conran C, Ji Y, Hou J, Jiang D. Molecular pattern of lncRNAs in hepatocellular carcinoma. J Exp Clin Cancer Res 2019; 38:198. [PMID: 31097003 PMCID: PMC6524221 DOI: 10.1186/s13046-019-1213-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 03/21/2019] [Accepted: 05/07/2019] [Indexed: 02/07/2023]
Abstract
Hepatocellular carcinoma (HCC) is one of the most notable lethal malignancies worldwide. However, the molecular mechanisms involved in the initiation and progression of this disease remain poorly understood. Over the past decade, many studies have demonstrated the important regulatory roles of long non-coding RNAs (lncRNAs) in HCC. Here, we comprehensively review recent discoveries regarding HCC-associated lncRNA functions, which we have classified and described according to their mechanism models.
Affiliation(s)
- Haoming Mai
  - State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institute of Liver Diseases Research of Guangdong Province, Guangzhou, China
  - Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China
- Bin Zhou
  - State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institute of Liver Diseases Research of Guangdong Province, Guangzhou, China
  - Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China
- Li Liu
  - State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institute of Liver Diseases Research of Guangdong Province, Guangzhou, China
  - Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China
- Fu Yang
  - Department of Medical Genetics, Second Military Medical University, Shanghai, 200433, China
- Carly Conran
  - University of Illinois College of Medicine, Chicago, IL, 60612, USA
- Yuan Ji
  - Department of Public Health Sciences, University of Chicago, Chicago, IL, 60637, USA
- Jinlin Hou
  - State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institute of Liver Diseases Research of Guangdong Province, Guangzhou, China
  - Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China
- Deke Jiang
  - State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institute of Liver Diseases Research of Guangdong Province, Guangzhou, China
  - Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, China
16
|
Pertea M, Shumate A, Pertea G, Varabyou A, Breitwieser FP, Chang YC, Madugundu AK, Pandey A, Salzberg SL. CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol 2018; 19:208. [PMID: 30486838 PMCID: PMC6260756 DOI: 10.1186/s13059-018-1590-2] [Citation(s) in RCA: 162] [Impact Index Per Article: 27.0] [Received: 06/02/2018] [Accepted: 11/16/2018] [Indexed: 01/06/2023] Open
Abstract
We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project, to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess.
Affiliation(s)
- Mihaela Pertea
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Alaina Shumate
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Geo Pertea
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ales Varabyou
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Florian P Breitwieser
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Yu-Chi Chang
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Anil K Madugundu
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India
- Present address: Center for Individualized Medicine and Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Departments of Biological Chemistry, Pathology, Neurology, and Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Present address: Center for Individualized Medicine and Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Steven L Salzberg
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
|
17
|
Michelini F, Jalihal AP, Francia S, Meers C, Neeb ZT, Rossiello F, Gioia U, Aguado J, Jones-Weinert C, Luke B, Biamonti G, Nowacki M, Storici F, Carninci P, Walter NG, d'Adda di Fagagna F. From "Cellular" RNA to "Smart" RNA: Multiple Roles of RNA in Genome Stability and Beyond. Chem Rev 2018; 118:4365-4403. [PMID: 29600857 DOI: 10.1021/acs.chemrev.7b00487] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Indexed: 12/18/2022]
Abstract
Coding for proteins has been considered the main function of RNA since the "central dogma" of biology was proposed. The discovery of noncoding transcripts shed light on additional roles of RNA, ranging from the support of polypeptide synthesis, to the assembly of subnuclear structures, to gene expression modulation. Cellular RNA has therefore been recognized as a central player in often unanticipated biological processes, including genomic stability. This ever-expanding list of functions inspired us to think of RNA as a "smart" phone, which has replaced the older obsolete "cellular" phone. In this review, we summarize the last two decades of advances in research on the interface between RNA biology and genome stability. We start with an account of the emergence of noncoding RNA, and then we discuss the involvement of RNA in DNA damage signaling and repair, telomere maintenance, and genomic rearrangements. We continue with the depiction of single-molecule RNA detection techniques, and we conclude by illustrating the possibilities of RNA modulation in hopes of creating or improving new therapies. The widespread biological functions of RNA have made this molecule a reoccurring theme in basic and translational research, warranting it the transcendence from classically studied "cellular" RNA to "smart" RNA.
Affiliation(s)
- Flavia Michelini
- IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy
- Ameya P Jalihal
- Single Molecule Analysis Group and Center for RNA Biomedicine, Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109-1055, United States
- Sofia Francia
- IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy; Istituto di Genetica Molecolare, CNR - Consiglio Nazionale delle Ricerche, Pavia, 27100, Italy
- Chance Meers
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Zachary T Neeb
- Institute of Cell Biology, University of Bern, Baltzerstrasse 4, 3012 Bern, Switzerland
- Ubaldo Gioia
- IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy
- Julio Aguado
- IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy
- Brian Luke
- Institute of Developmental Biology and Neurobiology, Johannes Gutenberg University, 55099 Mainz, Germany; Institute of Molecular Biology (IMB), 55128 Mainz, Germany
- Giuseppe Biamonti
- Istituto di Genetica Molecolare, CNR - Consiglio Nazionale delle Ricerche, Pavia, 27100, Italy
- Mariusz Nowacki
- Institute of Cell Biology, University of Bern, Baltzerstrasse 4, 3012 Bern, Switzerland
- Francesca Storici
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Piero Carninci
- RIKEN Center for Life Science Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan
- Nils G Walter
- Single Molecule Analysis Group and Center for RNA Biomedicine, Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109-1055, United States
- Fabrizio d'Adda di Fagagna
- IFOM - The FIRC Institute of Molecular Oncology, Milan, 20139, Italy; Istituto di Genetica Molecolare, CNR - Consiglio Nazionale delle Ricerche, Pavia, 27100, Italy
|
18
|
Liu Z, Liang Y, Wang H, Lu Z, Chen J, Huang Q, Sheng L, Ma Y, Du H, Gong Q. LncRNA expression in the spinal cord modulated by minocycline in a mouse model of spared nerve injury. J Pain Res 2017; 10:2503-2514. [PMID: 29123421 PMCID: PMC5661508 DOI: 10.2147/jpr.s147055] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 12/22/2022] Open
Abstract
Neuropathic pain is a common and refractory chronic pain that affects millions of people worldwide. Its underlying mechanisms are still unclear, but they may involve long noncoding RNAs (lncRNAs), which play crucial roles in a variety of biological functions, including nociception. We used microarrays to investigate the possible interactions between lncRNAs and neuropathic pain and identified 22,213 lncRNAs and 19,528 mRNAs in the spinal cord in a mouse model of spared nerve injury (SNI)-induced neuropathic pain. The abundance levels of 183 lncRNAs and 102 mRNAs were significantly modulated by both SNI and administration of minocycline. A quantitative real-time polymerase chain reaction analysis validated expression changes in three lncRNAs (NR_015491, ENSMUST00000174263, and ENSMUST00000146263). Class distribution analysis of differentially expressed lncRNAs revealed intergenic lncRNAs as the largest category. Functional analysis indicated that SNI-induced gene regulations might be involved in the activities of cytokines (IL17A and IL17F) and chemokines (CCL2, CCL5, and CCL7), whereas minocycline might exert a pain-alleviating effect on mice through actin binding, thereby regulating nociception by controlling the cytoskeleton. Thus, lncRNAs might be responsible for SNI-induced neuropathic pain and the attenuation caused by minocycline. Our study could implicate lncRNAs as potential targets for future treatment of neuropathic pain.
Affiliation(s)
- Zihao Liu
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Ying Liang
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Honghua Wang
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Zhenhe Lu
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jinsheng Chen
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Qiaodong Huang
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Lei Sheng
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yinghong Ma
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Huiying Du
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Qingjuan Gong
- Department of Pain Medicine, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
|
19
|
Abstract
Multiple inherent biases related to different citation practices (e.g., self-citations, negative citations, wrong citations, multi-authorship-biased citations, honorary citations, circumstantial citations, discriminatory citations, selective and arbitrary citations, etc.) make citation-based bibliometrics strongly flawed and defective measures. A paper can be highly cited for a while (e.g., under circumstantial or transitional knowledge), but years later it may appear that its findings, paradigms, or theories were untrue or are no longer valid. By contrast, a paper may remain shelved or overlooked for years or decades, but new studies or discoveries may actualize its subject at any moment. As citation-based metrics are transformed into "commercial activities," the "citation credit" should be considered on a commercial basis too, in the sense that "citation credit" should be shared out as a "citation dividend" among shareholders (coauthors), equally or in proportion to their contributions, but not fully appropriated by each of them. At equal numbers of citations, the greater the number of authors, the lower the "citation credit" should be, and vice versa. Overlooking the presence of distorted and subjective citation practices makes many people and administrators "obsessed" with the number of citations, to such an extent that they run after "highly cited" authors and create specialized citation databases for commercial purposes. Citation-based bibliometrics, however, are unreliable and unscientific measures; citation counts do not mean that a more cited work is of a higher quality or accuracy than a less cited work because citations do not measure quality or accuracy. Citations do not mean that a highly cited author or journal is more commendable than a less cited author or journal. Citations are not more than countable numbers: no more, no less.
Affiliation(s)
- Khaled Moustafa
- Conservatoire National des Arts et Métiers, Paris, France
|
20
|
Lu Z, Liu N, Wang F. Epigenetic Regulations in Diabetic Nephropathy. J Diabetes Res 2017; 2017:7805058. [PMID: 28401169 PMCID: PMC5376412 DOI: 10.1155/2017/7805058] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 12/01/2016] [Revised: 02/06/2017] [Accepted: 02/09/2017] [Indexed: 01/10/2023] Open
Abstract
Diabetic nephropathy (DN) is a chronic complication of diabetes and the most common cause of end-stage kidney disease. It has been reported that multiple factors are involved in the pathogenesis of DN, while the molecular mechanisms that lead to DN are still not fully understood. Numerous risk factors for the development of diabetic nephropathy have been proposed, including ethnicity and inherited genetic differences. Recently, with the development of high-throughput technologies, there is emerging evidence that suggests the important role of epigenetic mechanisms in the pathogenesis of DN. Epigenetic regulations, including DNA methylation, noncoding RNAs, and histone modifications, play a pivotal role in DN pathogenesis by a second layer of gene regulation. All these findings can contribute to developing novel therapies for DN.
Affiliation(s)
- Zeyuan Lu
- Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Na Liu
- Department of Nephrology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Corresponding author
- Feng Wang
- Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Corresponding author
|
21
|
Navarro E, Funtikova AN, Fitó M, Schröder H. Prenatal nutrition and the risk of adult obesity: Long-term effects of nutrition on epigenetic mechanisms regulating gene expression. J Nutr Biochem 2017; 39:1-14. [DOI: 10.1016/j.jnutbio.2016.03.012] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 10/05/2015] [Revised: 03/23/2016] [Accepted: 03/27/2016] [Indexed: 12/19/2022]
|
22
|
Evans JR, Feng FY, Chinnaiyan AM. The bright side of dark matter: lncRNAs in cancer. J Clin Invest 2016; 126:2775-82. [PMID: 27479746 DOI: 10.1172/jci84421] [Citation(s) in RCA: 338] [Impact Index Per Article: 42.3] [Indexed: 12/13/2022] Open
Abstract
The traditional view of genome organization has been upended in the last decade with the discovery of vast amounts of non-protein-coding transcription. After initial concerns that this "dark matter" of the genome was transcriptional noise, it is apparent that a subset of these noncoding RNAs are functional. Long noncoding RNA (lncRNA) genes resemble protein-coding genes in several key aspects, and they have myriad molecular functions across many cellular pathways and processes, including oncogenic signaling. The number of lncRNA genes has recently been greatly expanded by our group to triple the number of protein-coding genes; therefore, lncRNAs are likely to play a role in many biological processes. Based on their large number and expression specificity in a variety of cancers, lncRNAs are likely to serve as the basis for many clinical applications in oncology.
|
23
|
Sheshukova EV, Shindyapina AV, Komarova TV, Dorokhov YL. “Matreshka” genes with alternative reading frames. Russ J Genet 2016. [DOI: 10.1134/s1022795416020149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
24
|
Yoshimoto R, Mayeda A, Yoshida M, Nakagawa S. MALAT1 long non-coding RNA in cancer. Biochim Biophys Acta Gene Regul Mech 2015; 1859:192-9. [PMID: 26434412 DOI: 10.1016/j.bbagrm.2015.09.012] [Citation(s) in RCA: 163] [Impact Index Per Article: 18.1] [Received: 06/30/2015] [Revised: 09/24/2015] [Accepted: 09/28/2015] [Indexed: 02/09/2023]
Abstract
A recent massively parallel sequencing analysis has shown that more than 80% of the human genome is transcribed into RNA. Among the many kinds of non-protein-coding RNAs, we focus on the metastasis associated lung adenocarcinoma transcript 1 (MALAT1), a long non-coding RNA upregulated in metastatic carcinoma cells. Two molecular functions of MALAT1 have been proposed: one is the control of alternative splicing and the other is transcriptional regulation. In this review, we document the molecular characteristics and functions of MALAT1 and shed light on its implication in the molecular pathology of various cancers. This article is part of a Special Issue entitled: Clues to long noncoding RNA taxonomy, edited by Dr. Tetsuro Hirose and Dr. Shinichi Nakagawa.
Affiliation(s)
- Rei Yoshimoto
- Division of Gene Expression Mechanism, Institute for Comprehensive Medical Science, Fujita Health University, Kutsukake-cho, Toyoake, Aichi 470-1192, Japan; Chemical Genetics Laboratory, RIKEN, Hirosawa, Wako, Saitama 351-0198, Japan.
- Akila Mayeda
- Division of Gene Expression Mechanism, Institute for Comprehensive Medical Science, Fujita Health University, Kutsukake-cho, Toyoake, Aichi 470-1192, Japan
- Minoru Yoshida
- Chemical Genetics Laboratory, RIKEN, Hirosawa, Wako, Saitama 351-0198, Japan
- Shinichi Nakagawa
- RNA Biology Laboratory, RIKEN, Hirosawa, Wako, Saitama 351-0198, Japan
|
25
|
Raabe CA, Brosius J. Does every transcript originate from a gene? Ann N Y Acad Sci 2015; 1341:136-48. [PMID: 25847549 DOI: 10.1111/nyas.12741] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/15/2014] [Revised: 02/05/2015] [Accepted: 02/11/2015] [Indexed: 12/20/2022]
Abstract
Outdated gene definitions favored regions corresponding to mature messenger RNAs, in particular, the open reading frame. In eukaryotes, the intergenic space was widely regarded nonfunctional and devoid of RNA transcription. Original concepts were based on the assumption that RNA expression was restricted to known protein-coding genes and a few so-called structural RNA genes, such as ribosomal RNAs or transfer RNAs. With the discovery of introns and, more recently, sensitive techniques for monitoring genome-wide transcription, this view had to be substantially modified. Tiling microarrays and RNA deep sequencing revealed myriads of transcripts, which cover almost entire genomes. The tremendous complexity of non-protein-coding RNA transcription has to be integrated into novel gene definitions. Despite an ever-growing list of functional RNAs, questions concerning the mass of identified transcripts are under dispute. Here, we examined genome-wide transcription from various angles, including evolutionary considerations, and suggest, in analogy to novel alternative splice variants that do not persist, that the vast majority of transcripts represent raw material for potential, albeit rare, exaptation events.
Affiliation(s)
- Carsten A Raabe
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
|
26
|
Milligan MJ, Lipovich L. Pseudogene-derived lncRNAs: emerging regulators of gene expression. Front Genet 2015; 5:476. [PMID: 25699073 PMCID: PMC4316772 DOI: 10.3389/fgene.2014.00476] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 09/11/2014] [Accepted: 12/25/2014] [Indexed: 01/11/2023] Open
Abstract
In the more than one decade since the completion of the Human Genome Project, the prevalence of non-protein-coding functional elements in the human genome has emerged as a key revelation in post-genomic biology. Highlighted by the ENCODE (Encyclopedia of DNA Elements) and FANTOM (Functional Annotation of Mammals) consortia, these elements include tens of thousands of pseudogenes, as well as comparably numerous long non-coding RNA (lncRNA) genes. Pseudogene transcription and function remain insufficiently understood. However, the field is of great importance for human disease due to the high sequence similarity between pseudogenes and their parental protein-coding genes, which generates the potential for sequence-specific regulation. Recent case studies have established essential and coordinated roles of both pseudogenes and lncRNAs in development and disease in metazoan systems, including functional impacts of lncRNA transcription at pseudogene loci on the regulation of the pseudogenes’ parental genes. This review synthesizes the nascent evidence for regulatory modalities jointly exerted by lncRNAs and pseudogenes in human disease, and for recent evolutionary origins of these systems.
Affiliation(s)
- Michael J Milligan
- Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, MI, USA
- Leonard Lipovich
- Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, MI, USA
|
27
|
Richard JLC, Ogawa Y. Understanding the Complex Circuitry of lncRNAs at the X-inactivation Center and Its Implications in Disease Conditions. Curr Top Microbiol Immunol 2015; 394:1-27. [PMID: 25982976 DOI: 10.1007/82_2015_443] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/21/2023]
Abstract
Balanced gene expression is a high priority in order to maintain optimal functioning since alterations and variations could result in acute consequences. X chromosome inactivation (X-inactivation) is one such strategy utilized by mammalian species to silence the extra X chromosome in females to uphold a similar level of expression between the two sexes. A functionally versatile class of molecules called long noncoding RNA (lncRNA) has emerged as key regulators of gene expression and plays important roles during development. An lncRNA that is indispensable for X-inactivation is X-inactive specific transcript (Xist), which induces a repressive epigenetic landscape and creates the inactive X chromosome (Xi). With recent advents in the field of X-inactivation, novel positive and negative lncRNA regulators of Xist such as Jpx and Tsix, respectively, have broadened the regulatory network of X-inactivation. Xist expression failure or dysregulation has been implicated in producing developmental anomalies and disease states. Subsequently, reactivation of the Xi at a later stage of development has also been associated with certain tumors. With the recent influx of information about lncRNA biology and advancements in methods to probe lncRNA, we can now attempt to understand this complex network of Xist regulation in development and disease. It has become clear that the presence of an extra set of genes could be fatal for the organism. Only by understanding the precise ways in which lncRNAs function can treatments be developed to bring aberrations under control. This chapter summarizes our current understanding and knowledge with regard to how lncRNAs are orchestrated at the X-inactivation center (Xic), with a special focus on how genetic diseases come about as a consequence of lncRNA dysregulation.
Affiliation(s)
- John Lalith Charles Richard
- Division of Reproductive Sciences, Cincinnati Children's Hospital Medical Center; Department of Pediatrics, University of Cincinnati College of Medicine, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
- Yuya Ogawa
- Division of Reproductive Sciences, Cincinnati Children's Hospital Medical Center; Department of Pediatrics, University of Cincinnati College of Medicine, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
|
28
|
Oteng-Pabi SK, Pardin C, Stoica M, Keillor JW. Site-specific protein labelling and immobilization mediated by microbial transglutaminase. Chem Commun (Camb) 2014; 50:6604-6. [DOI: 10.1039/c4cc00994k] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 12/30/2022]
Abstract
Microbial transglutaminase (mTG) mediates site-specific propargylation of target proteins, allowing their subsequent modification in in vitro bio-conjugation applications.
Affiliation(s)
- Maria Stoica
- Department of Chemistry, University of Ottawa, Ottawa, Canada K1N 6N5
|
29
|
Chen G, Wang C, Shi L, Qu X, Chen J, Yang J, Shi C, Chen L, Zhou P, Ning B, Tong W, Shi T. Incorporating the human gene annotations in different databases significantly improved transcriptomic and genetic analyses. RNA 2013; 19:479-89. [PMID: 23431329 PMCID: PMC3677258 DOI: 10.1261/rna.037473.112] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 11/23/2012] [Accepted: 01/14/2013] [Indexed: 05/18/2023]
Abstract
Human gene annotation is crucial for conducting transcriptomic and genetic studies; however, the impacts of human gene annotations in diverse databases on related studies have been less evaluated. To enable full use of various human annotation resources and better understand the human transcriptome, here we systematically compare the human annotations present in RefSeq, Ensembl (GENCODE), and AceView on diverse transcriptomic and genetic analyses. We found that the human gene annotations in the three databases are far from complete. Although Ensembl and AceView annotated more genes than RefSeq, more than 15,800 genes from Ensembl (or AceView) are within the intergenic and intronic regions of AceView (or Ensembl) annotation. The human transcriptome annotations in RefSeq, Ensembl, and AceView had distinct effects on short-read mapping, gene and isoform expression profiling, and differential expression calling. Furthermore, our findings indicate that the integrated annotation of these databases can obtain a more complete gene set and significantly enhance those transcriptomic analyses. We also observed that many more known SNPs were located within genes annotated in Ensembl and AceView than in RefSeq. In particular, 1033 of 3041 trait/disease-associated SNPs involved in about 200 human traits/diseases that were previously reported to be in RefSeq intergenic regions could be relocated within Ensembl and AceView genes. Our findings illustrate that a more complete transcriptome generated by incorporating human gene annotations in diverse databases can strikingly improve the overall results of transcriptomic and genetic studies.
Affiliation(s)
- Geng Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Charles Wang
- Functional Genomics Core, Beckman Research Institute, City of Hope Comprehensive Cancer Center, Duarte, California 91010, USA
- Leming Shi
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas 72079, USA
- Xiongfei Qu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Jiwei Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Jianmin Yang
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Caiping Shi
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Long Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Peiying Zhou
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Baitang Ning
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas 72079, USA
| | - Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas 72079, USA
| | - Tieliu Shi
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Corresponding author
|
30
|
Bajetha G, Bhati J, Sarika, Iquebal MA, Rai A, Arora V, Kumar D. Analysis and functional annotation of expressed sequence tags of water buffalo. Anim Biotechnol 2013; 24:25-30. [PMID: 23394367 DOI: 10.1080/10495398.2012.737884] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]
Abstract
An elucidated genome of the domestic river buffalo will contribute enormously to the economy and to a better understanding of genome evolution. An attempt is made to obtain genomic information on buffalo, based on the total Expressed Sequence Tags (ESTs) of Bubalus bubalis available in the public domain. These ESTs were annotated and classified into 15 different functional categories based on their homology to known proteins. Interestingly, 41.79% of the contigs were found to be buffalo-specific novel ESTs with respect to the other species used in the analysis, a finding that needs further study. Also, 224 pSNPs (putative single nucleotide polymorphisms) were detected. This study will provide a base for further genomic studies of buffalo and for comparative studies, and a starting point for the genome annotation of the organism. Supplementary materials are available for this article online.
Affiliation(s)
- Garima Bajetha
- Center for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, India
|
31
|
Abstract
The discovery of numerous noncoding RNA (ncRNA) transcripts in species from yeast to mammals has dramatically altered our understanding of cell biology, especially the biology of diseases such as cancer. In humans, the identification of abundant long ncRNA (lncRNA) >200 bp has catalyzed their characterization as critical components of cancer biology. Recently, roles for lncRNAs as drivers of tumor suppressive and oncogenic functions have appeared in prevalent cancer types, such as breast and prostate cancer. In this review, we highlight the emerging impact of ncRNAs in cancer research, with a particular focus on the mechanisms and functions of lncRNAs.
Affiliation(s)
- John R Prensner
- Michigan Center for Translational Pathology, Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
32
|
Multiple isoforms of the translation initiation factor eIF4GII are generated via use of alternative promoters, splice sites and a non-canonical initiation codon. Biochem J 2012; 448:1-11. [DOI: 10.1042/bj20111765] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 01/05/2023]
Abstract
During the initiation stage of eukaryotic mRNA translation, the eIF4G (eukaryotic initiation factor 4G) proteins act as an aggregation point for recruiting the small ribosomal subunit to an mRNA. We previously used RNAi (RNA interference) to reduce expression of endogenous eIF4GI proteins, resulting in reduced protein synthesis rates and alterations in the morphology of cells. Expression of EIF4G1 cDNAs, encoding different isoforms (f–a) which arise through selection of alternative initiation codons, rescued translation to different extents. Furthermore, overexpression of the eIF4GII paralogue in the eIF4GI-knockdown background was unable to restore translation to the same extent as eIF4GIf/e isoforms, suggesting that translation events governed by this protein are different. In the present study we show that multiple isoforms of eIF4GII exist in mammalian cells, arising from multiple promoters and alternative splicing events, and have identified a non-canonical CUG initiation codon which extends the eIF4GII N-terminus. We further show that the rescue of translation in eIF4GI/eIF4GII double-knockdown cells by our novel isoforms of eIF4GII is as robust as that observed with either eIF4GIf or eIF4GIe, and more than that observed with the original eIF4GII. As the novel eIF4GII sequence diverges from eIF4GI, these data suggest that the eIF4GII N-terminus plays an alternative role in initiation factor assembly.
|
33
|
Zhou S, Ji G, Liu X, Li P, Moler J, Karro JE, Liang C. Pattern analysis approach reveals restriction enzyme cutting abnormalities and other cDNA library construction artifacts using raw EST data. BMC Biotechnol 2012; 12:16. [PMID: 22554190 PMCID: PMC3424822 DOI: 10.1186/1472-6750-12-16] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/18/2011] [Accepted: 03/15/2012] [Indexed: 11/12/2022] Open
Abstract
Background: Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery and gene expression studies. However, some GenBank dbEST sequences have proven to be “unclean”. Identification of cDNA termini/ends and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcription ends, but also furthers our understanding of the potential sources of data abnormalities/errors present in the wet-lab procedures for cDNA library construction. Results: After analyzing a total of 309,976 raw Pinus taeda ESTs, we uncovered many distinct variations of cDNA termini, some of which prove to be good indicators of wet-lab artifacts, and characterized each raw EST by its cDNA terminus structure patterns. In contrast to the expected patterns, many ESTs displayed complex and/or abnormal patterns that represent potential wet-lab errors such as: a failure of one or both of the restriction enzymes to cut the plasmid vector; a failure of the restriction enzymes to cut the vector at the correct positions; the insertion of two cDNA inserts into a single vector; the insertion of multiple and/or concatenated adapters/linkers; the presence of 3′-end terminal structures in designated 5′-end sequences or vice versa; and so on. With a close examination of these artifacts, many problematic ESTs that have been deposited into public databases by conventional bioinformatics pipelines or tools could be cleaned or filtered by our methodology. We developed a software tool for Abnormality Filtering and Sequence Trimming for ESTs (AFST, http://code.google.com/p/afst/) using a pattern analysis approach. To compare AFST with other pipelines that submitted ESTs into dbEST, we reprocessed 230,783 Pinus taeda and 38,709 Arachis hypogaea GenBank ESTs. We found 7.4% of Pinus taeda and 29.2% of Arachis hypogaea GenBank ESTs are “unclean” or abnormal, all of which could be cleaned or filtered by AFST.
Conclusions: cDNA terminal pattern analysis, as implemented in the AFST software tool, can be utilized to reveal wet-lab errors such as restriction enzyme cutting abnormalities and chimeric EST sequences, detect various data abnormalities embedded in existing Sanger EST datasets, improve the accuracy of identifying and extracting bona fide cDNA inserts from raw ESTs, and therefore greatly benefit downstream EST-based applications.
Affiliation(s)
- Sun Zhou
- Department of Automation, Xiamen University, Fujian, China.
|
34
|
Bastepe M. The GNAS Locus: Quintessential Complex Gene Encoding Gsalpha, XLalphas, and other Imprinted Transcripts. Curr Genomics 2011; 8:398-414. [PMID: 19412439 PMCID: PMC2671723 DOI: 10.2174/138920207783406488] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 08/19/2007] [Revised: 09/22/2007] [Accepted: 09/28/2007] [Indexed: 12/14/2022] Open
Abstract
The currently estimated number of genes in the human genome is much smaller than previously predicted. As an explanation for this disparity, most individual genes have multiple transcriptional units that represent a variety of biologically important gene products. GNAS exemplifies a gene of such complexity. One of its products is the alpha-subunit of the stimulatory heterotrimeric G protein (Gsalpha), a ubiquitous signaling protein essential for numerous different cellular responses. Loss-of-function and gain-of-function mutations within Gsalpha-coding GNAS exons are found in various human disorders, including Albright's hereditary osteodystrophy, pseudohypoparathyroidism, fibrous dysplasia of bone, and some tumors of different origin. While Gsalpha expression in most tissues is biallelic, paternal Gsalpha expression is silenced in a small number of tissues, playing an important role in the development of phenotypes associated with GNAS mutations. Additional products derived exclusively from the paternal GNAS allele include XLalphas, a protein partially identical to Gsalpha, and two non-coding RNA molecules, the A/B transcript and the antisense transcript. The maternal GNAS allele leads to NESP55, a chromogranin-like neuroendocrine secretory protein. In vivo animal models have demonstrated the importance of each of the exclusively imprinted GNAS products in normal mammalian physiology. However, although one or more of these products are also disrupted by most naturally occurring GNAS mutations, their roles in disease pathogenesis remain unknown. To further our understanding of the significance of this gene in physiology and pathophysiology, it will be important to elucidate the cellular roles and the mechanisms regulating the expression of each GNAS product.
Affiliation(s)
- Murat Bastepe
- Endocrine Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
|
35
|
Hegyi H, Kalmar L, Horvath T, Tompa P. Verification of alternative splicing variants based on domain integrity, truncation length and intrinsic protein disorder. Nucleic Acids Res 2010; 39:1208-19. [PMID: 20972208 PMCID: PMC3045584 DOI: 10.1093/nar/gkq843] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 01/23/2023] Open
Abstract
According to current estimations ∼95% of multi-exonic human protein-coding genes undergo alternative splicing (AS). However, for 4000 human proteins in PDB, only 14 human proteins have structures of at least two alternative isoforms. Surveying these structural isoforms revealed that the maximum insertion accommodated by an isoform of a fully ordered protein domain was 5 amino acids; other instances of domain changes involved intrinsic structural disorder. After collecting 505 minor isoforms of human proteins with evidence for their existence, we analyzed their length, protein disorder and exposed hydrophobic surface. We found that strict rules govern the selection of alternative splice variants aimed at preserving the integrity of globular domains: alternative splice sites (i) tend to avoid globular domains or (ii) affect them only marginally or (iii) tend to coincide with a location where the exposed hydrophobic surface is minimal or (iv) the protein is disordered. We also observed an inverse correlation between the domain fraction lost and the full length of the minor isoform containing the domain, possibly indicating a buffering effect for the isoform protein counteracting the domain truncation effect. These observations provide the basis for a prediction method (currently under development) to predict the viability of splice variants.
Affiliation(s)
- Hedi Hegyi
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, PO Box 7, 1518 Budapest, Hungary.
|
36
|
Dunham I, Beare DM, Collins JE. The characteristics of human genes: analysis of human chromosome 22. Comp Funct Genomics 2010; 4:635-46. [PMID: 18629020 PMCID: PMC2447302 DOI: 10.1002/cfg.335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/15/2003] [Revised: 09/04/2003] [Accepted: 09/08/2003] [Indexed: 11/11/2022] Open
Affiliation(s)
- Ian Dunham
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
|
37
|
Simpson AJ, de Souza SJ, Camargo AA, Brentani RR. Definition of the gene content of the human genome: the need for deep experimental verification. Comp Funct Genomics 2010; 2:169-75. [PMID: 18628909 PMCID: PMC2447206 DOI: 10.1002/cfg.81] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/04/2001] [Accepted: 04/05/2001] [Indexed: 11/06/2022] Open
Abstract
Based on the analysis of the drafts of the human genome sequence, it is being speculated that our species may possess an unexpectedly low number of genes. The quality of the drafts, the impossibility of accurate gene prediction and the lack of sufficient transcript sequence data, however, render such speculations very premature. The complexity of human gene structure requires additional and extensive experimental verification of transcripts that may result in major revisions of these early estimates of the number of human genes.
Affiliation(s)
- A J Simpson
- The Ludwig Institute for Cancer Research, Rua Professor Antônio Prudente 109, São Paulo, SP 01509-010, Brazil.
|
38
|
Abstract
Many people expected the question 'How many genes in the human genome?' to be resolved with the publication of the genome sequence in 2001, but estimates continue to fluctuate.
Affiliation(s)
- Mihaela Pertea
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
| | - Steven L Salzberg
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
|
39
|
An integrated mass-spectrometry pipeline identifies novel protein coding-regions in the human genome. PLoS One 2010; 5:e8949. [PMID: 20126623 PMCID: PMC2812506 DOI: 10.1371/journal.pone.0008949] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 11/12/2009] [Accepted: 01/06/2010] [Indexed: 01/28/2023] Open
Abstract
Background: Most protein mass spectrometry (MS) experiments rely on searches against a database of known or predicted proteins, limiting their ability as a gene discovery tool. Results: Using a search against an in silico translation of the entire human genome, combined with a series of annotation filters, we identified 346 putative novel peptides [False Discovery Rate (FDR)<5%] in a MS dataset derived from two human breast epithelial cell lines. A subset of these were then successfully validated by a different MS technique. Two of these correspond to novel isoforms of Heterogeneous Ribonuclear Proteins, while the rest correspond to novel loci. Conclusions: MS technology can be used for ab initio gene discovery in human data, which, since it is based on different underlying assumptions, identifies protein-coding genes not found by other techniques. As MS technology continues to evolve, such approaches will become increasingly powerful.
|
40
|
|
41
|
Yang X, Xie L, Li Y, Wei C. More than 9,000,000 unique genes in human gut bacterial community: estimating gene numbers inside a human body. PLoS One 2009; 4:e6074. [PMID: 19562079 PMCID: PMC2699651 DOI: 10.1371/journal.pone.0006074] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Received: 05/08/2009] [Accepted: 05/29/2009] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND: Estimating the number of genes in the human genome has long been an important problem in computational biology. With the new conception of the human as a super-organism, it is also interesting to estimate the number of genes in this human super-organism. PRINCIPAL FINDINGS: We present our estimate of gene numbers in the human gut bacterial community, the largest microbial community inside the human super-organism. We obtained 552,700 unique genes from 202 complete human gut bacteria genomes. Then, a novel gene counting model was built to estimate the total number of genes by combining culture-independent sequence data and those complete genomes. 16S rRNAs were used to construct a three-level tree and different counting methods were introduced for the three levels: strain-to-species, species-to-genus, and genus-and-up. The model estimates that the total number of genes is about 9,000,000 after those with identity percentage of 97% or up were merged. CONCLUSION: By combining completed genomes currently available and culture-independent sequencing data, we built a model to estimate the number of genes in the human gut bacterial community. The total number of genes is estimated to be about 9 million. Although this number is huge, we believe it is an underestimate. This is an initial step to tackle this gene counting problem for the human super-organism. It will still be an open problem in the near future. The list of genomes used in this paper can be found in the supplementary table.
Affiliation(s)
- Xing Yang
- Shanghai Center for Bioinformation Technology, Shanghai, China
- School of Life Science and Technology, Tongji University, Shanghai, China
| | - Lu Xie
- Shanghai Center for Bioinformation Technology, Shanghai, China
| | - Yixue Li
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- * E-mail: (YXL); (CCW)
| | - Chaochun Wei
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Lab of Molecular Microbial Ecology and Ecogenomics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- * E-mail: (YXL); (CCW)
|
42
|
Gu L, Guo R. Genome-wide detection and analysis of alternative splicing for nucleotide binding site-leucine-rich repeats sequences in rice. J Genet Genomics 2009; 34:247-57. [PMID: 17498622 DOI: 10.1016/s1673-8527(07)60026-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/07/2006] [Accepted: 08/03/2006] [Indexed: 11/20/2022]
Abstract
Alternative splicing is a major contributor to genomic complexity and proteome diversity, yet the analysis of alternative splicing for sequences containing the nucleotide binding site and leucine-rich repeats (NBS-LRR) domain has not been explored in rice (Oryza sativa L.). Hidden Markov model (HMM) searches were performed for the NBS-LRR domain. 875 NBS-LRR-encoding sequences were obtained from the Institute for Genomic Research (TIGR). All of them were used to BLAST the Knowledge-based Oryza Molecular Biological Encyclopaedia (KOME), the TIGR rice gene index (TGI), and the Universal Protein Resource (UniProt) to obtain homologous full-length cDNAs (FL-cDNAs), tentative consensus sequences, and protein sequences. Alternative splicing events were detected from genomic alignment of FL-cDNAs, tentative consensus sequences, and protein sequences, which provide valuable information on splice variants of genes. These sequences were aligned to the corresponding BAC sequences using the Spidey and Sim4 programs and each of the proteins was aligned by tBLASTn. Of the 875 NBS-LRR sequences, 119 (13.6%) sequences had alternative splicing where multiple FL-cDNAs, TGI sequences and proteins corresponded to the same gene. 71 intron retention events, 20 exon skipping events, 16 alternative termination events, 25 alternative initiation events, 12 alternative 5' splicing events, and 16 alternative 3' splicing events were identified. Most of these alternative splices were supported by two or more transcripts. The data sets are available at http://www.bioinfor.org. Furthermore, the bioinformatics analysis of splice boundaries showed that exon skipping and intron retention did not exhibit strong consensus. This implies a different regulation mechanism that guides the expression of splice isoforms. This article also presents the analysis of the effects of intron retention on proteins. The C-terminal regions of alternative proteins turned out to be more variable than the N-terminal regions.
Finally, tissue distribution and protein localization of alternative splicing were explored. The largest categories of tissue distribution for alternative splicing were shoot and callus. More than one-third of protein localization for splice forms was plasma membrane and cytoplasm. All the NBS-LRR proteins for splice forms may have important functions in disease resistance and activate downstream signaling pathways.
Affiliation(s)
- Lianfeng Gu
- College of Agriculture, Guangdong Ocean University, Zhanjiang 524088, China
|
43
|
Scheibye-Alsing K, Hoffmann S, Frankel A, Jensen P, Stadler PF, Mang Y, Tommerup N, Gilchrist MJ, Nygård AB, Cirera S, Jørgensen CB, Fredholm M, Gorodkin J. Sequence assembly. Comput Biol Chem 2008; 33:121-36. [PMID: 19152793 DOI: 10.1016/j.compbiolchem.2008.11.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 10/24/2008] [Revised: 11/28/2008] [Accepted: 11/28/2008] [Indexed: 01/20/2023]
Abstract
Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and plays an important role in processing the information generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly programs. We describe the basic principles of computational assembly along with the main concerns, such as repetitive sequences in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html.
Affiliation(s)
- K Scheibye-Alsing
- Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C, Denmark
|
44
|
Cao J, Wu X, Jin Y. Lower GC-content in editing exons: implications for regulation by molecular characteristics maintained by selection. Gene 2008; 421:14-9. [PMID: 18632225 DOI: 10.1016/j.gene.2008.05.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/14/2007] [Revised: 03/01/2008] [Accepted: 05/21/2008] [Indexed: 01/26/2023]
Abstract
We unexpectedly discover that there are much lower GC3 and GC-content and higher Gibbs free energy in editing exons than in other exons in the Drosophila synaptotagmin I transcripts, which was further confirmed statistically by 47 other experimentally validated samples. Sequence alignment, Ks and Ka/Ks assays suggest that rapidly ascending purifying selection occurs in editing exons, which constrains nucleotide divergence. The presence of specific molecular characteristics such as lower GC-content in editing exons implies an unexpected requirement and is likely to direct RNA editing occurrence. Thus, relations between molecular characteristics of DNA, RNA editing and purifying selection might be present.
Affiliation(s)
- Jun Cao
- Institute of Biochemistry, College of Life Sciences, Zhejiang University (Zijingang Campus), Hangzhou, Zhejiang, ZJ310058, PR China
|
45
|
Zou X, Chung T, Lin X, Malakhova ML, Pike HM, Brown RE. Human glycolipid transfer protein (GLTP) genes: organization, transcriptional status and evolution. BMC Genomics 2008; 9:72. [PMID: 18261224 PMCID: PMC2262070 DOI: 10.1186/1471-2164-9-72] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 10/01/2007] [Accepted: 02/08/2008] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: Glycolipid transfer protein is the prototypical and founding member of the new GLTP superfamily distinguished by a novel conformational fold and glycolipid binding motif. The present investigation provides the first insights into the organization, transcriptional status, phylogenetic/evolutionary relationships of GLTP genes. RESULTS: In human cells, single-copy GLTP genes were found in chromosomes 11 and 12. The gene at locus 11p15.1 exhibited several features of a potentially active retrogene, including a highly homologous (approximately 94%), full-length coding sequence containing all key amino acid residues involved in glycolipid liganding. To establish the transcriptional activity of each human GLTP gene, in silico EST evaluations, RT-PCR amplifications of GLTP transcript(s), and methylation analyses of regulator CpG islands were performed using various human cells. Active transcription was found for 12q24.11 GLTP but 11p15.1 GLTP was transcriptionally silent. Heterologous expression and purification of the GLTP paralogs showed glycolipid intermembrane transfer activity only for 12q24.11 GLTP. Phylogenetic/evolutionary analyses indicated that the 5-exon/4-intron organizational pattern and encoded sequence of 12q24.11 GLTP were highly conserved in therian mammals and other vertebrates. Orthologs of the intronless GLTP gene were observed in primates but not in rodentiates, carnivorates, cetartiodactylates, or didelphimorphiates, consistent with recent evolutionary development. CONCLUSION: The results identify and characterize the gene responsible for GLTP expression in humans and provide the first evidence for the existence of a GLTP pseudogene, while demonstrating the rigorous approach needed to unequivocally distinguish transcriptionally-active retrogenes from silent pseudogenes.
The results also rectify errors in the Ensembl database regarding the organizational structure of the actively transcribed GLTP gene in Pan troglodytes and establish the intronless GLTP as a primate-specific, processed pseudogene marker. A solid foundation has been established for future identification of hereditary defects in human GLTP genes.
Affiliation(s)
- Xianqiong Zou
- The Hormel Institute, University of Minnesota, Austin, Minnesota 55912, USA.
|
46
|
Abstract
The principal route to understanding the biological significance of the genome sequence comes from discovery and characterization of that portion of the genome that is transcribed into RNA products. We now know that this 'transcriptome' is unexpectedly complex and its precise definition in any one species requires multiple technical approaches and an ability to work on a very large scale. A key step is the development of technologies able to capture snapshots of the complexity of the various kinds of RNA generated by the genome. As the human, mouse and other model genome sequencing projects approach completion, considerable effort has been focused on identifying and annotating the protein-coding genes as the principal output of the genome. In pursuing this aim, several key technologies have been developed to generate large numbers and highly diverse sets of full-length cDNAs and their variants. However, the search has identified another hidden transcriptional universe comprising a wide variety of non-protein coding RNA transcripts. Despite initial scepticism, various experiments and complementary technologies have demonstrated that these RNAs are dynamically transcribed and a subset of them can act as sense-antisense RNAs, which influence the transcriptional output of the genome. Recent experimental evidence suggests that the list of non-protein coding RNAs is still largely incomplete and that transcription is substantially more complex even than currently thought.
Affiliation(s)
- Piero Carninci
- Genome Science Laboratory, Discovery and Research Institute, RIKEN Wako Institute, Wako, Saitama, Japan.
|
47
|
Abstract
The promise of the genome project was that a complete sequence would provide us with information that would transform biology and medicine. But the 'parts list' that has emerged from the genome project is far from the 'wiring diagram' and 'circuit logic' we need to understand the link between genotype, environment and phenotype. While genomic technologies such as DNA microarrays, proteomics and metabolomics have given us new tools and new sources of data to address these problems, a number of crucial elements remain to be addressed before we can begin to close the loop and develop a predictive quantitative biology that is the stated goal of so much of current biological research, including systems biology. Our approach to this problem has largely been one of integration, bringing together a vast wealth of information to better interpret the experimental data we are generating in genomic assays and creating publicly available databases and software tools to facilitate the work of others. Recently, we have used a similar approach to try to understand the biological networks that underlie the phenotypic responses we observe, starting us on the road to developing a predictive biology.
Affiliation(s)
- John Quackenbush
- Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
|
48
|
Abstract
In recent years, genome-wide detection of alternative splicing based on Expressed Sequence Tag (EST) sequence alignments with mRNA and genomic sequences has dramatically expanded our understanding of the role of alternative splicing in functional regulation. This chapter reviews the data, methodology, and technical challenges of these genome-wide analyses of alternative splicing, and briefly surveys some of the uses to which such alternative splicing databases have been put. For example, with proper alternative splicing database schema design, it is possible to query genome-wide for alternative splicing patterns that are specific to particular tissues, disease states (e.g., cancer), gender, or developmental stages. EST alignments can be used to estimate exon inclusion or exclusion level of alternatively spliced exons and evolutionary changes for various species can be inferred from exon inclusion level. Such databases can also help automate design of probes for RT-PCR and microarrays, enabling high throughput experimental measurement of alternative splicing.
|
49
|
Dahinden C, Parmigiani G, Emerick MC, Bühlmann P. Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries. BMC Bioinformatics 2007; 8:476. [PMID: 18072965 PMCID: PMC2233645 DOI: 10.1186/1471-2105-8-476] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 03/16/2007] [Accepted: 12/11/2007] [Indexed: 11/10/2022]
Abstract
Background: The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interaction among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species.
Results: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum Likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries.
Conclusion: We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems involving categorical variables.
|
50
|
Liang C, Wang G, Liu L, Ji G, Fang L, Liu Y, Carter K, Webb JS, Dean JFD. ConiferEST: an integrated bioinformatics system for data reprocessing and mining of conifer expressed sequence tags (ESTs). BMC Genomics 2007; 8:134. [PMID: 17535431 PMCID: PMC1894976 DOI: 10.1186/1471-2164-8-134] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 11/02/2006] [Accepted: 05/29/2007] [Indexed: 11/30/2022]
Abstract
Background: With the advent of low-cost, high-throughput sequencing, the amount of public domain Expressed Sequence Tag (EST) sequence data available for both model and non-model organisms is growing exponentially. While these data are widely used for characterizing various genomes, they also present a serious challenge for data quality control and validation due to their inherent deficiencies, particularly for species without genome sequences.
Description: ConiferEST is an integrated system for data reprocessing, visualization and mining of conifer ESTs. In its current release, Build 1.0, it houses 172,229 loblolly pine EST sequence reads, which were obtained from reprocessing raw DNA sequencer traces using our software, WebTraceMiner. The trace files were downloaded from NCBI Trace Archive. ConiferEST provides biologists unique, easy-to-use data visualization and mining tools for a variety of putative sequence features including cloning vector segments, adapter sequences, restriction endonuclease recognition sites, polyA and polyT runs, and their corresponding Phred quality values. Based on these putative features, verified sequence features such as 3' and/or 5' termini of cDNA inserts in either sense or non-sense strand have been identified in-silico. Interestingly, only 30.03% of the designated 3' ESTs were found to have an authenticated 5' terminus in the non-sense strand (i.e., polyT tails), while fewer than 5.34% of the designated 5' ESTs had a verified 5' terminus in the sense strand. Such previously ignored features provide valuable insight for data quality control and validation of error-prone ESTs, as well as the ability to identify novel functional motifs embedded in large EST datasets. We found that "double-termini adapters" were effective indicators of potential EST chimeras. For all sequences with in-silico verified termini/terminus, we used InterProScan to assign protein domain signatures, results of which are available for in-depth exploration using our biologist-friendly web interfaces.
Conclusion: ConiferEST represents a unique and complementary public resource for EST data integration and mining in conifers by reprocessing raw DNA traces, identifying putative sequence features and determining and annotating in-silico verified features. Seamlessly integrated with other public resources, ConiferEST provides biologists powerful tools to verify data, visualize abnormalities, including EST chimeras, and explore large EST datasets.
Affiliation(s)
- Chun Liang
- Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Gang Wang
- Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Lin Liu
- Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian, 361005, China
- Lin Fang
- Beijing Genomics Institute, Beijing 101300, China
- Yuansheng Liu
- Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Kikia Carter
- Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Jason S Webb
- Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Jeffrey FD Dean
- Warnell School of Forestry and Natural Resources, University of Georgia, Athens, Georgia 30602, USA
|