1
Rodriguez JM, Abascal F, Cerdán-Vélez D, Gómez LM, Vázquez J, Tress ML. Evidence for widespread translation of 5' untranslated regions. Nucleic Acids Res 2024; 52:8112-8126. [PMID: 38953162 DOI: 10.1093/nar/gkae571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 06/07/2024] [Accepted: 06/19/2024] [Indexed: 07/03/2024]
Abstract
Ribosome profiling experiments support the translation of a range of novel human open reading frames. By contrast, most peptides from large-scale proteomics experiments derive from just one source, 5' untranslated regions. Across the human genome we find evidence for 192 translated upstream regions, most of which would produce protein isoforms with extended N-terminal ends. Almost all of these N-terminal extensions are from highly abundant genes, which suggests that the novel regions we detect are just the tip of the iceberg. These upstream regions have characteristics that are not typical of coding exons. Their GC-content is remarkably high, even higher than 5' regions in other genes, and a large majority have non-canonical start codons. Although some novel upstream regions have cross-species conservation - five have orthologues in invertebrates for example - the reading frames of two thirds are not conserved beyond simians. These non-conserved regions also have no evidence of purifying selection, which suggests that much of this translation is not functional. In addition, non-conserved upstream regions have significantly more peptides in cancer cell lines than would be expected, a strong indication that an aberrant or noisy translation initiation process may play an important role in translation from upstream regions.
Affiliation(s)
- Jose Manuel Rodriguez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
  - CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Federico Abascal
  - Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Daniel Cerdán-Vélez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Laura Martínez Gómez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Jesús Vázquez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
  - CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Michael L Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
2
Qin Z, Yang J, Zhang K, Gao X, Ran Q, Xu Y, Wang Z, Lou D, Huang C, Zellmer L, Meng G, Chen N, Ma H, Wang Z, Liao DJ. Updating mRNA variants of the human RSK4 gene and their expression in different stressed situations. Heliyon 2024; 10:e27475. [PMID: 38560189 PMCID: PMC10980951 DOI: 10.1016/j.heliyon.2024.e27475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2024] [Revised: 02/11/2024] [Accepted: 02/29/2024] [Indexed: 04/04/2024]
Abstract
We determined the RNA spectrum of the human RSK4 (hRSK4) gene (also called RPS6KA6) and identified 29 novel mRNA variants derived from alternative splicing, which, together with the NCBI-documented ones and the five we reported previously, total 50 hRSK4 RNAs that, by our bioinformatics analyses, encode 35 hRSK4 protein isoforms of 35-762 amino acids. Many of the mRNAs are bicistronic or tricistronic for hRSK4. The NCBI-normalized NM_014496.5 and the protein it encodes are designated herein as the Wt-1 mRNA and protein, respectively, whereas NM_001330512.1 and the long protein it encodes are designated as the Wt-2 mRNA and protein, respectively. Many of the mRNA variants responded differently to different stress conditions, including serum starvation, a febrile temperature, and treatment with ethanol or ethanol-extracted clove buds (an herbal medicine), whereas the same stress condition often caused quite different alterations among different mRNA variants in different cell lines. Moxifloxacin, an antibiotic and also a functional inhibitor of hRSK4, could inhibit the expression of certain hRSK4 mRNA variants. The hRSK4 gene likely uses alternative splicing as a handy tool to adapt to different stress conditions, and the mRNA and protein multiplicities may partly explain the incongruous literature on its expression and behavior.
Affiliation(s)
- Zhenwei Qin
  - Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
- Jianglin Yang
  - Center for Clinical Laboratories, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Rd, Guiyang, 550004, Guizhou Province, China
  - Key Lab of Endemic and Ethnic Diseases of the Ministry of Education of China in Guizhou Medical University, Guiyang, 550004, Guizhou Province, China
- Keyin Zhang
  - Department of Pathology, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Road, Guiyang, 550004, Guizhou Province, China
- Xia Gao
  - Department of Pathology, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Road, Guiyang, 550004, Guizhou Province, China
- Qianchuan Ran
  - Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
- Yuanhong Xu
  - Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
- Zhi Wang
  - Department of Pathology, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Road, Guiyang, 550004, Guizhou Province, China
- Didong Lou
  - Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
- Chunhua Huang
  - Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
- Lucas Zellmer
  - Department of Medicine, Hennepin County Medical Center, 730 South 8th St., Minneapolis, MN, 55415, USA
- Guangxue Meng
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Guizhou Medical University, 9 Beijing Road, Guiyang, 550004, Guizhou Province, China
- Na Chen
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Guizhou Medical University, 9 Beijing Road, Guiyang, 550004, Guizhou Province, China
- Hong Ma
  - Department of Oral and Maxillofacial Surgery, School of Stomatology, Guizhou Medical University, 9 Beijing Road, Guiyang, 550004, Guizhou Province, China
- Zhe Wang
  - State Key Laboratory of Cancer Biology, Department of Pathology, Xijing Hospital, Air Force Medical University, 169 Changle West Road, Xi'an, 710032, China
- Dezhong Joshua Liao
  - Center for Clinical Laboratories, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Rd, Guiyang, 550004, Guizhou Province, China
  - Key Lab of Endemic and Ethnic Diseases of the Ministry of Education of China in Guizhou Medical University, Guiyang, 550004, Guizhou Province, China
3
Kafita D, Nkhoma P, Dzobo K, Sinkala M. Shedding light on the dark genome: Insights into the genetic, CRISPR-based, and pharmacological dependencies of human cancers and disease aggressiveness. PLoS One 2023; 18:e0296029. [PMID: 38117798 PMCID: PMC10732413 DOI: 10.1371/journal.pone.0296029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 12/05/2023] [Indexed: 12/22/2023]
Abstract
Investigating the human genome is vital for identifying risk factors and devising effective therapies to combat genetic disorders and cancer. Despite the extensive knowledge of the "light genome", the poorly understood "dark genome" remains understudied. In this study, we integrated data from 20,412 protein-coding genes in Pharos and 8,395 patient-derived tumours from The Cancer Genome Atlas (TCGA) to examine the genetic and pharmacological dependencies in human cancers and their treatment implications. We discovered that dark genes exhibited high mutation rates in certain cancers, similar to light genes. By combining the drug response profiles of cancer cells with cell fitness post-CRISPR-mediated gene knockout, we identified the crucial vulnerabilities associated with both dark and light genes. Our analysis also revealed that tumours harbouring dark gene mutations displayed worse overall and disease-free survival rates than those without such mutations. Furthermore, dark gene expression levels significantly influenced patient survival outcomes. Our findings demonstrated a similar distribution of genetic and pharmacological dependencies across the light and dark genomes, suggesting that targeting the dark genome holds promise for cancer treatment. This study underscores the need for ongoing research on the dark genome to better comprehend the underlying mechanisms of cancer and develop more effective therapies.
Affiliation(s)
- Doris Kafita
  - Department of Biomedical Sciences, University of Zambia, School of Health Sciences, Lusaka, Zambia
- Panji Nkhoma
  - Department of Biomedical Sciences, University of Zambia, School of Health Sciences, Lusaka, Zambia
- Kevin Dzobo
  - Department of Medicine, Division of Dermatology, Hair and Skin Research Laboratory, Wound and Keloid Scarring Research Unit, The South African Medical Research Council, University of Cape Town, Cape Town, South Africa
- Musalula Sinkala
  - Department of Biomedical Sciences, University of Zambia, School of Health Sciences, Lusaka, Zambia
  - Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine and Department of Integrative Biomedical Sciences, University of Cape Town, Computational Biology Division, Cape Town, South Africa
4
Carrion SA, Michal JJ, Jiang Z. Alternative Transcripts Diversify Genome Function for Phenome Relevance to Health and Diseases. Genes (Basel) 2023; 14:2051. [PMID: 38002994 PMCID: PMC10671453 DOI: 10.3390/genes14112051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/26/2023]
Abstract
Manipulation using alternative exon splicing (AES), alternative transcription start (ATS), and alternative polyadenylation (APA) sites is key to transcript diversity underlying health and disease. All three are pervasive in organisms, present in at least 50% of human protein-coding genes. In fact, ATS and APA site use has the highest impact on protein identity, with their ability to alter which first and last exons are utilized as well as impacting stability and translation efficiency. These RNA variants have been shown to be highly specific, both in tissue type and stage, with demonstrated importance to cell proliferation, differentiation and the transition from fetal to adult cells. While alternative exon splicing has a limited effect on protein identity, its ubiquity highlights the importance of these minor alterations, which can alter other features such as localization. The three processes are also highly interwoven, with overlapping, complementary, and competing factors, RNA polymerase II and its CTD (C-terminal domain) chief among them. Their role in development means dysregulation leads to a wide variety of disorders and cancers, with some forms of disease disproportionately affected by specific mechanisms (AES, ATS, or APA). Challenges associated with the genome-wide profiling of RNA variants and their potential solutions are also discussed in this review.
Affiliation(s)
- Zhihua Jiang
  - Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA 99164-7620, USA; (S.A.C.); (J.J.M.)
5
He H, Wu C, Saqib M, Hao R. Single-molecule fluorescence methods for protein biomarker analysis. Anal Bioanal Chem 2023:10.1007/s00216-022-04502-9. [PMID: 36609860 DOI: 10.1007/s00216-022-04502-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/27/2022] [Revised: 12/07/2022] [Accepted: 12/20/2022] [Indexed: 01/09/2023]
Abstract
Proteins have been considered key building blocks of life. In particular, the protein content of an organism and a cell offers significant information for the in-depth understanding of the disease and biological processes. Single-molecule protein detection/sequencing tools will revolutionize clinical (proteomics) research, offering ultrasensitivity for low-abundance biomarker (protein) detection, which is important for the realization of early-stage disease diagnosis and single-cell proteomics. This improved detection/measurement capability delivers new sets of techniques to explore new frontiers and address important challenges in various interdisciplinary areas including nanostructured materials, molecular medicine, molecular biology, and chemistry. Importantly, fluorescence-based methods have emerged as indispensable tools for single protein detection/sequencing studies, providing a higher signal-to-noise ratio (SNR). Improvements in fluorescent dyes/probes and detector capabilities coupled with advanced (image) analysis strategies have fueled current developments for single protein biomarker detections. For example, in comparison to conventional ELISA (i.e., based on ensembled measurements), single-molecule fluorescence detection is more sensitive, faster, and more accurate with reduced background, high-throughput, and so on. In comparison to MS sequencing, fluorescence-based single-molecule protein sequencing can achieve the sequencing of peptides themselves with higher sensitivity. This review summarizes various typical single-molecule detection technologies including their methodology (modes of operation), detection limits, advantages and drawbacks, and current challenges with recent examples. We describe the fluorescence-based single-molecule protein sequencing/detection based on five kinds of technologies such as fluorosequencing, N-terminal amino acid binder, nanopore light sensing, and DNA nanotechnology. 
Finally, we present our perspective for developing high-performance fluorescence-based sequencing/detection techniques.
Affiliation(s)
- Haihan He
  - Department of Chemistry, Southern University of Science and Technology, Shenzhen, 518055, China
  - Research Center for Chemical Biology and Omics Analysis, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
- Chuhong Wu
  - Department of Chemistry, Southern University of Science and Technology, Shenzhen, 518055, China
  - Research Center for Chemical Biology and Omics Analysis, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
- Muhammad Saqib
  - Department of Chemistry, Southern University of Science and Technology, Shenzhen, 518055, China
  - Research Center for Chemical Biology and Omics Analysis, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
  - Institute of Chemistry, Khwaja Fareed University of Engineering & Information Technology, Rahim Yar Khan 64200, Pakistan
- Rui Hao
  - Department of Chemistry, Southern University of Science and Technology, Shenzhen, 518055, China
  - Research Center for Chemical Biology and Omics Analysis, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
6
Zhang J, Lin X, Chen Y, Li T, Lee AC, Chow EY, Cho WC, Chan T. LAFITE Reveals the Complexity of Transcript Isoforms in Subcellular Fractions. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023; 10:e2203480. [PMID: 36461702 PMCID: PMC9875686 DOI: 10.1002/advs.202203480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 10/28/2022] [Indexed: 06/17/2023]
Abstract
Characterization of the subcellular distribution of RNA is essential for understanding the molecular basis of biological processes. Here, the subcellular nanopore direct RNA-sequencing (DRS) of four lung cancer cell lines (A549, H1975, H358, and HCC4006) is performed, coupled with a computational pipeline, Low-abundance Aware Full-length Isoform clusTEr (LAFITE), to comprehensively analyze the full-length cytoplasmic and nuclear transcriptome. Using additional DRS and orthogonal data sets, it is shown that LAFITE outperforms current methods for detecting full-length transcripts, particularly for low-abundance isoforms that are usually overlooked due to poor read coverage. Experimental validation of six novel isoforms exclusively identified by LAFITE further confirms the reliability of this pipeline. By applying LAFITE to subcellular DRS data, the complexity of the nuclear transcriptome is revealed in terms of isoform diversity, 3'-UTR usage, m6A modification patterns, and intron retention. Overall, LAFITE provides enhanced full-length isoform identification and enables a high-resolution view of the RNA landscape at the isoform level.
Affiliation(s)
- Jizhou Zhang
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Xiao Lin
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Yuelong Chen
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Tsz‐Ho Li
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Alan Chun‐Kit Lee
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Ting‐Fung Chan
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
7
Singh D, Roy J. A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs. Nucleic Acids Res 2022; 50:12094-12111. [PMID: 36420898 PMCID: PMC9757047 DOI: 10.1093/nar/gkac1092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 10/22/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022]
Abstract
Identification of protein-coding and non-coding transcripts is paramount for understanding their biological roles. Computational approaches have been addressing this task for over a decade; however, generalized and high-performance models are still unreliable. This benchmark study assessed the performance of 24 tools producing >55 models on the datasets covering a wide range of species. We have collected 135 small and large transcriptomic datasets from existing studies for comparison and identified the potential bottlenecks hampering the performance of current tools. The key insights of this study include lack of standardized training sets, reliance on homogeneous training data, gradual changes in annotated data, lack of augmentation with homology searches, the presence of false positives and negatives in datasets and the lower performance of end-to-end deep learning models. We also derived a new dataset, RNAChallenge, from the benchmark considering hard instances that may include potential false alarms. The best and least well performing models under- and overfit the dataset, respectively, thereby serving a dual purpose. For computational approaches, it will be valuable to develop accurate and unbiased models. The identification of false alarms will be of interest for genome annotators, and experimental study of hard RNAs will help to untangle the complexity of the RNA world.
Affiliation(s)
- Dalwinder Singh
  - To whom correspondence should be addressed. Tel: +91 172 5221206;
- Joy Roy
  - Correspondence may also be addressed to Joy Roy.
8
Martinez-Gomez L, Cerdán-Vélez D, Abascal F, Tress ML. Origins and Evolution of Human Tandem Duplicated Exon Substitution Events. Genome Biol Evol 2022; 14:6809199. [PMID: 36346145 PMCID: PMC9741552 DOI: 10.1093/gbe/evac162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/06/2022] [Revised: 10/25/2022] [Accepted: 10/29/2022] [Indexed: 11/10/2022]
Abstract
The mutually exclusive splicing of tandem duplicated exons produces protein isoforms that are identical save for a homologous region that allows for the fine tuning of protein function. Tandem duplicated exon substitution events are rare, yet highly important alternative splicing events. Most events are ancient, their isoforms are highly expressed, and they have significantly more pathogenic mutations than other splice events. Here, we analyzed the physicochemical properties and functional roles of the homologous polypeptide regions produced by the 236 tandem duplicated exon substitutions annotated in the human gene set. We find that the most important structural and functional residues in these homologous regions are maintained, and that most changes are conservative rather than drastic. Three quarters of the isoforms produced from tandem duplicated exon substitution events are tissue-specific, particularly in nervous and cardiac tissues, and tandem duplicated exon substitution events are enriched in functional terms related to structures in the brain and skeletal muscle. We find considerable evidence for the convergent evolution of tandem duplicated exon substitution events in vertebrates, arthropods, and nematodes. Twelve human gene families have orthologues with tandem duplicated exon substitution events in both Drosophila melanogaster and Caenorhabditis elegans. Six of these gene families are ion transporters, suggesting that tandem exon duplication in genes that control the flow of ions into the cell has an adaptive benefit. The ancient origins, the strong indications of tissue-specific functions, and the evidence of convergent evolution suggest that these events may have played important roles in the evolution of animal tissues and organs.
Affiliation(s)
- Laura Martinez-Gomez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Daniel Cerdán-Vélez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Federico Abascal
  - Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
9
Wang WC, Lai YC. DUSP5 and PHLDA1 mutations in mature cystic teratomas of the ovary identified on whole-exome sequencing may explain teratoma characteristics. Hum Genomics 2022; 16:50. [PMID: 36289533 PMCID: PMC9609193 DOI: 10.1186/s40246-022-00424-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/19/2022] [Indexed: 11/21/2022]
Abstract
Background: Mature cystic teratomas of the ovary are the most common type of germ cell tumor, comprising 33% of ovarian tumors. Studying these tumors may result in a better understanding of their stepwise developmental processes and molecular bases and provide useful information for the development of tissue-engineering technologies. Methods: In the present study, 9 mature cystic teratomas of the ovary were analyzed by whole-exome sequencing and the results were compared with the Catalogue of Somatic Mutations in Cancer and dbSNP databases. Results: Mutations were validated in 15 genes with alterations in all 9 (100%) samples and changes in protein coding. The top 10 mutated genes were FLG, MUC17, MUC5B, RP1L1, NBPF1, GOLGA6L2, SLC29A3, SGK223, PTGFRN, and FAM186A. Moreover, 7 variants in exons with changes in protein coding are likely of importance in the development of mature cystic teratomas of the ovary, namely PTGFRN, DUSP5, MPP2, PHLDA1, PRR21, GOLGA6L2, and KRTAP4-2. Conclusions: These genetic alterations may play an important etiological role in teratoma formation. Moreover, novel mutations in DUSP5 and PHLDA1 genes found on whole-exome sequencing may help to explain the characteristics of teratomas. Supplementary Information: The online version contains supplementary material available at 10.1186/s40246-022-00424-w.
Affiliation(s)
- Wen-Chung Wang
  - Department of Obstetrics and Gynecology, Jen-Ai Hospital, Taichung, 412 Taiwan
- Yen-Chein Lai
  - Department of Medical Laboratory and Biotechnology, Chung Shan Medical University, No. 110, Sec. 1, Chien Kuo N. Road, Taichung, 402 Taiwan
  - Clinical Laboratory, Chung Shan Medical University Hospital, Taichung, Taiwan
10
Pozo F, Rodriguez JM, Martínez Gómez L, Vázquez J, Tress ML. APPRIS principal isoforms and MANE Select transcripts define reference splice variants. Bioinformatics 2022; 38:ii89-ii94. [PMID: 36124785 PMCID: PMC9486585 DOI: 10.1093/bioinformatics/btac473] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022]
Abstract
Motivation: Selecting the splice variant that best represents a coding gene is a crucial first step in many experimental analyses, and vital for mapping clinically relevant variants. This study compares the longest isoforms, MANE Select transcripts, APPRIS principal isoforms, and expression data, and aims to determine which method is best for selecting biologically important reference splice variants for large-scale analyses. Results: Proteomics analyses and human genetic variation data suggest that most coding genes have a single main protein isoform. We show that APPRIS principal isoforms and MANE Select transcripts best describe these main cellular isoforms, and find that using the longest splice variant as the representative is a poor strategy. Exons unique to the longest splice isoforms are not under selective pressure, and so are unlikely to be functionally relevant. Expression data are also a poor means of selecting the main splice variant. APPRIS principal and MANE Select exons are under purifying selection, while exons specific to alternative transcripts are not. There are MANE and APPRIS representatives for almost 95% of genes, and where they agree they are particularly effective, coinciding with the main proteomics isoform for over 98.2% of genes. Availability and implementation: APPRIS principal isoforms for human, mouse and other model species can be downloaded from the APPRIS database (https://appris.bioinfo.cnio.es), GENCODE genes (https://www.gencodegenes.org/) and the Ensembl website (https://www.ensembl.org). MANE Select transcripts for the human reference set are available from the Ensembl, GENCODE and RefSeq databases (https://www.ncbi.nlm.nih.gov/refseq/). Lists of splice variants where MANE and APPRIS coincide are available from the APPRIS database. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fernando Pozo
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- José Manuel Rodriguez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Laura Martínez Gómez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Jesús Vázquez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
  - CIBER de Investigaciones Cardiovasculares (CIBERCV), 28029 Madrid, Spain
11
The effects of sequencing depth on the assembly of coding and noncoding transcripts in the human genome. BMC Genomics 2022; 23:487. [PMID: 35787153 PMCID: PMC9251931 DOI: 10.1186/s12864-022-08717-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2022] [Accepted: 06/16/2022] [Indexed: 12/30/2022]
Abstract
Investigating the functions and activities of genes requires proper annotation of the transcribed units. However, transcript assembly efforts have produced a surprisingly large variation in the number of transcripts, especially for noncoding transcripts. This heterogeneity in assembled transcript sets might be partially explained by sequencing depth. Here, we used real and simulated short-read sequencing data as well as long-read data to systematically investigate the impact of sequencing depth on the accuracy of assembled transcripts. We assembled and analyzed transcripts from 671 human short-read data sets and four long-read data sets. At the first level, there is a positive correlation between the number of reads and the number of recovered transcripts. However, the effect of the sequencing depth varied based on cell or tissue type, the type of read and the nature and expression levels of the transcripts. The detection of coding transcripts saturated rapidly with both short and long reads; however, there was no sign of early saturation for noncoding transcripts at any sequencing depth. Increasing long-read sequencing depth specifically benefited transcripts containing transposable elements. Finally, we show how single-cell RNA-seq can be guided by transcripts assembled from bulk long-read samples, and demonstrate that noncoding transcripts are expressed at similar levels to coding transcripts but are expressed in fewer cells. This study highlights the impact of sequencing depth on transcript assembly.
12
How Far Are We from the Completion of the Human Protein Interactome Reconstruction? Biomolecules 2022; 12:biom12010140. [PMID: 35053288 PMCID: PMC8774112 DOI: 10.3390/biom12010140] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/08/2021] [Revised: 01/09/2022] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
Abstract
More than fifteen years after the first high-throughput experiments for human protein–protein interaction (PPI) detection, we are still wondering how close the genome-scale human PPI network reconstruction is to completion, what needs to be further explored, and whether the biological insights gained from the holistic investigation of the current network are valid and useful. The unique structure of PICKLE, a meta-database of the human experimentally determined direct PPI network developed by our group, presently covering ~80% of the UniProtKB/Swiss-Prot reviewed human complete proteome, enables the evaluation of the interactome expansion by comparing the successive PICKLE releases since 2013. We observe a gradual overall increase of 39%, 182%, and 67% in protein nodes, PPIs, and supporting references, respectively. Our results indicate that, in recent years, (a) the PPI addition rate has decreased, (b) the new PPIs are largely determined by high-throughput experiments and mainly concern existing protein nodes, and (c) as we had predicted earlier, most of the newly added protein nodes have a low degree. These observations, combined with a largely overlapping k-core between PICKLE releases and a network density increase, imply that an almost complete picture of a structurally defined network has been reached. The comparative unsupervised application of two clustering algorithms indicated that exploring the full interactome topology can reveal the protein neighborhoods involved in closely related biological processes such as transcriptional regulation and cell signaling, and in multiprotein complexes such as the connexon complex associated with cancers. A well-reconstructed human protein interactome is a powerful tool in network biology and medicine research, forming the basis for multi-omic and dynamic analyses.
|
13
|
Profit versus Quality: The Enigma of Scientific Wellness. J Pers Med 2022; 12:jpm12010034. [PMID: 35055349 PMCID: PMC8779909 DOI: 10.3390/jpm12010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Revised: 12/21/2021] [Accepted: 12/24/2021] [Indexed: 11/23/2022] Open
Abstract
The “best of both worlds” is not often achieved when it comes to implementing new health models, particularly in community settings. It is often a struggle to choose or balance between two components: depth of research and financial profit. This has become even more apparent with the recent shift away from a traditionally reactive model of medicine toward a predictive/preventative one. This has given rise to many new concepts and approaches with a variety of often overlapping aims. The purpose of this perspective is to highlight the pros and cons of the numerous ventures already implementing new concepts, to varying degrees, in community settings of quite differing scales, some successful and some falling short. Scientific wellness is a complex, multifaceted concept that requires integrated experimental/analytical designs that demand both high-quality research/healthcare and significant funding. We currently see the more likely long-term success of those ventures in which any profit is largely reinvested into research efforts and health/healthspan is the primary focus.
|
14
|
Porta-Pardo E, Ruiz-Serra V, Valentini S, Valencia A. The structural coverage of the human proteome before and after AlphaFold. PLoS Comput Biol 2022; 18:e1009818. [PMID: 35073311 PMCID: PMC8812986 DOI: 10.1371/journal.pcbi.1009818] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 08/20/2021] [Revised: 02/03/2022] [Accepted: 01/07/2022] [Indexed: 12/12/2022] Open
Abstract
The protein structure field is experiencing a revolution. The advances range from the increased throughput of techniques to determine experimental structures, to developments such as cryo-EM that allow us to find the structures of large protein complexes and, more recently, artificial intelligence tools, such as AlphaFold, that can predict with high accuracy the folding of proteins for which the availability of homology templates is limited. Here we quantify the effect of the recently released AlphaFold database of protein structural models on our knowledge of human proteins. Our results indicate that our current baseline for structural coverage of 48%, considering experimentally-derived or template-based homology models, rises to 76% when including AlphaFold predictions. At the same time, the fraction of dark proteome is reduced from 26% to just 10% when AlphaFold models are considered. Furthermore, although the coverage of disease-associated genes and mutations was near complete before the AlphaFold release (69% of ClinVar pathogenic mutations and 88% of oncogenic mutations), AlphaFold models still provide an additional coverage of 3% to 13% of these critically important sets of biomedical genes and mutations. Finally, we show how the contribution of AlphaFold models to the structural coverage of non-human organisms, including important pathogenic bacteria, is significantly larger than that of the human proteome. Overall, our results show that the sequence-structure gap of human proteins has almost disappeared, an outstanding success with direct consequences for our knowledge of the human genome and the derived medical applications.
Protein structures are key to understanding many biological phenomena at the molecular scale: from the effects of genetic variation to how different proteins interact with each other to create molecular pathways that, together, have a biological function. Obtaining experimental structures, however, is extremely costly in terms of both time and resources. For this and other reasons, scientists have long worked to develop computational approaches that predict the structure of a protein using only its sequence as input. Recently, a group of scientists at DeepMind developed AlphaFold2, a computational tool that is extremely accurate at this task. Moreover, they have used this tool to predict the structures of all human proteins. In this manuscript we provide an overview of the structural coverage of the human proteome before AlphaFold models were released and how much we have gained thanks to these models. We also show how the gain affects our understanding of human pathogenic variants, both germline and somatic. Finally, we provide evidence suggesting that the gain in non-human organisms is larger than for the human proteome, particularly in the case of bacteria.
Affiliation(s)
- Eduard Porta-Pardo
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
- * E-mail: (EP-P); (AV)
- Victoria Ruiz-Serra
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
- Samuel Valentini
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento, Italy
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Institució Catalana de Recerca Avançada (ICREA), Barcelona, Spain
- * E-mail: (EP-P); (AV)
|
15
|
Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021; 433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]
Abstract
Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
Affiliation(s)
- Luis Sanchez-Pulido
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
- Chris P Ponting
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
|
16
|
Carbonara K, Andonovski M, Coorssen JR. Proteomes Are of Proteoforms: Embracing the Complexity. Proteomes 2021; 9:38. [PMID: 34564541 PMCID: PMC8482110 DOI: 10.3390/proteomes9030038] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 08/01/2021] [Revised: 08/24/2021] [Accepted: 08/29/2021] [Indexed: 12/17/2022] Open
Abstract
Proteomes are complex, much more so than genomes or transcriptomes. Thus, simplifying their analysis does not simplify the issue. Proteomes are of proteoforms, not canonical proteins. While having a catalogue of amino acid sequences provides invaluable information, this is the Proteome-lite. To dissect biological mechanisms and identify critical biomarkers/drug targets, we must assess the myriad of proteoforms that arise at any point before, after, and between translation and transcription (e.g., isoforms, splice variants, and post-translational modifications [PTM]), as well as newly defined species. There are numerous analytical methods currently used to address proteome depth and here we critically evaluate these in terms of the current 'state-of-the-field'. We thus discuss both pros and cons of available approaches and where improvements or refinements are needed to quantitatively characterize proteomes. To enable a next-generation approach, we suggest that advances lie in transdisciplinarity via integration of current proteomic methods to yield a unified discipline that capitalizes on the strongest qualities of each. Such a necessary (if not revolutionary) shift cannot be accomplished by a continued primary focus on proteo-genomics/-transcriptomics. We must embrace the complexity. Yes, these are the hard questions, and this will not be easy… but where is the fun in easy?
Affiliation(s)
- Jens R. Coorssen
- Faculties of Applied Health Sciences and Mathematics & Science, Departments of Health Sciences and Biological Sciences, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON L2S 3A1, Canada; (K.C.); (M.A.)
|
17
|
Babarinde IA, Ma G, Li Y, Deng B, Luo Z, Liu H, Abdul MM, Ward C, Chen M, Fu X, Shi L, Duttlinger M, He J, Sun L, Li W, Zhuang Q, Tong G, Frampton J, Cazier JB, Chen J, Jauch R, Esteban MA, Hutchins AP. Transposable element sequence fragments incorporated into coding and noncoding transcripts modulate the transcriptome of human pluripotent stem cells. Nucleic Acids Res 2021; 49:9132-9153. [PMID: 34390351 PMCID: PMC8450112 DOI: 10.1093/nar/gkab710] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/15/2020] [Revised: 07/29/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022] Open
Abstract
Transposable elements (TEs) occupy nearly 40% of mammalian genomes and, whilst most are fragmentary and no longer capable of transposition, they can nevertheless contribute to cell function. TEs within genes transcribed by RNA polymerase II can be copied as parts of primary transcripts; however, their full contribution to mature transcript sequences remains unresolved. Here, using long and short read (LR and SR) RNA sequencing data, we show that 26% of coding and 65% of noncoding transcripts in human pluripotent stem cells (hPSCs) contain TE-derived sequences. Different TE families are incorporated into RNAs in unique patterns, with consequences to transcript structure and function. The presence of TE sequences within a transcript is correlated with TE-type specific changes in its subcellular distribution, alterations in steady-state levels and half-life, and differential association with RNA Binding Proteins (RBPs). We identify hPSC-specific incorporation of endogenous retroviruses (ERVs) and LINE:L1 into protein-coding mRNAs, which generate TE sequence-derived peptides. Finally, single cell RNA-seq reveals that hPSCs express ERV-containing transcripts, whilst differentiating subpopulations lack ERVs and express SINE and LINE-containing transcripts. Overall, our comprehensive analysis demonstrates that the incorporation of TE sequences into the RNAs of hPSCs is more widespread and has a greater impact than previously appreciated.
Affiliation(s)
- Isaac A Babarinde
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Gang Ma
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Yuhao Li
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Boping Deng
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK
- Zhiwei Luo
- Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Hao Liu
- Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Mazid Md Abdul
- Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Carl Ward
- Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Minchun Chen
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xiuling Fu
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Liyang Shi
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Martha Duttlinger
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Jiangping He
- Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
- Li Sun
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Wenjuan Li
- Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Qiang Zhuang
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Guoqing Tong
- Center for Reproductive Medicine, Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200120, China
- Jon Frampton
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK
- Jean-Baptiste Cazier
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK.,Centre for Computational Biology, University of Birmingham, Birmingham, UK
- Jiekai Chen
- Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China.,Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
- Ralf Jauch
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Miguel A Esteban
- Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
- Andrew P Hutchins
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
|
18
|
Martinez Gomez L, Pozo F, Walsh TA, Abascal F, Tress ML. The clinical importance of tandem exon duplication-derived substitutions. Nucleic Acids Res 2021; 49:8232-8246. [PMID: 34302486 PMCID: PMC8373072 DOI: 10.1093/nar/gkab623] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/07/2021] [Accepted: 07/21/2021] [Indexed: 01/04/2023] Open
Abstract
Most coding genes in the human genome are annotated with multiple alternative transcripts. However, clear evidence for the functional relevance of the protein isoforms produced by these alternative transcripts is often hard to find. Alternative isoforms generated from tandem exon duplication-derived substitutions are an exception. These splice events are rare, but have important functional consequences. Here, we have catalogued the 236 tandem exon duplication-derived substitutions annotated in the GENCODE human reference set. We find that more than 90% of the events have a last common ancestor in teleost fish, so are at least 425 million years old, and twenty-one can be traced back to the Bilateria clade. Alternative isoforms generated from tandem exon duplication-derived substitutions also have significantly more clinical impact than other alternative isoforms. Tandem exon duplication-derived substitutions have >25 times as many pathogenic and likely pathogenic mutations as other alternative events. Tandem exon duplication-derived substitutions appear to have vital functional roles in the cell and may have played a prominent part in metazoan evolution.
Affiliation(s)
- Laura Martinez Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain.,Eukaryotic Annotation Team, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
|
19
|
Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML. Assessing the functional relevance of splice isoforms. NAR Genom Bioinform 2021; 3:lqab044. [PMID: 34046593 PMCID: PMC8140736 DOI: 10.1093/nargab/lqab044] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/27/2020] [Revised: 04/22/2021] [Accepted: 05/17/2021] [Indexed: 12/20/2022] Open
Abstract
Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many go on to produce functionally relevant protein isoforms. There is only limited evidence for alternative proteins in proteomics analyses and data from population genetic variation studies indicate that most alternative exons are evolving neutrally. Determining which transcripts produce biologically important isoforms is key to understanding isoform function and to interpreting the real impact of somatic mutations and germline variations. Here we have developed a method, TRIFID, to classify the functional importance of splice isoforms. TRIFID was trained on isoforms detected in large-scale proteomics analyses and distinguishes these biologically important splice isoforms with high confidence. Isoforms predicted as functionally important by the algorithm had measurable cross species conservation and significantly fewer broken functional domains. Additionally, exons that code for these functionally important protein isoforms are under purifying selection, while exons from low scoring transcripts largely appear to be evolving neutrally. TRIFID has been developed for the human genome, but it could in principle be applied to other well-annotated species. We believe that this method will generate valuable insights into the cellular importance of alternative splicing.
Affiliation(s)
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- José Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Tomas Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Jesús Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
20
|
Agostini F, Zagalak J, Attig J, Ule J, Luscombe NM. Intergenic RNA mainly derives from nascent transcripts of known genes. Genome Biol 2021; 22:136. [PMID: 33952325 PMCID: PMC8097831 DOI: 10.1186/s13059-021-02350-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/07/2020] [Accepted: 04/12/2021] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Transcription is not restricted to regions with annotated gene features but includes almost any genomic context. Currently, the source and function of most RNAs originating from intergenic regions in the human genome remain unclear.
RESULTS: We hypothesize that many intergenic RNAs can be ascribed to the presence of as-yet unannotated genes or the "fuzzy" transcription of known genes that extends beyond the annotated boundaries. To elucidate the contributions of these two sources, we assemble a dataset of more than 2.5 billion publicly available RNA-seq reads across 5 human cell lines and multiple cellular compartments to annotate transcriptional units in the human genome. About 80% of transcripts from unannotated intergenic regions can be attributed to the fuzzy transcription of existing genes; the remaining transcripts originate mainly from putative long non-coding RNA loci that are rarely spliced. We validate the transcriptional activity of these intergenic RNAs using independent measurements, including transcriptional start sites, chromatin signatures, and genomic occupancies of RNA polymerase II in various phosphorylation states. We also analyze the nuclear localization and sensitivities of intergenic transcripts to nucleases to illustrate that they tend to be rapidly degraded either on-chromatin by XRN2 or off-chromatin by the exosome.
CONCLUSIONS: We provide a curated atlas of intergenic RNAs that distinguishes alternative processing of well-annotated genes from independent transcriptional units based on the combined analysis of chromatin signatures, nuclear RNA localization, and degradation pathways.
Affiliation(s)
- Julian Zagalak
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
- Jan Attig
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Jernej Ule
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
- Nicholas M Luscombe
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- UCL Genetics Institute, Department of Genetics, Environment and Evolution, University College London, Gower Street, London, WC1E 6BT, UK
- Okinawa Institute of Science & Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa, 904-0495, Japan
|
21
|
Abstract
The number of complete genome sequences grows rapidly with each passing year. Thus, methods for genome annotation need to be honed constantly to handle the deluge of information. Annotation of pseudogenes (i.e., gene copies that appear not to make a functional protein) in genomes is a persistent problem; here, we review pseudogene annotation methods that are based on the detection of sequence homology in genomic DNA.
Affiliation(s)
- Paul M Harrison
- Department of Biology, McGill University, Montreal, QC, Canada.
|
22
|
Li J, Zhan X. Mass spectrometry-based proteomics analyses of post-translational modifications and proteoforms in human pituitary adenomas. Biochimica et Biophysica Acta - Proteins and Proteomics 2020; 1869:140584. [PMID: 33321259 DOI: 10.1016/j.bbapap.2020.140584] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/22/2020] [Revised: 12/06/2020] [Accepted: 12/08/2020] [Indexed: 12/13/2022]
Abstract
Pituitary adenoma (PA) is a common intracranial neoplasm, which affects the hypothalamus-pituitary-target organ axis systems, and is hazardous to human health. Post-translational modifications (PTMs), including phosphorylation, ubiquitination, nitration, and sumoylation, are vitally important in PA pathogenesis. The large-scale analysis of PTMs could provide a global view of molecular mechanisms for PA. Proteoforms, which are used to define the various structural and functional forms of a protein originating from the same gene, are the future direction of proteomics research. Global studies of the different proteoforms and PTMs of hypophyseal hormones such as growth hormone (GH) and prolactin (PRL), and of the changes in the proportions of different GH or PRL proteoforms in human pituitary tissue, could provide new insights into the clinical value of pituitary hormones in PAs. Multiple quantitative proteomics methods, including mass spectrometry (MS)-based label-free and stable isotope-labeled strategies, in combination with different PTM-peptide enrichment methods, such as TiO2 enrichment of tryptic phosphopeptides and antibody enrichment of other PTM-peptides, make it more feasible for researchers to study PA proteomes. This article reviews the research status of PTMs and proteoforms in PAs, including enrichment methods, technical limitations, quantitative proteomics strategies, and future perspectives, with the goals of understanding the molecular pathogenesis of PA in depth and discovering effective biomarkers and clinical therapeutic targets for predictive, preventive, and personalized treatment of PA patients.
Affiliation(s)
- Jiajia Li
- University Creative Research Initiatives Center, Shandong First Medical University, 6699 Qingdao Road, Jinan, Shandong 250117, P. R. China; Key Laboratory of Cancer Proteomics of Chinese Ministry of Health, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan 410008 P. R. China; State Local Joint Engineering Laboratory for Anticancer Drugs, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan 410008, PR China
- Xianquan Zhan
- University Creative Research Initiatives Center, Shandong First Medical University, 6699 Qingdao Road, Jinan, Shandong 250117, P. R. China; Key Laboratory of Cancer Proteomics of Chinese Ministry of Health, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan 410008 P. R. China; State Local Joint Engineering Laboratory for Anticancer Drugs, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan 410008, PR China; Department of Oncology, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan 410008, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan 410008, PR China.
23
Bogard B, Francastel C, Hubé F. Multiple information carried by RNAs: total eclipse or a light at the end of the tunnel? RNA Biol 2020; 17:1707-1720. [PMID: 32559119 PMCID: PMC7714488 DOI: 10.1080/15476286.2020.1783868] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/18/2020] [Revised: 06/06/2020] [Accepted: 06/12/2020] [Indexed: 12/14/2022] Open
Abstract
The findings that an RNA is not necessarily either coding or non-coding, or that a precursor RNA can produce different types of mature RNAs, whether coding or non-coding, long or short, challenged the dichotomous view of the RNA world almost 15 years ago. Since then, and despite an increasing number of studies, the diversity of information that can be conveyed by RNAs is rarely searched for, and when it is known, it remains largely overlooked in further functional studies. Here, we provide an update with prominent examples of multiple functions that are carried by the same RNA or are produced by the same precursor RNA, to emphasize their biological relevance in most living organisms. An important consequence is that the overall function of their locus of origin results from the balance between various RNA species with distinct functions and fates. Considering the molecular basis of this multiplicity of information is crucial for downstream functional studies, because the targeted functional molecule is often not the one it is believed to be.
Affiliation(s)
- Baptiste Bogard
- Université De Paris, Epigenetics and Cell Fate, CNRS, Paris, France
- Florent Hubé
- Université De Paris, Epigenetics and Cell Fate, CNRS, Paris, France
24
Rodriguez JM, Pozo F, di Domenico T, Vazquez J, Tress ML. An analysis of tissue-specific alternative splicing at the protein level. PLoS Comput Biol 2020; 16:e1008287. [PMID: 33017396 PMCID: PMC7561204 DOI: 10.1371/journal.pcbi.1008287] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 02/10/2020] [Revised: 10/15/2020] [Accepted: 08/25/2020] [Indexed: 01/09/2023] Open
Abstract
The role of alternative splicing is one of the great unanswered questions in cellular biology. There is strong evidence for alternative splicing at the transcript level, and transcriptomics experiments show that many splice events are tissue specific. It has been suggested that alternative splicing evolved in order to remodel tissue-specific protein-protein networks. Here we investigated the evidence for tissue-specific splicing among splice isoforms detected in a large-scale proteomics analysis. Although the data supporting alternative splicing is limited at the protein level, clear patterns emerged among the small numbers of alternative splice events that we could detect in the proteomics data. More than a third of these splice events were tissue-specific and most were ancient: over 95% of splice events that were tissue-specific in both proteomics and RNAseq analyses evolved prior to the ancestors of lobe-finned fish, at least 400 million years ago. By way of contrast, three in four alternative exons in the human gene set arose in the primate lineage, so our results cannot be extrapolated to the whole genome. Tissue-specific alternative protein forms in the proteomics analysis were particularly abundant in nervous and muscle tissues and their genes had roles related to the cytoskeleton and either the structure of muscle fibres or cell-cell connections. Our results suggest that this conserved tissue-specific alternative splicing may have played a role in the development of the vertebrate brain and heart. We manually curated a set of 255 splice events detected in a large-scale tissue-based proteomics experiment and found that more than a third had evidence of significant tissue-specific differences. Events that were significantly tissue-specific at the protein level were highly conserved; almost 75% evolved over 400 million years ago. The tissues in which we found most evidence for tissue-specific splicing were nervous tissues and cardiac tissues. 
Genes with tissue-specific events in these two tissues had functions related to important cellular structures in brain and heart tissues. These splice events may have been essential for the development of vertebrate heart and muscle. However, our data set may not be representative of alternative exons as a whole. We found that most tissue specific splicing was strongly conserved, but just 5% of annotated alternative exons in the human gene set are ancient. More than three quarters of alternative exons are primate-derived. Although the analysis does not provide a definitive answer to the question of the functional role of alternative splicing, our results do indicate that alternative splice variants may have played a significant part in the evolution of brain and heart tissues in vertebrates.
Affiliation(s)
- Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Calle Melchor Fernandez, Madrid, Spain
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez, Madrid, Spain
- Tomas di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez, Madrid, Spain
- Jesus Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Calle Melchor Fernandez, Madrid, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares (CIBERCV), Madrid, Spain
- Michael L. Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez, Madrid, Spain
25
Zile K, Dessimoz C, Wurm Y, Masel J. Only a Single Taxonomically Restricted Gene Family in the Drosophila melanogaster Subgroup Can Be Identified with High Confidence. Genome Biol Evol 2020; 12:1355-1366. [PMID: 32589737 PMCID: PMC8059200 DOI: 10.1093/gbe/evaa127] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 06/19/2020] [Indexed: 12/12/2022] Open
Abstract
Taxonomically restricted genes (TRGs) are genes that are present only in one clade. Protein-coding TRGs may evolve de novo from previously noncoding sequences: functional ncRNA, introns, or alternative reading frames of older protein-coding genes, or intergenic sequences. A major challenge in studying de novo genes is the need to avoid both false-positives (nonfunctional open reading frames and/or functional genes that did not arise de novo) and false-negatives. Here, we search conservatively for high-confidence TRGs as the most promising candidates for experimental studies, ensuring functionality through conservation across at least two species, and ensuring de novo status through examination of homologous noncoding sequences. Our pipeline also avoids ascertainment biases associated with preconceptions of how de novo genes are born. We identify one TRG family that evolved de novo in the Drosophila melanogaster subgroup. This TRG family contains single-copy genes in Drosophila simulans and Drosophila sechellia. It originated in an intron of a well-established gene, sharing that intron with another well-established gene upstream. These TRGs contain an intron that predates their open reading frame. These genes have not been previously reported as de novo originated, and to our knowledge, they are the best Drosophila candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.
Affiliation(s)
- Karina Zile
- Division of Biosciences, University College London, United Kingdom
- Christophe Dessimoz
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Switzerland
- Department of Genetics, Evolution and Environment, University College London, United Kingdom
- Department of Computer Science, University College London, United Kingdom
- Yannick Wurm
- School of Biological and Chemical Sciences, Queen Mary University of London, United Kingdom
- Alan Turing Institute, London, United Kingdom
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona
26
Rausell A, Luo Y, Lopez M, Seeleuthner Y, Rapaport F, Favier A, Stenson PD, Cooper DN, Patin E, Casanova JL, Quintana-Murci L, Abel L. Common homozygosity for predicted loss-of-function variants reveals both redundant and advantageous effects of dispensable human genes. Proc Natl Acad Sci U S A 2020; 117:13626-13636. [PMID: 32487729 PMCID: PMC7306792 DOI: 10.1073/pnas.1917993117] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/15/2022] Open
Abstract
Humans homozygous or hemizygous for variants predicted to cause a loss of function (LoF) of the corresponding protein do not necessarily present with overt clinical phenotypes. We report here 190 autosomal genes with 207 predicted LoF variants, for which the frequency of homozygous individuals exceeds 1% in at least one human population from five major ancestry groups. No such genes were identified on the X and Y chromosomes. Manual curation revealed that 28 variants (15%) had been misannotated as LoF. Of the 179 remaining variants in 166 genes, only 11 alleles in 11 genes had previously been confirmed experimentally to be LoF. The set of 166 dispensable genes was enriched in olfactory receptor genes (41 genes). The 41 dispensable olfactory receptor genes displayed a relaxation of selective constraints similar to that observed for other olfactory receptor genes. The 125 dispensable nonolfactory receptor genes also displayed a relaxation of selective constraints consistent with greater redundancy. Sixty-two of these 125 genes were found to be dispensable in at least three human populations, suggesting possible evolution toward pseudogenes. Of the 179 LoF variants, 68 could be tested for two neutrality statistics, and 8 displayed robust signals of positive selection. These latter variants included a known FUT2 variant that confers resistance to intestinal viruses, and an APOL3 variant involved in resistance to parasitic infections. Overall, the identification of 166 genes for which a sizeable proportion of humans are homozygous for predicted LoF alleles reveals both redundancies and advantages of such deficiencies for human survival.
Affiliation(s)
- Antonio Rausell
- Clinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- University of Paris, Imagine Institute, 75015 Paris, France
- Yufei Luo
- Clinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- University of Paris, Imagine Institute, 75015 Paris, France
- Marie Lopez
- Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France
- Yoann Seeleuthner
- University of Paris, Imagine Institute, 75015 Paris, France
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- Franck Rapaport
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Antoine Favier
- Clinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- University of Paris, Imagine Institute, 75015 Paris, France
- Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, CF14 4XN Cardiff, United Kingdom
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, CF14 4XN Cardiff, United Kingdom
- Etienne Patin
- Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France
- Jean-Laurent Casanova
- University of Paris, Imagine Institute, 75015 Paris, France
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Howard Hughes Medical Institute, New York, NY 10065
- Pediatric Hematology and Immunology Unit, Necker Hospital for Sick Children, 75015 Paris, France
- Lluis Quintana-Murci
- Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France
- Human Genomics and Evolution, Collège de France, Paris 75005, France
- Laurent Abel
- University of Paris, Imagine Institute, 75015 Paris, France
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
27
Volders PJ, Anckaert J, Verheggen K, Nuytens J, Martens L, Mestdagh P, Vandesompele J. LNCipedia 5: towards a reference set of human long non-coding RNAs. Nucleic Acids Res 2019; 47:D135-D139. [PMID: 30371849 PMCID: PMC6323963 DOI: 10.1093/nar/gky1031] [Citation(s) in RCA: 343] [Impact Index Per Article: 85.8] [Received: 09/03/2018] [Accepted: 10/17/2018] [Indexed: 12/20/2022] Open
Abstract
While long non-coding RNA (lncRNA) research in the past has primarily focused on the discovery of novel genes, today it has shifted towards functional annotation of this large class of genes. With thousands of lncRNA studies published every year, the current challenge lies in keeping track of which lncRNAs are functionally described. This is further complicated by the fact that lncRNA nomenclature is not straightforward and lncRNA annotation is scattered across different resources with their own quality metrics and definition of a lncRNA. To overcome this issue, large scale curation and annotation is needed. Here, we present the fifth release of the human lncRNA database LNCipedia (https://lncipedia.org). The most notable improvements include manual literature curation of 2482 lncRNA articles and the use of official gene symbols when available. In addition, an improved filtering pipeline results in a higher quality reference lncRNA gene set.
Affiliation(s)
- Pieter-Jan Volders
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- To whom correspondence should be addressed. Tel: +32 9 332 6979; Fax: +32 9 332 6549;
- Jasper Anckaert
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Kenneth Verheggen
- VIB-UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Justine Nuytens
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Pieter Mestdagh
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Jo Vandesompele
- Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
28
Brasó-Vives M, Povolotskaya IS, Hartasánchez DA, Farré X, Fernandez-Callejo M, Raveendran M, Harris RA, Rosene DL, Lorente-Galdos B, Navarro A, Marques-Bonet T, Rogers J, Juan D. Copy number variants and fixed duplications among 198 rhesus macaques (Macaca mulatta). PLoS Genet 2020; 16:e1008742. [PMID: 32392208 PMCID: PMC7241854 DOI: 10.1371/journal.pgen.1008742] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/16/2019] [Revised: 05/21/2020] [Accepted: 03/27/2020] [Indexed: 01/01/2023] Open
Abstract
The rhesus macaque is an abundant species of Old World monkeys and a valuable model organism for biomedical research due to its close phylogenetic relationship to humans. Copy number variation is one of the main sources of genomic diversity within and between species and a widely recognized cause of inter-individual differences in disease risk. However, copy number differences among rhesus macaques and between the human and macaque genomes, as well as the relevance of this diversity to research involving this nonhuman primate, remain understudied. Here we present a high-resolution map of sequence copy number for the rhesus macaque genome constructed from a dataset of 198 individuals. Our results show that about one-eighth of the rhesus macaque reference genome is composed of recently duplicated regions, either copy number variable regions or fixed duplications. Comparison with human genomic copy number maps based on previously published data shows that, despite overall similarities in the genome-wide distribution of these regions, there are specific differences at the chromosome level. Some of these create differences in the copy number profile between human disease genes and their rhesus macaque orthologs. Our results highlight the importance of addressing the number of copies of target genes in the design of experiments and caution against human-centered assumptions in research conducted with model organisms. Overall, we present a genome-wide copy number map from a large sample of rhesus macaque individuals representing an important novel contribution concerning the evolution of copy number in primate genomes.
Affiliation(s)
- Marina Brasó-Vives
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- Laboratoire de Biométrie et Biologie Évolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Villeurbanne, France
- Inna S. Povolotskaya
- Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Moscow, Russia
- Diego A. Hartasánchez
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- Xavier Farré
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- Marcos Fernandez-Callejo
- National Centre for Genomic Analysis-Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Muthuswamy Raveendran
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- R. Alan Harris
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Douglas L. Rosene
- Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Belen Lorente-Galdos
- Department of Neuroscience, Yale School of Medicine, New Haven, Connecticut, United States of America
- Arcadi Navarro
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- National Institute for Bioinformatics (INB), Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Catalonia, Spain
- Tomas Marques-Bonet
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- National Centre for Genomic Analysis-Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Catalonia, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Catalonia, Spain
- Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- David Juan
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
29
Impey RE, Lee M, Hawkins DA, Sutton JM, Panjikar S, Perugini MA, Soares da Costa TP. Mis-annotations of a promising antibiotic target in high-priority gram-negative pathogens. FEBS Lett 2020; 594:1453-1463. [PMID: 31943170 DOI: 10.1002/1873-3468.13733] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/15/2019] [Revised: 12/17/2019] [Accepted: 12/17/2019] [Indexed: 11/09/2022]
Abstract
The rise of antibiotic resistance combined with the lack of new products entering the market has led to bacterial infections becoming one of the biggest threats to global health. Therefore, there is an urgent need to identify novel antibiotic targets, such as dihydrodipicolinate synthase (DHDPS), an enzyme involved in the production of essential metabolites in cell wall and protein synthesis. Here, we utilised a 7-residue sequence motif to identify mis-annotation of multiple DHDPS genes in the high-priority Gram-negative bacteria Acinetobacter baumannii and Klebsiella pneumoniae. We subsequently confirmed these mis-annotations using a combination of enzyme kinetics and X-ray crystallography. Thus, this study highlights the need to ensure genes encoding promising drug targets, like DHDPS, are annotated correctly, especially for clinically important pathogens. PDB ID: 6UE0.
Affiliation(s)
- Rachael E Impey
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
- Mihwa Lee
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
- Daniel A Hawkins
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
- J Mark Sutton
- National Infection Service, Research and Development Institute, Public Health England, Salisbury, UK
- Santosh Panjikar
- Australian Synchrotron, ANSTO, Clayton, VIC, Australia
- Department of Molecular Biology and Biochemistry, Monash University, Melbourne, VIC, Australia
- Matthew A Perugini
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
- Tatiana P Soares da Costa
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
30
Martinez-Gomez L, Abascal F, Jungreis I, Pozo F, Kellis M, Mudge JM, Tress ML. Few SINEs of life: Alu elements have little evidence for biological relevance despite elevated translation. NAR Genom Bioinform 2019; 2:lqz023. [PMID: 31886458 PMCID: PMC6924539 DOI: 10.1093/nargab/lqz023] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/08/2019] [Revised: 10/30/2019] [Accepted: 12/12/2019] [Indexed: 12/12/2022] Open
Abstract
Transposable elements colonize genomes and with time may end up being incorporated into functional regions. SINE Alu elements, which appeared in the primate lineage, are ubiquitous in the human genome and more than a thousand overlap annotated coding exons. Although almost all Alu-derived coding exons appear to be in alternative transcripts, they have been incorporated into the main coding transcript in at least 11 genes. The extent to which Alu regions are incorporated into functional proteins is unclear, but we detected reliable peptide evidence to support the translation to protein of 33 Alu-derived exons. All but one of the Alu elements for which we detected peptides were frame-preserving and there was proportionally seven times more peptide evidence for Alu elements as for other primate exons. Despite this strong evidence for translation to protein we found no evidence of selection, either from cross species alignments or human population variation data, among these Alu-derived exons. Overall, our results confirm that SINE Alu elements have contributed to the expansion of the human proteome, and this contribution appears to be stronger than might be expected over such a relatively short evolutionary timeframe. Despite this, the biological relevance of these modifications remains open to question.
Affiliation(s)
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- To whom correspondence should be addressed. Tel: +34 91 732 8000; Fax: +34 91 224 6980;
31
Innovating the Concept and Practice of Two-Dimensional Gel Electrophoresis in the Analysis of Proteomes at the Proteoform Level. Proteomes 2019; 7:proteomes7040036. [PMID: 31671630 PMCID: PMC6958347 DOI: 10.3390/proteomes7040036] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 06/21/2019] [Revised: 09/15/2019] [Accepted: 10/28/2019] [Indexed: 12/21/2022] Open
Abstract
Two-dimensional gel electrophoresis (2DE) is an important and well-established technical platform enabling extensive top-down proteomic analysis. However, the long-held but now largely outdated conventional concepts of 2DE have clearly impacted its application to in-depth investigations of proteomes at the level of protein species/proteoforms. It is time to popularize a new concept of 2DE for proteomics. With the development and enrichment of the proteome concept, any given “protein” is now recognized to consist of a series of proteoforms. Thus, it is the proteoform, rather than the canonical protein, that is the basic unit of a proteome, and each proteoform has a specific isoelectric point (pI) and relative mass (Mr). Accordingly, using 2DE, each proteoform can routinely be resolved and arrayed according to its different pI and Mr. Each detectable spot contains multiple proteoforms derived from the same gene, as well as from different genes. Proteoforms derived from the same gene are distributed into different spots in a 2DE pattern. High-resolution 2DE is thus actually an initial level of separation to address proteome complexity and is effectively a pre-fractionation method prior to analysis using mass spectrometry (MS). Furthermore, stable isotope-labeled 2DE coupled with high-sensitivity liquid chromatography-tandem MS (LC-MS/MS) has tremendous potential for the large-scale detection, identification, and quantification of the proteoforms that constitute proteomes.
32
Hatje K, Mühlhausen S, Simm D, Kollmar M. The Protein-Coding Human Genome: Annotating High-Hanging Fruits. Bioessays 2019; 41:e1900066. [PMID: 31544971 DOI: 10.1002/bies.201900066] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/15/2019] [Revised: 08/07/2019] [Indexed: 12/19/2022]
Abstract
The major transcript variants of human protein-coding genes are annotated to a certain degree of accuracy by combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes (they can be protein-coding, noncoding, or pseudogenes) and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have partially been overcome by dedicated algorithms to detect alternatively spliced micro-exons and wobble splice variants. Recently, knowledge about splice mechanisms and protein structure has been incorporated into an algorithm to predict neighboring homologous exons, which are often spliced in a mutually exclusive manner. Predicted exons are evaluated by transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re-assignments. The emerging human pan-genome is necessitating distinctive annotations incorporating differences between individuals and between populations.
Affiliation(s)
- Klas Hatje
- Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstr. 124, 4070, Basel, Switzerland
| | - Stefanie Mühlhausen
- Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
| | - Dominic Simm
- Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany.,Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
| | - Martin Kollmar
- Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
|
33
|
Mudge JM, Jungreis I, Hunt T, Gonzalez JM, Wright JC, Kay M, Davidson C, Fitzgerald S, Seal R, Tweedie S, He L, Waterhouse RM, Li Y, Bruford E, Choudhary JS, Frankish A, Kellis M. Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci. Genome Res 2019; 29:2073-2087. [PMID: 31537640 PMCID: PMC6886504 DOI: 10.1101/gr.246462.118] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 11/15/2018] [Accepted: 09/09/2019] [Indexed: 12/15/2022]
Abstract
The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.
Affiliation(s)
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Jose Manuel Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - James C Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, London SW7 3RP, United Kingdom
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Stephen Fitzgerald
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Ruth Seal
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.,Department of Haematology, University of Cambridge, Cambridge CB2 0PT, United Kingdom
| | - Susan Tweedie
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Liang He
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Robert M Waterhouse
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland.,Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
| | - Yue Li
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Elspeth Bruford
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.,Department of Haematology, University of Cambridge, Cambridge CB2 0PT, United Kingdom
| | - Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, London SW7 3RP, United Kingdom
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
|
34
|
Machado KCT, Fortuin S, Tomazella GG, Fonseca AF, Warren RM, Wiker HG, de Souza SJ, de Souza GA. On the Impact of the Pangenome and Annotation Discrepancies While Building Protein Sequence Databases for Bacteria Proteogenomics. Front Microbiol 2019; 10:1410. [PMID: 31281302 PMCID: PMC6596428 DOI: 10.3389/fmicb.2019.01410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/26/2019] [Accepted: 06/05/2019] [Indexed: 01/19/2023] Open
Abstract
In proteomics, peptide information within mass spectrometry (MS) data from a specific organism sample is routinely matched against a protein sequence database that best represents that organism. However, if the species/strain in the sample is unknown or genetically poorly characterized, it becomes challenging to determine a database that can represent the sample. Building customized protein sequence databases that merge multiple strains of a given species has become a strategy to overcome such restrictions. However, as more genetic information becomes publicly available and interesting genetic features such as the existence of pan- and core genes within a species are revealed, we questioned how efficient such merging strategies are at reporting relevant information. To test this, we constructed databases containing conserved and unique sequences for 10 different species. Features that are relevant for probabilistic-based protein identification by proteomics were then monitored. As expected, the increase in database complexity correlates with pangenomic complexity. However, Mycobacterium tuberculosis and Bordetella pertussis generated very complex databases despite having low pangenomic complexity. We further tested database performance by using MS data from eight clinical strains of M. tuberculosis, and from two published datasets from Staphylococcus aureus. We show that by using an approach where database size is controlled by removing repeated identical tryptic sequences across strains/species, computational time can be reduced drastically as database complexity increases.
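The size-control idea in the last sentence of this abstract (removing repeated identical tryptic sequences across strains) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_len` cutoff, the toy sequences, and the simple cleavage rule (cut after K or R unless the next residue is P) are all assumptions made for illustration. Splitting on an empty match requires Python 3.7 or later.

```python
import re


def tryptic_peptides(protein, min_len=6):
    """In-silico trypsin digest: cleave after K or R, but not before P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # Drop short fragments; for min_len >= 1 this also drops the empty
    # string that a C-terminal K/R leaves behind after splitting.
    return [p for p in pieces if len(p) >= min_len]


def nonredundant_peptides(strains, min_len=6):
    """Collect each distinct tryptic peptide once across all strains."""
    seen = set()
    unique = []
    for proteins in strains.values():
        for protein in proteins:
            for peptide in tryptic_peptides(protein, min_len):
                if peptide not in seen:
                    seen.add(peptide)
                    unique.append(peptide)
    return unique


# Toy, made-up sequences for two hypothetical strains.
strains = {
    "strain_A": ["AAAKCCCR"],
    "strain_B": ["AAAKDDDR"],
}
print(nonredundant_peptides(strains, min_len=2))
# prints ['AAAK', 'CCCR', 'DDDR']
```

The two toy proteins yield four tryptic peptides in total, but the shared peptide `AAAK` is stored only once, so the merged list holds three entries. A search database built this way grows with the number of distinct peptides rather than with the number of strains.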
Affiliation(s)
- Karla C T Machado
- Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
| | - Suereta Fortuin
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
| | - Gisele Guicardi Tomazella
- Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, Bergen, Norway
- The Institute of Bioinformatics and Biotechnology, Natal, Brazil
| | - Andre F Fonseca
- Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
| | - Robin Mark Warren
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
| | - Harald G Wiker
- The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Sandro Jose de Souza
- Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- The Brain Institute, Universidade Federal do Rio Grande do Norte, Natal, Brazil
| | - Gustavo Antonio de Souza
- Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- Department of Biochemistry, Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
|
35
|
Babarinde IA, Li Y, Hutchins AP. Computational Methods for Mapping, Assembly and Quantification for Coding and Non-coding Transcripts. Comput Struct Biotechnol J 2019; 17:628-637. [PMID: 31193391 PMCID: PMC6526290 DOI: 10.1016/j.csbj.2019.04.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/25/2019] [Revised: 04/24/2019] [Accepted: 04/29/2019] [Indexed: 12/17/2022] Open
Abstract
The measurement of gene expression has long provided significant insight into biological functions. The development of high-throughput short-read sequencing technology has revealed transcriptional complexity at an unprecedented scale, and informed almost all areas of biology. However, as researchers have sought to gather more insights from the data, these new technologies have also increased the computational analysis burden. In this review, we describe typical computational pipelines for RNA-Seq analysis and discuss their strengths and weaknesses for the assembly, quantification and analysis of coding and non-coding RNAs. We also discuss the assembly of transposable elements into transcripts, and the difficulty these repetitive elements pose. In summary, RNA-Seq is a powerful technology that is likely to remain a key asset in the biologist's toolkit.
Affiliation(s)
| | | | - Andrew P. Hutchins
- Department of Biology, Southern University of Science and Technology, 1088 Xueyuan Lu, Shenzhen, China
|
36
|
Rao MS, Van Vleet TR, Ciurlionis R, Buck WR, Mittelstadt SW, Blomme EAG, Liguori MJ. Comparison of RNA-Seq and Microarray Gene Expression Platforms for the Toxicogenomic Evaluation of Liver From Short-Term Rat Toxicity Studies. Front Genet 2019; 9:636. [PMID: 30723492 PMCID: PMC6349826 DOI: 10.3389/fgene.2018.00636] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 09/15/2018] [Accepted: 11/27/2018] [Indexed: 12/12/2022] Open
Abstract
Gene expression profiling is a useful tool to predict and interrogate mechanisms of toxicity. RNA-Seq technology has emerged as an attractive alternative to traditional microarray platforms for conducting transcriptional profiling. The objective of this work was to compare both transcriptomic platforms to determine whether RNA-Seq offered significant advantages over microarrays for toxicogenomic studies. RNA samples from the livers of rats treated for 5 days with five tool hepatotoxicants (α-naphthylisothiocyanate/ANIT, carbon tetrachloride/CCl4, methylenedianiline/MDA, acetaminophen/APAP, and diclofenac/DCLF) were analyzed with both gene expression platforms (RNA-Seq and microarray). Data were compared to determine any potential added scientific value (i.e., better biological or toxicological insight) offered by RNA-Seq compared to microarrays. RNA-Seq identified more differentially expressed protein-coding genes and provided a wider quantitative range of expression level changes when compared to microarrays. Both platforms identified a larger number of differentially expressed genes (DEGs) in livers of rats treated with ANIT, MDA, and CCl4 compared to APAP and DCLF, in agreement with the severity of histopathological findings. Approximately 78% of DEGs identified with microarrays overlapped with RNA-Seq data, with a Spearman’s correlation of 0.7 to 0.83. Consistent with the mechanisms of toxicity of ANIT, APAP, MDA and CCl4, both platforms identified dysregulation of liver relevant pathways such as Nrf2, cholesterol biosynthesis, eIF2, hepatic cholestasis, glutathione and LPS/IL-1 mediated RXR inhibition. RNA-Seq data showed additional DEGs that not only significantly enriched these pathways, but also suggested modulation of additional liver relevant pathways. In addition, RNA-Seq enabled the identification of non-coding DEGs that offer a potential for improved mechanistic clarity.
Overall, these results indicate that RNA-Seq is an acceptable alternative platform to microarrays for rat toxicogenomic studies with several advantages. Because of its wider dynamic range as well as its ability to identify a larger number of DEGs, RNA-Seq may generate more insight into mechanisms of toxicity. However, more extensive reference data will be necessary to fully leverage these additional RNA-Seq data, especially for non-coding sequences.
Affiliation(s)
- Mohan S Rao
- Investigative Toxicology and Pathology, Global Preclinical Safety, AbbVie, North Chicago, IL, United States
| | - Terry R Van Vleet
- Investigative Toxicology and Pathology, Global Preclinical Safety, AbbVie, North Chicago, IL, United States
| | - Rita Ciurlionis
- Investigative Toxicology and Pathology, Global Preclinical Safety, AbbVie, North Chicago, IL, United States
| | - Wayne R Buck
- Investigative Toxicology and Pathology, Global Preclinical Safety, AbbVie, North Chicago, IL, United States
| | - Scott W Mittelstadt
- Investigative Toxicology and Pathology, Global Preclinical Safety, AbbVie, North Chicago, IL, United States
| | - Eric A G Blomme
- Investigative Toxicology and Pathology, Global Preclinical Safety, AbbVie, North Chicago, IL, United States
| | - Michael J Liguori
- Investigative Toxicology and Pathology, Global Preclinical Safety, AbbVie, North Chicago, IL, United States
|
37
|
Frankish A, Diekhans M, Ferreira AM, Johnson R, Jungreis I, Loveland J, Mudge JM, Sisu C, Wright J, Armstrong J, Barnes I, Berry A, Bignell A, Carbonell Sala S, Chrast J, Cunningham F, Di Domenico T, Donaldson S, Fiddes IT, García Girón C, Gonzalez JM, Grego T, Hardy M, Hourlier T, Hunt T, Izuogu OG, Lagarde J, Martin FJ, Martínez L, Mohanan S, Muir P, Navarro FC, Parker A, Pei B, Pozo F, Ruffier M, Schmitt BM, Stapleton E, Suner MM, Sycheva I, Uszczynska-Ratajczak B, Xu J, Yates A, Zerbino D, Zhang Y, Aken B, Choudhary JS, Gerstein M, Guigó R, Hubbard TJ, Kellis M, Paten B, Reymond A, Tress ML, Flicek P. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 2019; 47:D766-D773. [PMID: 30357393 PMCID: PMC6323946 DOI: 10.1093/nar/gky955] [Citation(s) in RCA: 1814] [Impact Index Per Article: 362.8] [Received: 08/15/2018] [Revised: 09/20/2018] [Accepted: 10/08/2018] [Indexed: 02/06/2023] Open
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Anne-Maud Ferreira
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Rory Johnson
- Department of Medical Oncology, Inselspital, University Hospital, University of Bern, Bern, Switzerland
- Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
| | - Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vasser St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Jane Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cristina Sisu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
| | - James Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
| | - Joel Armstrong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Silvia Carbonell Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
| | - Jacqueline Chrast
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tomás Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ian T Fiddes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Carlos García Girón
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jose Manuel Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tiago Grego
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Osagie G Izuogu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Laura Martínez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Shamika Mohanan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Muir
- Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06520, USA
- Systems Biology Institute, Yale University, West Haven, CT 06516, USA
| | - Fabio C P Navarro
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Baikang Pei
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Eloise Stapleton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irina Sycheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Jinuri Xu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yan Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Bronwen Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
| | - Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
| | - Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vasser St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
38
|
Morris RJ. Thy-1, a Pathfinder Protein for the Post-genomic Era. Front Cell Dev Biol 2018; 6:173. [PMID: 30619853 PMCID: PMC6305390 DOI: 10.3389/fcell.2018.00173] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/03/2018] [Accepted: 12/06/2018] [Indexed: 12/21/2022] Open
Abstract
Thy-1 is possibly the smallest of cell surface proteins – 110 amino acids folded into an Immunoglobulin variable domain, tethered to the outer leaflet of the cell surface membrane via just the two saturated fatty acids of its glycosylphosphatidylinositol (GPI) anchor. Yet Thy-1 is emerging as a key regulator of differentiation in cells of endodermal, mesodermal, and ectodermal origin, acting as both a ligand (for certain integrins and other receptors), and as a receptor, able to modulate signaling and hence differentiation in the Thy-1-expressing cell. This is an extraordinary diversity of molecular pathways to be controlled by a molecule that does not even cross the cell membrane. Here I review aspects of the cell biology of Thy-1, and studies of its role as deduced from gene knock-out studies, that suggest how this protein can participate in so many different signaling-related functions. While mechanisms differ in molecular detail, it appears overall that Thy-1 dampens down signaling to control function.
Affiliation(s)
- Roger J Morris
- Department of Chemistry, King's College London, London, United Kingdom
|
39
|
Sinha S, Eisenhaber B, Jensen LJ, Kalbuaji B, Eisenhaber F. Darkness in the Human Gene and Protein Function Space: Widely Modest or Absent Illumination by the Life Science Literature and the Trend for Fewer Protein Function Discoveries Since 2000. Proteomics 2018; 18:e1800093. [PMID: 30265449 PMCID: PMC6282819 DOI: 10.1002/pmic.201800093] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 07/30/2018] [Revised: 09/07/2018] [Indexed: 12/15/2022]
Abstract
The mentioning of gene names in the body of the scientific literature 1901-2017 and their fractional counting is used as a proxy to assess the level of biological function discovery. A literature score of one has been defined as full publication equivalent (FPE), the amount of literature necessary to achieve one publication solely dedicated to a gene. It has been found that less than 5000 human genes have each at least 100 FPEs in the available literature corpus. This group of elite genes (4817 protein-coding genes, 119 non-coding RNAs) attracts the overwhelming majority of the scientific literature about genes. Yet, thousands of proteins have never been mentioned at all, ≈2000 further proteins have not even one FPE of literature and, for ≈4600 additional proteins, the FPE count is below 10. The protein function discovery rate measured as numbers of proteins first mentioned or crossing a threshold of accumulated FPEs in a given year has grown until 2000 but is in decline thereafter. This drop is partially offset by function discoveries for non-coding RNAs. The full human genome sequencing does not boost the function discovery rate. Since 2000, the fastest growing group in the literature is that with at least 500 FPEs per gene.
Affiliation(s)
- Swati Sinha
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
| | - Bharata Kalbuaji
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, 138671, Singapore
- School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), 637553, Singapore
|
40
|
Melo Z, Ishida C, Goldaraz MDLP, Rojo R, Echavarria R. Novel Roles of Non-Coding RNAs in Opioid Signaling and Cardioprotection. Noncoding RNA 2018; 4:ncrna4030022. [PMID: 30227648 PMCID: PMC6162605 DOI: 10.3390/ncrna4030022] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/09/2018] [Revised: 09/10/2018] [Accepted: 09/12/2018] [Indexed: 12/16/2022] Open
Abstract
Cardiovascular disease (CVD) is a significant cause of morbidity and mortality across the world. A large proportion of CVD deaths are secondary to coronary artery disease (CAD) and myocardial infarction (MI). Even though prevention is the best strategy to reduce risk factors associated with MI, the use of cardioprotective interventions aimed at improving patient outcomes is of great interest. Opioid conditioning has been shown to be effective in reducing myocardial ischemia-reperfusion injury (IRI) and cardiomyocyte death. However, the molecular mechanisms behind these effects are under investigation and could provide the basis for the development of novel therapeutic approaches in the treatment of CVD. Non-coding RNAs (ncRNAs), which are functional RNA molecules that do not translate into proteins, are critical modulators of cardiac gene expression during heart development and disease. Moreover, ncRNAs such as microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) are known to be induced by opioid receptor activation and regulate opioid signaling pathways. Recent advances in experimental and computational tools have accelerated the discovery and functional characterization of ncRNAs. In this study, we review the current understanding of the role of ncRNAs in opioid signaling and opioid-induced cardioprotection.
Affiliation(s)
- Zesergio Melo
- CONACyT-Centro de Investigacion Biomedica de Occidente, Instituto Mexicano del Seguro Social, Sierra Mojada #800 Col. Independencia, Guadalajara 44340, Jalisco, Mexico.
| | - Cecilia Ishida
- Programa de Genomica Computacional, Centro de Ciencias Genomicas, Universidad Nacional Autonoma de Mexico, Cuernavaca 62210, Morelos, Mexico.
| | - Maria de la Paz Goldaraz
- Departamento de Anestesiologia, Hospital de Especialidades UMAE CMNO, Instituto Mexicano del Seguro Social, Guadalajara 44340, Jalisco, Mexico.
| | - Rocio Rojo
- Departamento de Anestesiologia, Hospital de Especialidades UMAE CMNO, Instituto Mexicano del Seguro Social, Guadalajara 44340, Jalisco, Mexico.
| | - Raquel Echavarria
- CONACyT-Centro de Investigacion Biomedica de Occidente, Instituto Mexicano del Seguro Social, Sierra Mojada #800 Col. Independencia, Guadalajara 44340, Jalisco, Mexico.
|