Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Abascal F, Juan D, Jungreis I, Kellis M, Martinez L, Rigau M, Rodriguez JM, Vazquez J, Tress ML. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res 2019;46:7070-7084. [PMID: 29982784 PMCID: PMC6101605 DOI: 10.1093/nar/gky587] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2018] [Accepted: 06/18/2018] [Indexed: 12/16/2022] Open

For:	Abascal F, Juan D, Jungreis I, Kellis M, Martinez L, Rigau M, Rodriguez JM, Vazquez J, Tress ML. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res 2019;46:7070-7084. [PMID: 29982784 PMCID: PMC6101605 DOI: 10.1093/nar/gky587] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2018] [Accepted: 06/18/2018] [Indexed: 12/16/2022] Open

Number

Cited by Other Article(s)

Rodriguez JM, Abascal F, Cerdán-Vélez D, Gómez LM, Vázquez J, Tress ML. Evidence for widespread translation of 5' untranslated regions. Nucleic Acids Res 2024;52:8112-8126. [PMID: 38953162 DOI: 10.1093/nar/gkae571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Revised: 06/07/2024] [Accepted: 06/19/2024] [Indexed: 07/03/2024] Open

Qin Z, Yang J, Zhang K, Gao X, Ran Q, Xu Y, Wang Z, Lou D, Huang C, Zellmer L, Meng G, Chen N, Ma H, Wang Z, Liao DJ. Updating mRNA variants of the human RSK4 gene and their expression in different stressed situations. Heliyon 2024;10:e27475. [PMID: 38560189 PMCID: PMC10980951 DOI: 10.1016/j.heliyon.2024.e27475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2024] [Revised: 02/11/2024] [Accepted: 02/29/2024] [Indexed: 04/04/2024] Open

Affiliation(s)

Zhenwei Qin Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
Jianglin Yang Center for Clinical Laboratories, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Rd, Guiyang, 550004, Guizhou Province, China Key Lab of Endemic and Ethnic Diseases of the Ministry of Education of China in Guizhou Medical University, Guiyang, 550004, Guizhou Province, China
Keyin Zhang Department of Pathology, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Road, Guiyang, 550004, Guizhou Province, China
Xia Gao Department of Pathology, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Road, Guiyang, 550004, Guizhou Province, China
Qianchuan Ran Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
Yuanhong Xu Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
Zhi Wang Department of Pathology, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Road, Guiyang, 550004, Guizhou Province, China
Didong Lou Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
Chunhua Huang Section of Forensic Science and Pathology, School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Dong-Qing-Nan Road, Guiyang, 550025, Guizhou Province, China
Lucas Zellmer Department of Medicine, Hennepin County Medical Center, 730 South 8th St., Minneapolis, MN, 55415, USA
Guangxue Meng Department of Oral and Maxillofacial Surgery, School of Stomatology, Guizhou Medical University, 9 Beijing Road, Guiyang, 550004, Guizhou Province, China
Na Chen Department of Oral and Maxillofacial Surgery, School of Stomatology, Guizhou Medical University, 9 Beijing Road, Guiyang, 550004, Guizhou Province, China
Hong Ma Department of Oral and Maxillofacial Surgery, School of Stomatology, Guizhou Medical University, 9 Beijing Road, Guiyang, 550004, Guizhou Province, China
Zhe Wang State Key Laboratory of Cancer Biology, Department of Pathology, Xijing Hospital, Air Force Medical University, 169 Changle West Road, Xi'an, 710032, China
Dezhong Joshua Liao Center for Clinical Laboratories, The Affiliated Hospital of Guizhou Medical University, 4 Beijing Rd, Guiyang, 550004, Guizhou Province, China Key Lab of Endemic and Ethnic Diseases of the Ministry of Education of China in Guizhou Medical University, Guiyang, 550004, Guizhou Province, China

Collapse

Kafita D, Nkhoma P, Dzobo K, Sinkala M. Shedding light on the dark genome: Insights into the genetic, CRISPR-based, and pharmacological dependencies of human cancers and disease aggressiveness. PLoS One 2023;18:e0296029. [PMID: 38117798 PMCID: PMC10732413 DOI: 10.1371/journal.pone.0296029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Accepted: 12/05/2023] [Indexed: 12/22/2023] Open

Carrion SA, Michal JJ, Jiang Z. Alternative Transcripts Diversify Genome Function for Phenome Relevance to Health and Diseases. Genes (Basel) 2023;14:2051. [PMID: 38002994 PMCID: PMC10671453 DOI: 10.3390/genes14112051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/26/2023] Open

He H, Wu C, Saqib M, Hao R. Single-molecule fluorescence methods for protein biomarker analysis. Anal Bioanal Chem 2023:10.1007/s00216-022-04502-9. [PMID: 36609860 DOI: 10.1007/s00216-022-04502-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Revised: 12/07/2022] [Accepted: 12/20/2022] [Indexed: 01/09/2023]

Abstract

Proteins have been considered key building blocks of life. In particular, the protein content of an organism and a cell offers significant information for the in-depth understanding of the disease and biological processes. Single-molecule protein detection/sequencing tools will revolutionize clinical (proteomics) research, offering ultrasensitivity for low-abundance biomarker (protein) detection, which is important for the realization of early-stage disease diagnosis and single-cell proteomics. This improved detection/measurement capability delivers new sets of techniques to explore new frontiers and address important challenges in various interdisciplinary areas including nanostructured materials, molecular medicine, molecular biology, and chemistry. Importantly, fluorescence-based methods have emerged as indispensable tools for single protein detection/sequencing studies, providing a higher signal-to-noise ratio (SNR). Improvements in fluorescent dyes/probes and detector capabilities coupled with advanced (image) analysis strategies have fueled current developments for single protein biomarker detections. For example, in comparison to conventional ELISA (i.e., based on ensembled measurements), single-molecule fluorescence detection is more sensitive, faster, and more accurate with reduced background, high-throughput, and so on. In comparison to MS sequencing, fluorescence-based single-molecule protein sequencing can achieve the sequencing of peptides themselves with higher sensitivity. This review summarizes various typical single-molecule detection technologies including their methodology (modes of operation), detection limits, advantages and drawbacks, and current challenges with recent examples. We describe the fluorescence-based single-molecule protein sequencing/detection based on five kinds of technologies such as fluorosequencing, N-terminal amino acid binder, nanopore light sensing, and DNA nanotechnology. Finally, we present our perspective for developing high-performance fluorescence-based sequencing/detection techniques.

Collapse

Zhang J, Lin X, Chen Y, Li T, Lee AC, Chow EY, Cho WC, Chan T. LAFITE Reveals the Complexity of Transcript Isoforms in Subcellular Fractions. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023;10:e2203480. [PMID: 36461702 PMCID: PMC9875686 DOI: 10.1002/advs.202203480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Revised: 10/28/2022] [Indexed: 06/17/2023]

Singh D, Roy J. A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs. Nucleic Acids Res 2022;50:12094-12111. [PMID: 36420898 PMCID: PMC9757047 DOI: 10.1093/nar/gkac1092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Revised: 10/22/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022] Open

Martinez-Gomez L, Cerdán-Vélez D, Abascal F, Tress ML. Origins and Evolution of Human Tandem Duplicated Exon Substitution Events. Genome Biol Evol 2022;14:6809199. [PMID: 36346145 PMCID: PMC9741552 DOI: 10.1093/gbe/evac162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 10/25/2022] [Accepted: 10/29/2022] [Indexed: 11/10/2022] Open

Wang WC, Lai YC. DUSP5 and PHLDA1 mutations in mature cystic teratomas of the ovary identified on whole-exome sequencing may explain teratoma characteristics. Hum Genomics 2022;16:50. [PMID: 36289533 PMCID: PMC9609193 DOI: 10.1186/s40246-022-00424-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2022] [Accepted: 10/19/2022] [Indexed: 11/21/2022] Open

Pozo F, Rodriguez JM, Martínez Gómez L, Vázquez J, Tress ML. APPRIS principal isoforms and MANE Select transcripts define reference splice variants. Bioinformatics 2022;38:ii89-ii94. [PMID: 36124785 PMCID: PMC9486585 DOI: 10.1093/bioinformatics/btac473] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open

Abstract

MOTIVATION

Selecting the splice variant that best represents a coding gene is a crucial first step in many experimental analyses, and vital for mapping clinically relevant variants. This study compares the longest isoforms, MANE Select transcripts, APPRIS principal isoforms, and expression data, and aims to determine which method is best for selecting biological important reference splice variants for large-scale analyses.

RESULTS

Proteomics analyses and human genetic variation data suggest that most coding genes have a single main protein isoform. We show that APPRIS principal isoforms and MANE Select transcripts best describe these main cellular isoforms, and find that using the longest splice variant as the representative is a poor strategy. Exons unique to the longest splice isoforms are not under selective pressure, and so are unlikely to be functionally relevant. Expression data are also a poor means of selecting the main splice variant. APPRIS principal and MANE Select exons are under purifying selection, while exons specific to alternative transcripts are not. There are MANE and APPRIS representatives for almost 95% of genes, and where they agree they are particularly effective, coinciding with the main proteomics isoform for over 98.2% of genes.

AVAILABILITY AND IMPLEMENTATION

APPRIS principal isoforms for human, mouse and other model species can be downloaded from the APPRIS database (https://appris.bioinfo.cnio.es), GENCODE genes (https://www.gencodegenes.org/) and the Ensembl website (https://www.ensembl.org). MANE Select transcripts for the human reference set are available from the Ensembl, GENCODE and RefSeq databases (https://www.ncbi.nlm.nih.gov/refseq/). Lists of splice variants where MANE and APPRIS coincide are available from the APPRIS database.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

The effects of sequencing depth on the assembly of coding and noncoding transcripts in the human genome. BMC Genomics 2022;23:487. [PMID: 35787153 PMCID: PMC9251931 DOI: 10.1186/s12864-022-08717-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2022] [Accepted: 06/16/2022] [Indexed: 12/30/2022] Open

How Far Are We from the Completion of the Human Protein Interactome Reconstruction? Biomolecules 2022;12:biom12010140. [PMID: 35053288 PMCID: PMC8774112 DOI: 10.3390/biom12010140] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2021] [Revised: 01/09/2022] [Accepted: 01/11/2022] [Indexed: 12/12/2022] Open

Abstract

After more than fifteen years from the first high-throughput experiments for human protein–protein interaction (PPI) detection, we are still wondering how close the completion of the genome-scale human PPI network reconstruction is, what needs to be further explored and whether the biological insights gained from the holistic investigation of the current network are valid and useful. The unique structure of PICKLE, a meta-database of the human experimentally determined direct PPI network developed by our group, presently covering ~80% of the UniProtKB/Swiss-Prot reviewed human complete proteome, enables the evaluation of the interactome expansion by comparing the successive PICKLE releases since 2013. We observe a gradual overall increase of 39%, 182%, and 67% in protein nodes, PPIs, and supporting references, respectively. Our results indicate that, in recent years, (a) the PPI addition rate has decreased, (b) the new PPIs are largely determined by high-throughput experiments and mainly concern existing protein nodes and (c), as we had predicted earlier, most of the newly added protein nodes have a low degree. These observations, combined with a largely overlapping k-core between PICKLE releases and a network density increase, imply that an almost complete picture of a structurally defined network has been reached. The comparative unsupervised application of two clustering algorithms indicated that exploring the full interactome topology can reveal the protein neighborhoods involved in closely related biological processes as transcriptional regulation, cell signaling and multiprotein complexes such as the connexon complex associated with cancers. A well-reconstructed human protein interactome is a powerful tool in network biology and medicine research forming the basis for multi-omic and dynamic analyses.

Collapse

Profit versus Quality: The Enigma of Scientific Wellness. J Pers Med 2022;12:jpm12010034. [PMID: 35055349 PMCID: PMC8779909 DOI: 10.3390/jpm12010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2021] [Revised: 12/21/2021] [Accepted: 12/24/2021] [Indexed: 11/23/2022] Open

Porta-Pardo E, Ruiz-Serra V, Valentini S, Valencia A. The structural coverage of the human proteome before and after AlphaFold. PLoS Comput Biol 2022;18:e1009818. [PMID: 35073311 PMCID: PMC8812986 DOI: 10.1371/journal.pcbi.1009818] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Revised: 02/03/2022] [Accepted: 01/07/2022] [Indexed: 12/12/2022] Open

Abstract

The protein structure field is experiencing a revolution. From the increased throughput of techniques to determine experimental structures, to developments such as cryo-EM that allow us to find the structures of large protein complexes or, more recently, the development of artificial intelligence tools, such as AlphaFold, that can predict with high accuracy the folding of proteins for which the availability of homology templates is limited. Here we quantify the effect of the recently released AlphaFold database of protein structural models in our knowledge on human proteins. Our results indicate that our current baseline for structural coverage of 48%, considering experimentally-derived or template-based homology models, elevates up to 76% when including AlphaFold predictions. At the same time the fraction of dark proteome is reduced from 26% to just 10% when AlphaFold models are considered. Furthermore, although the coverage of disease-associated genes and mutations was near complete before AlphaFold release (69% of Clinvar pathogenic mutations and 88% of oncogenic mutations), AlphaFold models still provide an additional coverage of 3% to 13% of these critically important sets of biomedical genes and mutations. Finally, we show how the contribution of AlphaFold models to the structural coverage of non-human organisms, including important pathogenic bacteria, is significantly larger than that of the human proteome. Overall, our results show that the sequence-structure gap of human proteins has almost disappeared, an outstanding success of direct consequences for the knowledge on the human genome and the derived medical applications.

Protein structures are key to understand many biological phenomena at the molecular scale: from the effects of genetic variation to how different proteins interact with each other to create molecular pathways that, together, have a biological function. Obtaining experimental structures, however, is extremely consuming in terms of both, time and resources.

For this and other reasons, scientists have long worked to develop computational approaches that predict the structure of a protein using only its sequence as input. Recently, a group of scientists at Deepmind have developed AlphaFold2, a computational tool that is extremely accurate at this task. Moreover, they have used this tool to predict the structures of all human proteins.

In this manuscript we provide an overview of the structural coverage of the human proteome before AlphaFold models were released and how much we have gained thanks to these models. We also show how the gain affects our understanding of human pathogenic variants, both germline and somatic. Finally, we provide evidence suggesting that the gain in non-human organisms is larger than for the human proteome, particularly in the case of bacteria.

Collapse

Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021;433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]

Abstract

Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.

Collapse

Carbonara K, Andonovski M, Coorssen JR. Proteomes Are of Proteoforms: Embracing the Complexity. Proteomes 2021;9:38. [PMID: 34564541 PMCID: PMC8482110 DOI: 10.3390/proteomes9030038] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2021] [Revised: 08/24/2021] [Accepted: 08/29/2021] [Indexed: 12/17/2022] Open

Babarinde IA, Ma G, Li Y, Deng B, Luo Z, Liu H, Abdul MM, Ward C, Chen M, Fu X, Shi L, Duttlinger M, He J, Sun L, Li W, Zhuang Q, Tong G, Frampton J, Cazier JB, Chen J, Jauch R, Esteban MA, Hutchins AP. Transposable element sequence fragments incorporated into coding and noncoding transcripts modulate the transcriptome of human pluripotent stem cells. Nucleic Acids Res 2021;49:9132-9153. [PMID: 34390351 PMCID: PMC8450112 DOI: 10.1093/nar/gkab710] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2020] [Revised: 07/29/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022] Open

Affiliation(s)

Isaac A Babarinde Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
Gang Ma Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
Yuhao Li Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
Boping Deng Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK
Zhiwei Luo Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
Hao Liu Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
Mazid Md Abdul Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
Carl Ward Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
Minchun Chen Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
Xiuling Fu Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
Liyang Shi Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
Martha Duttlinger Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
Jiangping He Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
Li Sun Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
Wenjuan Li Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
Qiang Zhuang Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
Guoqing Tong Center for Reproductive Medicine, Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200120, China
Jon Frampton Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK
Jean-Baptiste Cazier Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK.,Centre for Computational Biology, University of Birmingham, Birmingham, UK
Jiekai Chen Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China.,Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
Ralf Jauch School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Miguel A Esteban Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China.,Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
Andrew P Hutchins Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China

Collapse

Martinez Gomez L, Pozo F, Walsh TA, Abascal F, Tress ML. The clinical importance of tandem exon duplication-derived substitutions. Nucleic Acids Res 2021;49:8232-8246. [PMID: 34302486 PMCID: PMC8373072 DOI: 10.1093/nar/gkab623] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2021] [Accepted: 07/21/2021] [Indexed: 01/04/2023] Open

Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML. Assessing the functional relevance of splice isoforms. NAR Genom Bioinform 2021;3:lqab044. [PMID: 34046593 PMCID: PMC8140736 DOI: 10.1093/nargab/lqab044] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2020] [Revised: 04/22/2021] [Accepted: 05/17/2021] [Indexed: 12/20/2022] Open

Agostini F, Zagalak J, Attig J, Ule J, Luscombe NM. Intergenic RNA mainly derives from nascent transcripts of known genes. Genome Biol 2021;22:136. [PMID: 33952325 PMCID: PMC8097831 DOI: 10.1186/s13059-021-02350-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2020] [Accepted: 04/12/2021] [Indexed: 12/16/2022] Open

Harrison PM. Computational Methods for Pseudogene Annotation Based on Sequence Homology. Methods Mol Biol 2021;2324:35-48. [PMID: 34165707 DOI: 10.1007/978-1-0716-1503-4_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Li J, Zhan X. Mass spectrometry-based proteomics analyses of post-translational modifications and proteoforms in human pituitary adenomas. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020;1869:140584. [PMID: 33321259 DOI: 10.1016/j.bbapap.2020.140584] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Received: 06/22/2020] [Revised: 12/06/2020] [Accepted: 12/08/2020] [Indexed: 12/13/2022]

Bogard B, Francastel C, Hubé F. Multiple information carried by RNAs: total eclipse or a light at the end of the tunnel? RNA Biol 2020;17:1707-1720. [PMID: 32559119 PMCID: PMC7714488 DOI: 10.1080/15476286.2020.1783868] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2020] [Revised: 06/06/2020] [Accepted: 06/12/2020] [Indexed: 12/14/2022] Open

Rodriguez JM, Pozo F, di Domenico T, Vazquez J, Tress ML. An analysis of tissue-specific alternative splicing at the protein level. PLoS Comput Biol 2020;16:e1008287. [PMID: 33017396 PMCID: PMC7561204 DOI: 10.1371/journal.pcbi.1008287] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2020] [Revised: 10/15/2020] [Accepted: 08/25/2020] [Indexed: 01/09/2023] Open

Abstract

The role of alternative splicing is one of the great unanswered questions in cellular biology. There is strong evidence for alternative splicing at the transcript level, and transcriptomics experiments show that many splice events are tissue specific. It has been suggested that alternative splicing evolved in order to remodel tissue-specific protein-protein networks. Here we investigated the evidence for tissue-specific splicing among splice isoforms detected in a large-scale proteomics analysis. Although the data supporting alternative splicing is limited at the protein level, clear patterns emerged among the small numbers of alternative splice events that we could detect in the proteomics data. More than a third of these splice events were tissue-specific and most were ancient: over 95% of splice events that were tissue-specific in both proteomics and RNAseq analyses evolved prior to the ancestors of lobe-finned fish, at least 400 million years ago. By way of contrast, three in four alternative exons in the human gene set arose in the primate lineage, so our results cannot be extrapolated to the whole genome. Tissue-specific alternative protein forms in the proteomics analysis were particularly abundant in nervous and muscle tissues and their genes had roles related to the cytoskeleton and either the structure of muscle fibres or cell-cell connections. Our results suggest that this conserved tissue-specific alternative splicing may have played a role in the development of the vertebrate brain and heart.

We manually curated a set of 255 splice events detected in a large-scale tissue-based proteomics experiment and found that more than a third had evidence of significant tissue-specific differences. Events that were significantly tissue-specific at the protein level were highly conserved; almost 75% evolved over 400 million years ago. The tissues in which we found most evidence for tissue-specific splicing were nervous tissues and cardiac tissues. Genes with tissue-specific events in these two tissues had functions related to important cellular structures in brain and heart tissues. These splice events may have been essential for the development of vertebrate heart and muscle. However, our data set may not be representative of alternative exons as a whole. We found that most tissue specific splicing was strongly conserved, but just 5% of annotated alternative exons in the human gene set are ancient. More than three quarters of alternative exons are primate-derived. Although the analysis does not provide a definitive answer to the question of the functional role of alternative splicing, our results do indicate that alternative splice variants may have played a significant part in the evolution of brain and heart tissues in vertebrates.

Collapse

Zile K, Dessimoz C, Wurm Y, Masel J. Only a Single Taxonomically Restricted Gene Family in the Drosophila melanogaster Subgroup Can Be Identified with High Confidence. Genome Biol Evol 2020;12:1355-1366. [PMID: 32589737 PMCID: PMC8059200 DOI: 10.1093/gbe/evaa127] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/19/2020] [Indexed: 12/12/2022] Open

Rausell A, Luo Y, Lopez M, Seeleuthner Y, Rapaport F, Favier A, Stenson PD, Cooper DN, Patin E, Casanova JL, Quintana-Murci L, Abel L. Common homozygosity for predicted loss-of-function variants reveals both redundant and advantageous effects of dispensable human genes. Proc Natl Acad Sci U S A 2020;117:13626-13636. [PMID: 32487729 PMCID: PMC7306792 DOI: 10.1073/pnas.1917993117] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open

Affiliation(s)

Antonio Rausell Clinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France; University of Paris, Imagine Institute, 75015 Paris, France
Yufei Luo Clinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France University of Paris, Imagine Institute, 75015 Paris, France
Marie Lopez Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France
Yoann Seeleuthner University of Paris, Imagine Institute, 75015 Paris, France Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France
Franck Rapaport St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
Antoine Favier Clinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France University of Paris, Imagine Institute, 75015 Paris, France
Peter D Stenson Institute of Medical Genetics, School of Medicine, Cardiff University, CF14 4XN Cardiff, United Kingdom
David N Cooper Institute of Medical Genetics, School of Medicine, Cardiff University, CF14 4XN Cardiff, United Kingdom
Etienne Patin Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France
Jean-Laurent Casanova University of Paris, Imagine Institute, 75015 Paris, France; Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065 Howard Hughes Medical Institute, New York, NY 10065 Pediatric Hematology and Immunology Unit, Necker Hospital for Sick Children, 75015 Paris, France
Lluis Quintana-Murci Human Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France Human Genomics and Evolution, Collège de France, Paris 75005, France
Laurent Abel University of Paris, Imagine Institute, 75015 Paris, France; Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065

Collapse

Volders PJ, Anckaert J, Verheggen K, Nuytens J, Martens L, Mestdagh P, Vandesompele J. LNCipedia 5: towards a reference set of human long non-coding RNAs. Nucleic Acids Res 2020;47:D135-D139. [PMID: 30371849 PMCID: PMC6323963 DOI: 10.1093/nar/gky1031] [Citation(s) in RCA: 343] [Impact Index Per Article: 85.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2018] [Accepted: 10/17/2018] [Indexed: 12/20/2022] Open

Affiliation(s)

Pieter-Jan Volders Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium VIB-UGent Center for Medical Biotechnology, 9000 Ghent, Belgium Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium To whom correspondence should be addressed. Tel: +32 9 332 6979; Fax: +32 9 332 6549;
Jasper Anckaert Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
Kenneth Verheggen VIB-UGent Center for Medical Biotechnology, 9000 Ghent, Belgium Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
Justine Nuytens Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
Lennart Martens VIB-UGent Center for Medical Biotechnology, 9000 Ghent, Belgium Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
Pieter Mestdagh Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
Jo Vandesompele Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium Center for Medical Genetics (CMGG), Ghent University, 9000 Ghent, Belgium Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium

Collapse

Brasó-Vives M, Povolotskaya IS, Hartasánchez DA, Farré X, Fernandez-Callejo M, Raveendran M, Harris RA, Rosene DL, Lorente-Galdos B, Navarro A, Marques-Bonet T, Rogers J, Juan D. Copy number variants and fixed duplications among 198 rhesus macaques (Macaca mulatta). PLoS Genet 2020;16:e1008742. [PMID: 32392208 PMCID: PMC7241854 DOI: 10.1371/journal.pgen.1008742] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2019] [Revised: 05/21/2020] [Accepted: 03/27/2020] [Indexed: 01/01/2023] Open

Affiliation(s)

Marina Brasó-Vives Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain Laboratoire de Biométrie et Biologie Évolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Villeurbanne, France
Inna S. Povolotskaya Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Moscow, Russia
Diego A. Hartasánchez Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
Xavier Farré Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
Marcos Fernandez-Callejo National Centre for Genomic Analysis-Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
Muthuswamy Raveendran Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
R. Alan Harris Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
Douglas L. Rosene Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
Belen Lorente-Galdos Department of Neuroscience, Yale School of Medicine, New Haven, Connecticut, United States of America
Arcadi Navarro Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain National Institute for Bioinformatics (INB), Barcelona, Catalonia, Spain Institució Catalana de Recerca i Estudis Avançats, Barcelona, Catalonia, Spain
Tomas Marques-Bonet Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain National Centre for Genomic Analysis-Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain Institució Catalana de Recerca i Estudis Avançats, Barcelona, Catalonia, Spain Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Catalonia, Spain
Jeffrey Rogers Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
David Juan Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain

Collapse

Impey RE, Lee M, Hawkins DA, Sutton JM, Panjikar S, Perugini MA, Soares da Costa TP. Mis-annotations of a promising antibiotic target in high-priority gram-negative pathogens. FEBS Lett 2020;594:1453-1463. [PMID: 31943170 DOI: 10.1002/1873-3468.13733] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Revised: 12/17/2019] [Accepted: 12/17/2019] [Indexed: 11/09/2022]

Martinez-Gomez L, Abascal F, Jungreis I, Pozo F, Kellis M, Mudge JM, Tress ML. Few SINEs of life: Alu elements have little evidence for biological relevance despite elevated translation. NAR Genom Bioinform 2019;2:lqz023. [PMID: 31886458 PMCID: PMC6924539 DOI: 10.1093/nargab/lqz023] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2019] [Revised: 10/30/2019] [Accepted: 12/12/2019] [Indexed: 12/12/2022] Open

Innovating the Concept and Practice of Two-Dimensional Gel Electrophoresis in the Analysis of Proteomes at the Proteoform Level. Proteomes 2019;7:proteomes7040036. [PMID: 31671630 PMCID: PMC6958347 DOI: 10.3390/proteomes7040036] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2019] [Revised: 09/15/2019] [Accepted: 10/28/2019] [Indexed: 12/21/2022] Open

Hatje K, Mühlhausen S, Simm D, Kollmar M. The Protein-Coding Human Genome: Annotating High-Hanging Fruits. Bioessays 2019;41:e1900066. [PMID: 31544971 DOI: 10.1002/bies.201900066] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2019] [Revised: 08/07/2019] [Indexed: 12/19/2022]

Mudge JM, Jungreis I, Hunt T, Gonzalez JM, Wright JC, Kay M, Davidson C, Fitzgerald S, Seal R, Tweedie S, He L, Waterhouse RM, Li Y, Bruford E, Choudhary JS, Frankish A, Kellis M. Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci. Genome Res 2019;29:2073-2087. [PMID: 31537640 PMCID: PMC6886504 DOI: 10.1101/gr.246462.118] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2018] [Accepted: 09/09/2019] [Indexed: 12/15/2022]

Abstract

The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.

Collapse

Affiliation(s)

Jonathan M Mudge European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Irwin Jungreis MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
Toby Hunt European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Jose Manuel Gonzalez European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
James C Wright Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, London SW7 3RP, United Kingdom
Mike Kay European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Claire Davidson European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Stephen Fitzgerald Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom
Ruth Seal European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.,Department of Haematology, University of Cambridge, Cambridge CB2 0PT, United Kingdom
Susan Tweedie European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Liang He MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
Robert M Waterhouse Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland.,Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
Yue Li MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
Elspeth Bruford European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.,Department of Haematology, University of Cambridge, Cambridge CB2 0PT, United Kingdom
Jyoti S Choudhary Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, London SW7 3RP, United Kingdom
Adam Frankish European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Manolis Kellis MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA

Collapse

Machado KCT, Fortuin S, Tomazella GG, Fonseca AF, Warren RM, Wiker HG, de Souza SJ, de Souza GA. On the Impact of the Pangenome and Annotation Discrepancies While Building Protein Sequence Databases for Bacteria Proteogenomics. Front Microbiol 2019;10:1410. [PMID: 31281302 PMCID: PMC6596428 DOI: 10.3389/fmicb.2019.01410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2019] [Accepted: 06/05/2019] [Indexed: 01/19/2023] Open

Babarinde IA, Li Y, Hutchins AP. Computational Methods for Mapping, Assembly and Quantification for Coding and Non-coding Transcripts. Comput Struct Biotechnol J 2019;17:628-637. [PMID: 31193391 PMCID: PMC6526290 DOI: 10.1016/j.csbj.2019.04.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2019] [Revised: 04/24/2019] [Accepted: 04/29/2019] [Indexed: 12/17/2022] Open

Rao MS, Van Vleet TR, Ciurlionis R, Buck WR, Mittelstadt SW, Blomme EAG, Liguori MJ. Comparison of RNA-Seq and Microarray Gene Expression Platforms for the Toxicogenomic Evaluation of Liver From Short-Term Rat Toxicity Studies. Front Genet 2019;9:636. [PMID: 30723492 PMCID: PMC6349826 DOI: 10.3389/fgene.2018.00636] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2018] [Accepted: 11/27/2018] [Indexed: 12/12/2022] Open

Abstract

Gene expression profiling is a useful tool to predict and interrogate mechanisms of toxicity. RNA-Seq technology has emerged as an attractive alternative to traditional microarray platforms for conducting transcriptional profiling. The objective of this work was to compare both transcriptomic platforms to determine whether RNA-Seq offered significant advantages over microarrays for toxicogenomic studies. RNA samples from the livers of rats treated for 5 days with five tool hepatotoxicants (α-naphthylisothiocyanate/ANIT, carbon tetrachloride/CCl₄, methylenedianiline/MDA, acetaminophen/APAP, and diclofenac/DCLF) were analyzed with both gene expression platforms (RNA-Seq and microarray). Data were compared to determine any potential added scientific (i.e., better biological or toxicological insight) value offered by RNA-Seq compared to microarrays. RNA-Seq identified more differentially expressed protein-coding genes and provided a wider quantitative range of expression level changes when compared to microarrays. Both platforms identified a larger number of differentially expressed genes (DEGs) in livers of rats treated with ANIT, MDA, and CCl₄ compared to APAP and DCLF, in agreement with the severity of histopathological findings. Approximately 78% of DEGs identified with microarrays overlapped with RNA-Seq data, with a Spearman’s correlation of 0.7 to 0.83. Consistent with the mechanisms of toxicity of ANIT, APAP, MDA and CCl₄, both platforms identified dysregulation of liver relevant pathways such as Nrf2, cholesterol biosynthesis, eiF2, hepatic cholestasis, glutathione and LPS/IL-1 mediated RXR inhibition. RNA-Seq data showed additional DEGs that not only significantly enriched these pathways, but also suggested modulation of additional liver relevant pathways. In addition, RNA-Seq enabled the identification of non-coding DEGs that offer a potential for improved mechanistic clarity. Overall, these results indicate that RNA-Seq is an acceptable alternative platform to microarrays for rat toxicogenomic studies with several advantages. Because of its wider dynamic range as well as its ability to identify a larger number of DEGs, RNA-Seq may generate more insight into mechanisms of toxicity. However, more extensive reference data will be necessary to fully leverage these additional RNA-Seq data, especially for non-coding sequences.

Collapse

Frankish A, Diekhans M, Ferreira AM, Johnson R, Jungreis I, Loveland J, Mudge JM, Sisu C, Wright J, Armstrong J, Barnes I, Berry A, Bignell A, Carbonell Sala S, Chrast J, Cunningham F, Di Domenico T, Donaldson S, Fiddes IT, García Girón C, Gonzalez JM, Grego T, Hardy M, Hourlier T, Hunt T, Izuogu OG, Lagarde J, Martin FJ, Martínez L, Mohanan S, Muir P, Navarro FC, Parker A, Pei B, Pozo F, Ruffier M, Schmitt BM, Stapleton E, Suner MM, Sycheva I, Uszczynska-Ratajczak B, Xu J, Yates A, Zerbino D, Zhang Y, Aken B, Choudhary JS, Gerstein M, Guigó R, Hubbard TJ, Kellis M, Paten B, Reymond A, Tress ML, Flicek P. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 2019;47:D766-D773. [PMID: 30357393 PMCID: PMC6323946 DOI: 10.1093/nar/gky955] [Citation(s) in RCA: 1814] [Impact Index Per Article: 362.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2018] [Revised: 09/20/2018] [Accepted: 10/08/2018] [Indexed: 02/06/2023] Open

Affiliation(s)

Adam Frankish European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Mark Diekhans UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
Anne-Maud Ferreira Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
Rory Johnson Department of Medical Oncology, Inselspital, University Hospital, University of Bern, Bern, Switzerland Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
Irwin Jungreis MIT Computer Science and Artificial Intelligence Laboratory, 32 Vasser St, Cambridge, MA 02139, USA Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
Jane Loveland European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Jonathan M Mudge European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Cristina Sisu Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
James Wright Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
Joel Armstrong UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
If Barnes European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Andrew Berry European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Alexandra Bignell European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Silvia Carbonell Sala Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
Jacqueline Chrast Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
Fiona Cunningham European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Tomás Di Domenico Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
Sarah Donaldson European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Ian T Fiddes UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
Carlos García Girón European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Jose Manuel Gonzalez European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Tiago Grego European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Matthew Hardy European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Thibaut Hourlier European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Toby Hunt European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Osagie G Izuogu European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Julien Lagarde Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
Fergal J Martin European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Laura Martínez Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
Shamika Mohanan European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Paul Muir Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06520, USA Systems Biology Institute, Yale University, West Haven, CT 06516, USA
Fabio C P Navarro Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Anne Parker European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Baikang Pei Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Fernando Pozo Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
Magali Ruffier European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Bianca M Schmitt European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Eloise Stapleton European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Marie-Marthe Suner European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Irina Sycheva European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Barbara Uszczynska-Ratajczak Centre of New Technologies, University of Warsaw, Warsaw, Poland
Jinuri Xu Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Andrew Yates European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Daniel Zerbino European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Yan Zhang Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
Bronwen Aken European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Jyoti S Choudhary Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
Mark Gerstein Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
Roderic Guigó Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
Tim J P Hubbard Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
Manolis Kellis MIT Computer Science and Artificial Intelligence Laboratory, 32 Vasser St, Cambridge, MA 02139, USA Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
Benedict Paten UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
Alexandre Reymond Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
Michael L Tress Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
Paul Flicek European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Collapse

Morris RJ. Thy-1, a Pathfinder Protein for the Post-genomic Era. Front Cell Dev Biol 2018;6:173. [PMID: 30619853 PMCID: PMC6305390 DOI: 10.3389/fcell.2018.00173] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2018] [Accepted: 12/06/2018] [Indexed: 12/21/2022] Open

Sinha S, Eisenhaber B, Jensen LJ, Kalbuaji B, Eisenhaber F. Darkness in the Human Gene and Protein Function Space: Widely Modest or Absent Illumination by the Life Science Literature and the Trend for Fewer Protein Function Discoveries Since 2000. Proteomics 2018;18:e1800093. [PMID: 30265449 PMCID: PMC6282819 DOI: 10.1002/pmic.201800093] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2018] [Revised: 09/07/2018] [Indexed: 12/15/2022]

Melo Z, Ishida C, Goldaraz MDLP, Rojo R, Echavarria R. Novel Roles of Non-Coding RNAs in Opioid Signaling and Cardioprotection. Noncoding RNA 2018;4:ncrna4030022. [PMID: 30227648 PMCID: PMC6162605 DOI: 10.3390/ncrna4030022] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2018] [Revised: 09/10/2018] [Accepted: 09/12/2018] [Indexed: 12/16/2022] Open