1
Li HD, Xu Y, Zhu X, Liu Q, Omenn GS, Wang J. ClusterMine: A knowledge-integrated clustering approach based on expression profiles of gene sets. J Bioinform Comput Biol 2021; 18:2040009. [PMID: 32698720 DOI: 10.1142/s0219720020400090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
Abstract
Clustering analysis of gene expression data is essential for understanding complex biological data, and is widely used in important biological applications such as the identification of cell subpopulations and disease subtypes. In commonly used methods such as hierarchical clustering (HC) and consensus clustering (CC), holistic expression profiles of all genes are often used to assess the similarity between samples for clustering. While these methods have been proven successful in identifying sample clusters in many areas, they do not provide information about which gene sets (functions) contribute most to the clustering, thus limiting the interpretability of the resulting cluster. We hypothesize that integrating prior knowledge of annotated gene sets would not only achieve satisfactory clustering performance but also, more importantly, enable potential biological interpretation of clusters. Here we report ClusterMine, an approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets in functional annotation databases such as Gene Ontology. In addition to the cluster membership of each sample as provided by conventional approaches, it also outputs gene sets that most likely contribute to the clustering, thus facilitating biological interpretation. We compare ClusterMine with conventional approaches on nine real-world experimental datasets that represent different application scenarios in biology. We find that ClusterMine achieves better performances and that the gene sets prioritized by our method are biologically meaningful. ClusterMine is implemented as an R package and is freely available at: www.genemine.org/clustermine.php.
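The idea of scoring samples by gene set and then asking which gene sets separate the clusters can be illustrated with a small sketch. This is not ClusterMine's algorithm, which the abstract does not specify; the function names, the mean-expression score, the two-cluster restriction, and the toy data are all assumptions made for illustration, and the cluster labels are taken as given from any clustering step.

```python
from statistics import mean

def gene_set_scores(expr, gene_sets):
    """Collapse each sample's gene-level profile into one mean score per gene set."""
    return {
        sample: {name: mean(profile[g] for g in genes if g in profile)
                 for name, genes in gene_sets.items()}
        for sample, profile in expr.items()
    }

def rank_gene_sets(scores, labels):
    """Rank gene sets by the absolute gap between the two cluster means."""
    a, b = sorted(set(labels.values()))  # assumption: exactly two clusters
    gaps = {}
    for name in next(iter(scores.values())):
        mean_a = mean(scores[s][name] for s in scores if labels[s] == a)
        mean_b = mean(scores[s][name] for s in scores if labels[s] == b)
        gaps[name] = abs(mean_a - mean_b)
    return sorted(gaps, key=gaps.get, reverse=True)

# Hypothetical toy data: four samples, four genes, two gene sets.
expr = {
    "s1": {"g1": 9.0, "g2": 8.0, "g3": 1.0, "g4": 2.0},
    "s2": {"g1": 8.0, "g2": 9.0, "g3": 2.0, "g4": 1.0},
    "s3": {"g1": 1.0, "g2": 2.0, "g3": 1.0, "g4": 2.0},
    "s4": {"g1": 2.0, "g2": 1.0, "g3": 2.0, "g4": 1.0},
}
gene_sets = {"set_A": ["g1", "g2"], "set_B": ["g3", "g4"]}
labels = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}

ranking = rank_gene_sets(gene_set_scores(expr, gene_sets), labels)
print(ranking)
```

In this toy case only `set_A` differs between the two clusters, so it ranks first; a real analysis would need a proper statistic and more than two clusters.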
Affiliation(s)
- Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
- Yunpei Xu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
- Xiaoshu Zhu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China; School of Computer Science and Engineering, Yulin Normal University, Yulin, Guangxi, P. R. China
- Quan Liu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
- Gilbert S Omenn
- Departments of Computational Medicine and Bioinformatics, Internal Medicine, Human Genetics and School of Public Health, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
2
Shaw D, Chen H, Jiang T. DeepIsoFun: a deep domain adaptation approach to predict isoform functions. Bioinformatics 2020; 35:2535-2544. [PMID: 30535380 DOI: 10.1093/bioinformatics/bty1017] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/29/2018] [Revised: 11/07/2018] [Accepted: 12/08/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Isoforms are mRNAs produced from the same gene locus by alternative splicing and may have different functions. Although gene functions have been studied extensively, little is known about the specific functions of isoforms. Recently, some computational approaches based on multiple instance learning have been proposed to predict isoform functions from annotated gene functions and expression data, but their performance is far from being desirable primarily due to the lack of labeled training data. To improve the performance on this problem, we propose a novel deep learning method, DeepIsoFun, that combines multiple instance learning with domain adaptation. The latter technique helps to transfer the knowledge of gene functions to the prediction of isoform functions and provides additional labeled training data. Our model is trained on a deep neural network architecture so that it can adapt to different expression distributions associated with different gene ontology terms.
RESULTS: We evaluated the performance of DeepIsoFun on three expression datasets of human and mouse collected from SRA studies at different times. On each dataset, DeepIsoFun performed significantly better than the existing methods. In terms of area under the receiver operating characteristics curve, our method acquired at least 26% improvement and in terms of area under the precision-recall curve, it acquired at least 10% improvement over the state-of-the-art methods. In addition, we also study the divergence of the functions predicted by our method for isoforms from the same gene and the overall correlation between expression similarity and the similarity of predicted functions.
AVAILABILITY AND IMPLEMENTATION: https://github.com/dls03/DeepIsoFun/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
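The multiple instance learning setting mentioned in the abstract can be sketched under the standard assumption that a bag is positive if at least one of its instances is positive. Here a gene is treated as the bag and its isoforms as the instances, so the gene-level score is the maximum isoform score. This is a generic illustration, not DeepIsoFun's model; the function name, the identifiers, and the scores are hypothetical.

```python
def bag_scores(isoform_scores, isoform_gene):
    """Gene (bag) score = maximum score over its isoforms (instances).

    Assumes scores lie in [0, 1], so 0.0 is a safe starting value.
    """
    scores = {}
    for isoform, score in isoform_scores.items():
        gene = isoform_gene[isoform]
        scores[gene] = max(score, scores.get(gene, 0.0))
    return scores

# Hypothetical per-isoform scores for one function (e.g. one GO term).
isoform_scores = {"G1-a": 0.1, "G1-b": 0.8, "G2-a": 0.2}
isoform_gene = {"G1-a": "G1", "G1-b": "G1", "G2-a": "G2"}

gene_level = bag_scores(isoform_scores, isoform_gene)
print(gene_level)
```

Because only gene-level labels are available for training, a loss computed on these aggregated scores lets gene annotations supervise isoform-level predictions.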
Affiliation(s)
- Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA; Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
3
Oulas A, Minadakis G, Zachariou M, Sokratous K, Bourdakou MM, Spyrou GM. Systems Bioinformatics: increasing precision of computational diagnostics and therapeutics through network-based approaches. Brief Bioinform 2019; 20:806-824. [PMID: 29186305 PMCID: PMC6585387 DOI: 10.1093/bib/bbx151] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 07/17/2017] [Revised: 10/17/2017] [Indexed: 02/01/2023]
Abstract
Systems Bioinformatics is a relatively new approach, which lies in the intersection of systems biology and classical bioinformatics. It focuses on integrating information across different levels using a bottom-up approach as in systems biology with a data-driven top-down approach as in bioinformatics. The advent of omics technologies has provided the stepping-stone for the emergence of Systems Bioinformatics. These technologies provide a spectrum of information ranging from genomics, transcriptomics and proteomics to epigenomics, pharmacogenomics, metagenomics and metabolomics. Systems Bioinformatics is the framework in which systems approaches are applied to such data, setting the level of resolution as well as the boundary of the system of interest and studying the emerging properties of the system as a whole rather than the sum of the properties derived from the system's individual components. A key approach in Systems Bioinformatics is the construction of multiple networks representing each level of the omics spectrum and their integration in a layered network that exchanges information within and between layers. Here, we provide evidence on how Systems Bioinformatics enhances computational therapeutics and diagnostics, hence paving the way to precision medicine. The aim of this review is to familiarize the reader with the emerging field of Systems Bioinformatics and to provide a comprehensive overview of its current state-of-the-art methods and technologies. Moreover, we provide examples of success stories and case studies that utilize such methods and tools to significantly advance research in the fields of systems biology and systems medicine.
Affiliation(s)
- Anastasis Oulas
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- George Minadakis
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Margarita Zachariou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Kleitos Sokratous
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Marilena M Bourdakou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- George M Spyrou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
4
Jeong SK, Kim CY, Paik YK. ASV-ID, a Proteogenomic Workflow To Predict Candidate Protein Isoforms on the Basis of Transcript Evidence. J Proteome Res 2018; 17:4235-4242. [PMID: 30289715 DOI: 10.1021/acs.jproteome.8b00548] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Abstract
One of the goals of the Chromosome-Centric Human Proteome Project (C-HPP) is to map and characterize the functions of protein isoforms produced by alternative splicing of genes. However, identifying alternative splice variants (ASVs) via mass spectrometry remains a major challenge, because ASVs usually contain highly homologous peptide sequences. A routine protein sequence analysis suggests that more than half of the investigated proteins do not generate two or more uniquely mapping peptides that would enable their isoforms to be distinguished. Here, we develop a new proteogenomics method, named "ASV-ID" (alternative splicing variants identification), which enables identification of ASVs by using a cell type-specific protein sequence database that is supported by RNA-Seq data. Using this workflow, we identify 1935 distinct proteins under highly stringent conditions. In fact, transcript evidence on these 841 proteins helps us distinguish them from other isoforms, despite the fact that these proteins are not predicted to make 2 or more uniquely mapping peptides. We also demonstrate that ASV-ID enables detection of 19 differently expressed isoforms present in several cell lines. Thus, a new workflow using ASV-ID has the potential to map yet-to-be-identified difficult protein isoforms in a simple and robust way.
5
Wu P, Zhou D, Lin W, Li Y, Wei H, Qian X, Jiang Y, He F. Cell-type-resolved alternative splicing patterns in mouse liver. DNA Res 2018; 25:4793385. [PMID: 29325017 PMCID: PMC6014294 DOI: 10.1093/dnares/dsx055] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/06/2017] [Accepted: 12/26/2017] [Indexed: 12/20/2022]
Abstract
Alternative splicing (AS) is an important post-transcriptional regulatory mechanism to generate transcription diversity. However, the functional roles of AS in multiple cell types from one organ have not been reported. Here, we provide the most comprehensive profile for cell-type-resolved AS patterns in mouse liver. A total of 13,637 AS events are detected, representing 81.5% of all known AS events in the database. About 46.2% of multi-exon genes undergo AS in the four cell types of mouse liver: hepatocyte, liver sinusoidal endothelial cell, Kupffer cell and hepatic stellate cell, which regulates cell-specific functions and maintains cell characteristics. We also present a cell-type-specific splicing factor network in these four cell types of mouse liver, allowing data mining and generating knowledge to elucidate the roles of splicing factors in sustaining the cell-type-specialized AS profiles and functions. Splice switching of the Tak1 gene between different cell types is reported for the first time, and a specific Tak1 isoform is verified to regulate hepatic cell-type-specific functions. Thus, our work constructs a hepatic cell-specific splicing landscape and reveals the considerable contribution of AS to the cell type constitution and organ features.
Affiliation(s)
- Peng Wu
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Donghu Zhou
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Weiran Lin
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Yanyan Li
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Handong Wei
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Xiaohong Qian
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Ying Jiang
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
- Fuchu He
- State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing 102206, China
6
Saha A, Kim Y, Gewirtz ADH, Jo B, Gao C, McDowell IC, Engelhardt BE, Battle A. Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res 2017; 27:1843-1858. [PMID: 29021288 PMCID: PMC5668942 DOI: 10.1101/gr.216721.116] [Citation(s) in RCA: 106] [Impact Index Per Article: 15.1] [Received: 09/30/2016] [Accepted: 08/22/2017] [Indexed: 11/24/2022]
Abstract
Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.
7
Tranchevent LC, Aubé F, Dulaurier L, Benoit-Pilven C, Rey A, Poret A, Chautard E, Mortada H, Desmet FO, Chakrama FZ, Moreno-Garcia MA, Goillot E, Janczarski S, Mortreux F, Bourgeois CF, Auboeuf D. Identification of protein features encoded by alternative exons using Exon Ontology. Genome Res 2017; 27:1087-1097. [PMID: 28420690 PMCID: PMC5453322 DOI: 10.1101/gr.212696.116] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 07/13/2016] [Accepted: 03/28/2017] [Indexed: 12/16/2022]
Abstract
Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward this end, we developed a computational approach, named “Exon Ontology,” based on terms corresponding to well-characterized protein features organized in an ontology tree. Exon Ontology is conceptually similar to Gene Ontology-based approaches but focuses on exon-encoded protein features instead of gene level functional annotations. Exon Ontology describes the protein features encoded by a selected list of exons and looks for potential Exon Ontology term enrichment. By applying this strategy to exons that are differentially spliced between epithelial and mesenchymal cells and after extensive experimental validation, we demonstrate that Exon Ontology provides support to discover specific protein features regulated by alternative splicing. We also show that Exon Ontology helps to unravel biological processes that depend on suites of coregulated alternative exons, as we uncovered a role of epithelial cell-enriched splicing factors in the AKT signaling pathway and of mesenchymal cell-enriched splicing factors in driving splicing events impacting on autophagy. Freely available on the web, Exon Ontology is the first computational resource that allows getting a quick insight into the protein features encoded by alternative exons and investigating whether coregulated exons contain the same biological information.
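Term enrichment of the kind described above is commonly assessed with an upper-tail hypergeometric test. The abstract does not state which statistic Exon Ontology uses, so the sketch below is an assumption that shows only the general technique; the function name, parameter names, and numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(total, annotated, selected, hits):
    """Upper-tail hypergeometric p-value P(X >= hits).

    total:     exons in the background
    annotated: background exons carrying the term
    selected:  exons in the list of interest
    hits:      selected exons carrying the term
    """
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total, selected)

# Toy case: 10 background exons, 3 carry the term, and all 3 selected exons carry it.
p_value = hypergeom_enrichment_p(total=10, annotated=3, selected=3, hits=3)
print(p_value)
```

With many terms tested at once, the resulting p-values would also need a multiple-testing correction.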
Affiliation(s)
- Léon-Charles Tranchevent
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Fabien Aubé
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Louis Dulaurier
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Clara Benoit-Pilven
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Amandine Rey
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Arnaud Poret
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Emilie Chautard
- Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, UMR CNRS 5558, INRIA Erable, Villeurbanne, F-69622, France
- Hussein Mortada
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- François-Olivier Desmet
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Fatima Zahra Chakrama
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Maira Alejandra Moreno-Garcia
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Evelyne Goillot
- Institut NeuroMyoGène, CNRS UMR 5310, INSERM U1217, Université Lyon 1, Lyon, F-69007 France
- Stéphane Janczarski
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Franck Mortreux
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Cyril F Bourgeois
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Didier Auboeuf
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
8
Yang IS, Son H, Kim S, Kim S. ISOexpresso: a web-based platform for isoform-level expression analysis in human cancer. BMC Genomics 2016; 17:631. [PMID: 27519173 PMCID: PMC4983006 DOI: 10.1186/s12864-016-2852-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 02/26/2016] [Accepted: 06/20/2016] [Indexed: 11/21/2022]
Abstract
Background: Alternative splicing events that result in the production of multiple gene isoforms reveal important molecular mechanisms. Gene isoforms are often differentially expressed across organs and tissues, developmental stages, and disease conditions. Specifically, recent studies show that aberrant regulation of alternative splicing frequently occurs in cancer to affect tumor cell transformation and growth. While analysis of isoform expression is important for discovering tumor-specific isoform signatures and interpreting relevant genomic mutations, there is currently no web-based, easy-to-use, and publicly available platform for this purpose.
Description: We developed ISOexpresso to provide information regarding isoform existence and expression, which can be grouped by cancer vs. normal conditions, cancer types, and tissue types. ISOexpresso implements two main functions. First, the Isoform Expression View function creates visualizations for condition-specific RNA/isoform expression patterns upon query of a gene of interest. With this function, users can easily determine the major isoform (the most expressed isoform in a sample) of a gene with respect to the condition and check whether it matches the known canonical isoform. ISOexpresso outputs expression levels of all known transcripts to check alterations of expression landscape and to find potential tumor-specific isoforms. Second, the User Data Annotation function supports annotation of genomic variants to determine the most plausible consequence of a variation (e.g., an amino acid change) among many possible interpretations. As most coding sequence mutations are effective through the subsequent transcription and translation, ISOexpresso automatically prioritizes transcripts that act as backbones for mutation effect prediction by their relative expression.
By employing ISOexpresso, we could investigate the consistency between the most expressed and known canonical/principal isoforms, as well as infer candidate tumor-specific isoforms based on their expression levels. In addition, we confirmed that ISOexpresso could easily reproduce previously known isoform expression patterns: recurrent observation of a major isoform across tissues, differential isoform expression patterns in a given tissue, and switching of major isoform during tumorigenesis.
Conclusions: ISOexpresso serves as a web-based, easy-to-use platform for isoform expression and alteration analysis based on a large-scale cancer database. We anticipate that ISOexpresso will expedite formulation and confirmation of novel hypotheses by providing isoform-level perspectives on cancer research. The ISOexpresso database is available online at http://wiki.tgilab.org/ISOexpresso/.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2852-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- In Seok Yang
- Severance Biomedical Science Institute, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seoul, 03722, Korea
- Hyeonju Son
- Severance Biomedical Science Institute, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seoul, 03722, Korea; Brain Korea 21 PLUS Project for Medical Sciences, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seoul, 03722, Korea
- Sora Kim
- Severance Biomedical Science Institute, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seoul, 03722, Korea; Brain Korea 21 PLUS Project for Medical Sciences, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seoul, 03722, Korea
- Sangwoo Kim
- Severance Biomedical Science Institute, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seoul, 03722, Korea; Brain Korea 21 PLUS Project for Medical Sciences, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seoul, 03722, Korea
9
Li HD, Omenn GS, Guan Y. A proteogenomic approach to understand splice isoform functions through sequence and expression-based computational modeling. Brief Bioinform 2016; 17:1024-1031. [PMID: 26740460 DOI: 10.1093/bib/bbv109] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/27/2015] [Revised: 11/03/2015] [Indexed: 01/23/2023]
Abstract
The products of multi-exon genes are a mixture of alternatively spliced isoforms, from which the translated proteins can have similar, different or even opposing functions. It is therefore essential to differentiate and annotate functions for individual isoforms. Computational approaches provide an efficient complement to expensive and time-consuming experimental studies. The input data of these methods range from DNA sequence, to RNA selection pressure, to expressed sequence tags, to full-length complementary DNA, to exon array, to RNA-seq expression, to proteomic data. Notably, RNA-seq technology generates quantitative profiling of transcript expression at the genome scale, with an unprecedented amount of expression data available for developing isoform function prediction methods. Integrative analysis of these data at different molecular levels enables a proteogenomic approach to systematically interrogate isoform functions. Here, we briefly review the state-of-the-art methods according to their input data sources, discuss their advantages and limitations and point out potential ways to improve prediction accuracies.
10
Li HD, Menon R, Govindarajoo B, Panwar B, Zhang Y, Omenn GS, Guan Y. Functional Networks of Highest-Connected Splice Isoforms: From The Chromosome 17 Human Proteome Project. J Proteome Res 2015. [PMID: 26216192 DOI: 10.1021/acs.jproteome.5b00494] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Indexed: 01/08/2023]
Abstract
Alternative splicing allows a single gene to produce multiple transcript-level splice isoforms from which the translated proteins may show differences in their expression and function. Identifying the major functional or canonical isoform is important for understanding gene and protein functions. Identification and characterization of splice isoforms is a stated goal of the HUPO Human Proteome Project and of neXtProt. Multiple efforts have catalogued splice isoforms as "dominant", "principal", or "major" isoforms based on expression or evolutionary traits. In contrast, we recently proposed highest connected isoforms (HCIs) as a new class of canonical isoforms that have the strongest interactions in a functional network and revealed their significantly higher (differential) transcript-level expression compared to nonhighest connected isoforms (NCIs) regardless of tissues/cell lines in the mouse. HCIs and their expression behavior in the human remain unexplored. Here we identified HCIs for 6157 multi-isoform genes using a human isoform network that we constructed by integrating a large compendium of heterogeneous genomic data. We present examples for pairs of transcript isoforms of ABCC3, RBM34, ERBB2, and ANXA7. We found that functional networks of isoforms of the same gene can show large differences. Interestingly, differential expression between HCIs and NCIs was also observed in the human on an independent set of 940 RNA-seq samples across multiple tissues, including heart, kidney, and liver. Using proteomic data from normal human retina and placenta, we showed that HCIs are a promising indicator of expressed protein isoforms exemplified by NUDFB6 and M6PR. Furthermore, we found that a significant percentage (20%, p = 0.0003) of human and mouse HCIs are homologues, suggesting their conservation between species. 
Our identified HCIs expand the repertoire of canonical isoforms and are expected to facilitate studying main protein products, understanding gene regulation, and possibly evolution. The network is available through our web server as a rich resource for investigating isoform functional relationships (http://guanlab.ccmb.med.umich.edu/hisonet). All MS/MS data are available at the ProteomeXchange website (http://www.proteomexchange.org) through their identifiers (retina: PXD001242, placenta: PXD000754).
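The notion of a highest connected isoform can be sketched as picking, for each gene, the isoform with the strongest connections in a weighted isoform network. The abstract does not define how connection strength is measured, so summing edge weights is an assumption made here, and the function name, identifiers, and weights are hypothetical.

```python
from collections import defaultdict

def highest_connected_isoforms(edges, isoform_gene):
    """Return, per gene, the isoform with the largest summed edge weight."""
    strength = defaultdict(float)
    for u, v, weight in edges:
        strength[u] += weight
        strength[v] += weight
    best = {}
    for isoform, gene in isoform_gene.items():
        if gene not in best or strength[isoform] > strength[best[gene]]:
            best[gene] = isoform
    return best

# Hypothetical weighted edges between isoforms of two genes, A and B.
edges = [("A-1", "B-1", 0.9), ("A-1", "B-2", 0.2), ("A-2", "B-1", 0.4)]
isoform_gene = {"A-1": "A", "A-2": "A", "B-1": "B", "B-2": "B"}

hcis = highest_connected_isoforms(edges, isoform_gene)
print(hcis)
```

Every other isoform of a multi-isoform gene would then fall into the non-highest connected (NCI) group used for the expression comparison.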
Affiliation(s)
- Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Rajasree Menon
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Bharat Panwar
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
11
Panwar B, Menon R, Eksi R, Omenn GS, Guan Y. MI-PVT: A Tool for Visualizing the Chromosome-Centric Human Proteome. J Proteome Res 2015. [PMID: 26204236 DOI: 10.1021/acs.jproteome.5b00525] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
Abstract
We have developed the web-based Michigan Proteome Visualization Tool (MI-PVT) to visualize and compare protein expression and isoform-level function across human chromosomes and tissues (http://guanlab.ccmb.med.umich.edu/mipvt). As proof of principle, we have populated the tool with Human Proteome Map (HPM) data. We were able to observe many biologically interesting features. From the vantage point of our chromosome 17 team, for example, we found more than 300 proteins from chromosome 17 expressed in each of the 30 tissues and cell types studied, with the highest number of expressed proteins being 685 in testis. Comparisons of expression levels across tissues showed low numbers of proteins expressed in esophagus, but esophagus had 12 cytoskeletal proteins coded on chromosome 17 with very high expression (>1000 spectral counts). This customized MI-PVT should be helpful for biologists to browse and study specific proteins and protein data sets across tissues and chromosomes. Users can upload any data of interest in MI-PVT for visualization. Our aim is to integrate extensive mass-spectrometric proteomic data into the tool to facilitate finding chromosome-centric protein expression and correlation across tissues.
Affiliation(s)
- Bharat Panwar
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Rajasree Menon
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Ridvan Eksi
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States