1
Lin Z, Qin Y, Chen H, Shi D, Zhong M, An T, Chen L, Wang Y, Lin F, Li G, Ji ZL. TransIntegrator: capture nearly full protein-coding transcript variants via integrating Illumina and PacBio transcriptomes. Brief Bioinform 2023; 24:bbad334. [PMID: 37779246 DOI: 10.1093/bib/bbad334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 08/23/2023] [Accepted: 08/30/2023] [Indexed: 10/03/2023] Open
Abstract
Genes can produce transcript variants that perform specific cellular functions. However, accurately detecting all transcript variants remains a long-standing challenge, especially when working with poorly annotated genomes or without a known genome. To address this issue, we have developed a new computational method, TransIntegrator, which enables transcriptome-wide detection of novel transcript variants. For this, we generated 10 Illumina sequencing transcriptomes and a PacBio full-length transcriptome for consecutive embryo development stages of amphioxus, a species of great evolutionary importance. Based on these transcriptomes, we employed TransIntegrator to create a comprehensive transcript variant library, namely iTranscriptome. The resulting iTranscriptome contained 91 915 distinct transcript variants, with an average of 2.4 variants per gene. This substantially improved the current amphioxus genome annotation by expanding the number of genes from 21 954 to 38 777. Further analysis showed that the gene expansion was largely attributable to the integration of multiple Illumina datasets rather than to the PacBio data. Moreover, we demonstrated an example application of TransIntegrator, via generating iTranscriptome, in aiding accurate transcriptome assembly, where it significantly outperformed other hybrid methods such as IDP-denovo and Trinity. For user convenience, we have deposited the source code of TransIntegrator on GitHub and released a conda package on Anaconda. In summary, this study proposes an affordable but efficient method for reliable transcriptomic research in most species.
Affiliation(s)
- Zhe Lin
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, 361102, Xiamen, China
- Yangmei Qin
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Hao Chen
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Dan Shi
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Mindong Zhong
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Te An
  - School of Informatics, Xiamen University, 361005, Xiamen, China
- Linshan Chen
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Yiquan Wang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Fan Lin
  - National Institute for Data Science in Health and Medicine, Xiamen University, 361102, Xiamen, China
  - School of Informatics, Xiamen University, 361005, Xiamen, China
- Guang Li
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Zhi-Liang Ji
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, 361102, Xiamen, China

2
Zhou Y, Song BL. An urgent call on revisions to current genome annotation strategies. SCIENCE CHINA. LIFE SCIENCES 2023; 66:1942-1943. [PMID: 37118509 DOI: 10.1007/s11427-023-2350-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/18/2023] [Accepted: 04/21/2023] [Indexed: 04/30/2023]
Affiliation(s)
- Yu Zhou
  - College of Life Sciences, TaiKang Center for Life and Medical Sciences, Wuhan University, Wuhan, 430072, China.
- Bao-Liang Song
  - College of Life Sciences, TaiKang Center for Life and Medical Sciences, Wuhan University, Wuhan, 430072, China.
  - TaiKang Medical School, Wuhan University, Wuhan, 430072, China.

3
Yang Y, Wen X, Wu Z, Wang K, Zhu Y. Large-scale long terminal repeat insertions produced a significant set of novel transcripts in cotton. SCIENCE CHINA. LIFE SCIENCES 2023; 66:1711-1724. [PMID: 37079218 DOI: 10.1007/s11427-022-2341-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 04/03/2023] [Indexed: 04/21/2023]
Abstract
Genomic analysis has revealed that the 1,637-Mb Gossypium arboreum genome contains approximately 81% transposable elements (TEs), while only 57% of the 735-Mb G. raimondii genome is occupied by TEs. In this study, we investigated whether there were unknown transcripts associated with TEs or TE fragments and, if so, how these new transcripts evolved and were regulated. As sequencing depth increased from 4 to 100 G, a total of 10,284 novel intergenic transcripts (intergenic genes) were discovered. On average, approximately 84% of these intergenic transcripts possibly overlapped with long terminal repeat (LTR) insertions in the otherwise untranscribed intergenic regions and were expressed at relatively low levels. Most of these intergenic transcripts possessed no transcription activation markers, while the majority of the regular genic genes possessed at least one such marker. Genes without transcription activation markers formed their +1 and -1 nucleosomes more closely (only 117 ± 1.4 bp apart), while spaces twice as big (approximately 403.5 ± 46.0 bp apart) were detected for genes with the activation markers. The analysis of 183 previously assembled genomes across three different kingdoms demonstrated systematically that intergenic transcript numbers in a given genome correlated positively with its LTR content. Evolutionary analysis revealed that genic genes originated during one of the whole-genome duplication events around 137.7 million years ago (MYA) for all eudicot genomes or 13.7 MYA for the Gossypium family, respectively, while the intergenic transcripts evolved around 1.6 MYA, as a result of the last LTR insertion. The characterization of these low-transcribed intergenic transcripts can facilitate our understanding of the potential biological roles played by LTRs during speciation and diversification.
Affiliation(s)
- Yan Yang
  - Institute for Advanced Studies, Wuhan University, Wuhan, 430072, China
- Xingpeng Wen
  - Institute for Advanced Studies, Wuhan University, Wuhan, 430072, China
  - College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Zhiguo Wu
  - College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Kun Wang
  - College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Yuxian Zhu
  - Institute for Advanced Studies, Wuhan University, Wuhan, 430072, China.
  - College of Life Sciences, Wuhan University, Wuhan, 430072, China.
  - Hubei Hongshan Laboratory, Wuhan, 430072, China.
  - TaiKang Center for Life and Medical Sciences, RNA Institute, Renmin Hospital, Wuhan University, Wuhan, 430072, China.

4
da Silva EMG, Rebello KM, Choi YJ, Gregorio V, Paschoal AR, Mitreva M, McKerrow JH, Neves-Ferreira AGDC, Passetti F. Identification of Novel Genes and Proteoforms in Angiostrongylus costaricensis through a Proteogenomic Approach. Pathogens 2022; 11:1273. [PMID: 36365024 PMCID: PMC9694666 DOI: 10.3390/pathogens11111273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/21/2022] [Revised: 10/15/2022] [Accepted: 10/20/2022] [Indexed: 07/22/2023] Open
Abstract
RNA sequencing (RNA-Seq) and mass-spectrometry-based proteomics data are often integrated in proteogenomic studies to assist in the prediction of eukaryote genome features, such as genes, splicing, single-nucleotide variants (SNVs), and single-amino-acid variants (SAAVs). Most genomes of parasitic nematodes are draft versions that lack transcript- and protein-level information and whose gene annotations rely only on computational predictions. Angiostrongylus costaricensis is a roundworm species that causes an intestinal inflammatory disease known as abdominal angiostrongyliasis (AA). Currently, there is no drug available that acts directly on this parasite, mostly due to the sparse understanding of its molecular characteristics. The available genome of A. costaricensis, specific to the Costa Rica strain, is a draft version that is not supported by transcript- or protein-level evidence. This study used RNA-Seq and MS/MS data to perform an in-depth annotation of the A. costaricensis genome. Our prediction improved the reference annotation with (a) novel coding and non-coding genes; (b) evidence of alternative splicing generating new proteoforms; and (c) a list of SNVs between the Brazilian (Crissiumal) and the Costa Rica strains. To the best of our knowledge, this is the first time that a multi-omics approach has been used to improve the genome annotation of A. costaricensis. We hope this improved genome annotation can assist in the future development of drugs, kits, and vaccines to treat, diagnose, and prevent AA caused by either the Brazil strain (Crissiumal) or the Costa Rica strain.
Affiliation(s)
- Esdras Matheus Gomes da Silva
  - Instituto Carlos Chagas, Fiocruz, Curitiba 81350-010, PR, Brazil
  - Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro 21040-900, RJ, Brazil
- Karina Mastropasqua Rebello
  - Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro 21040-900, RJ, Brazil
  - Laboratory of Integrated Studies in Protozoology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro 21040-360, RJ, Brazil
- Young-Jun Choi
  - Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Vitor Gregorio
  - Bioinformatics and Pattern Recognition Group (Bioinfo-CP), Department of Computer Science (DACOM), Federal University of Technology-Parana (UTFPR), Cornélio Procópio 86300-000, PR, Brazil
- Alexandre Rossi Paschoal
  - Bioinformatics and Pattern Recognition Group (Bioinfo-CP), Department of Computer Science (DACOM), Federal University of Technology-Parana (UTFPR), Cornélio Procópio 86300-000, PR, Brazil
- Makedonka Mitreva
  - Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- James H. McKerrow
  - Center for Discovery and Innovation in Parasitic Diseases, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA 92093, USA
- Fabio Passetti
  - Instituto Carlos Chagas, Fiocruz, Curitiba 81350-010, PR, Brazil

5
Wu P, Pu L, Deng B, Li Y, Chen Z, Liu W. PASS: A Proteomics Alternative Splicing Screening Pipeline. Proteomics 2019; 19:e1900041. [PMID: 31095856 DOI: 10.1002/pmic.201900041] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/24/2019] [Revised: 04/28/2019] [Indexed: 12/11/2022]
Abstract
Alternative splicing (AS) has been well investigated at the transcriptome level through the application of RNA-seq technology. There is an ongoing debate on the biological importance of AS to proteome complexity. A toolkit for accurately identifying AS from proteome data is urgently needed. Here, a software tool called PASS is developed to comprehensively detect AS events from proteomics mass spectrometry (MS) data. Moreover, PASS is compatible with MS identification by the proteogenomics approach, which provides novel AS candidates for proteome identification. The PASS workflow contains five core steps: transcript reconstruction from RNA-Seq data, novel protein sequence generation, MS data searching, proSAM file formatting, and AS detection. The program can be entered at any of these steps. PASS is successfully applied to proteome data of mouse hepatocytes, and 407 AS events are identified for the first time with proteomics MS evidence. PASS is expected to be widely used to identify AS events in proteome data and to provide a deeper understanding of proteome isoforms. The PASS software is freely available at https://github.com/wupengomics/PASS.
Affiliation(s)
- Peng Wu
  - State Key Laboratory of Experimental Hematology, Institute of Hematology and Blood Disease Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300020, China
  - Center for Stem Cell Medicine, Chinese Academy of Medical Sciences, Tianjin, 300020, China
- Lingling Pu
  - Tianjin Institute of Environmental and Operational Medicine, Tianjin, 300050, China
- Bingnan Deng
  - Tianjin Institute of Environmental and Operational Medicine, Tianjin, 300050, China
- Yingying Li
  - Tianjin Institute of Environmental and Operational Medicine, Tianjin, 300050, China
- Zhaoli Chen
  - Tianjin Institute of Environmental and Operational Medicine, Tianjin, 300050, China
- Weili Liu
  - Tianjin Institute of Environmental and Operational Medicine, Tianjin, 300050, China

6
Manzoni C, Kia DA, Vandrovcova J, Hardy J, Wood NW, Lewis PA, Ferrari R. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Brief Bioinform 2019; 19:286-302. [PMID: 27881428 PMCID: PMC6018996 DOI: 10.1093/bib/bbw114] [Citation(s) in RCA: 395] [Impact Index Per Article: 65.8] [Received: 07/28/2016] [Indexed: 02/07/2023] Open
Abstract
Advances in the technologies and informatics used to generate and process large biological data sets (omics data) are promoting a critical shift in the study of biomedical sciences. While genomics, transcriptomics and proteinomics, coupled with bioinformatics and biostatistics, are gaining momentum, they are still, for the most part, assessed individually with distinct approaches generating monothematic rather than integrated knowledge. As other areas of biomedical sciences, including metabolomics, epigenomics and pharmacogenomics, are moving towards the omics scale, we are witnessing the rise of inter-disciplinary data integration strategies to support a better understanding of biological systems and eventually the development of successful precision medicine. This review cuts across the boundaries between genomics, transcriptomics and proteomics, summarizing how omics data are generated, analysed and shared, and provides an overview of the current strengths and weaknesses of this global approach. This work intends to target students and researchers seeking knowledge outside of their field of expertise and fosters a leap from the reductionist to the global-integrative analytical approach in research.
Affiliation(s)
- Claudia Manzoni
  - School of Pharmacy, University of Reading, Whiteknights, Reading, United Kingdom
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Demis A Kia
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Jana Vandrovcova
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- John Hardy
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Nicholas W Wood
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Patrick A Lewis
  - School of Pharmacy, University of Reading, Whiteknights, Reading, United Kingdom
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Raffaele Ferrari
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom

7
Peng X, Xu X, Wang Y, Hawke DH, Yu S, Han L, Zhou Z, Mojumdar K, Jeong KJ, Labrie M, Tsang YH, Zhang M, Lu Y, Hwu P, Scott KL, Liang H, Mills GB. A-to-I RNA Editing Contributes to Proteomic Diversity in Cancer. Cancer Cell 2018; 33:817-828.e7. [PMID: 29706454 PMCID: PMC5953833 DOI: 10.1016/j.ccell.2018.03.026] [Citation(s) in RCA: 147] [Impact Index Per Article: 21.0] [Received: 10/26/2017] [Revised: 02/05/2018] [Accepted: 03/26/2018] [Indexed: 01/30/2023]
Abstract
Adenosine (A) to inosine (I) RNA editing introduces many nucleotide changes in cancer transcriptomes. However, due to the complexity of post-transcriptional regulation, the contribution of RNA editing to proteomic diversity in human cancers remains unclear. Here, we performed an integrated analysis of TCGA genomic data and CPTAC proteomic data. Despite limited site diversity, we demonstrate that A-to-I RNA editing contributes to proteomic diversity in breast cancer through changes in amino acid sequences. We validate the presence of editing events at both RNA and protein levels. The edited COPA protein increases proliferation, migration, and invasion of cancer cells in vitro. Our study suggests an important contribution of A-to-I RNA editing to protein diversity in cancer and highlights its translational potential.
Affiliation(s)
- Xinxin Peng
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Xiaoyan Xu
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Pathophysiology, College of Basic Medicine, China Medical University, Shenyang, Liaoning Province 110122, China
- Yumeng Wang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX 77030, USA
- David H Hawke
  - The Proteomics and Metabolomics Facility, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Shuangxing Yu
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Leng Han
  - Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center at Houston McGovern Medical School, Houston, TX 77030, USA
- Zhicheng Zhou
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Kamalika Mojumdar
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Kang Jin Jeong
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Marilyne Labrie
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Yiu Huen Tsang
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Minying Zhang
  - Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Yiling Lu
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Patrick Hwu
  - Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Department of Sarcoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Kenneth L Scott
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Han Liang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Gordon B Mills
  - Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA

8
Li X, Brock GN, Rouchka EC, Cooper NGF, Wu D, O’Toole TE, Gill RS, Eteleeb AM, O’Brien L, Rai SN. A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data. PLoS One 2017; 12:e0176185. [PMID: 28459823 PMCID: PMC5411036 DOI: 10.1371/journal.pone.0176185] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 12/04/2016] [Accepted: 04/06/2017] [Indexed: 01/08/2023] Open
Abstract
Normalization is an essential step with considerable impact on high-throughput RNA sequencing (RNA-seq) data analysis. Although there are numerous methods for read count normalization, it remains a challenge to choose an optimal method because multiple factors contribute to read count variability, which affects the overall sensitivity and specificity. To determine the most appropriate normalization methods, it is critical to compare the performance and shortcomings of a representative set of normalization routines based on different dataset characteristics. Therefore, we set out to evaluate the performance of the commonly used methods (DESeq, TMM-edgeR, FPKM-CuffDiff, TC, Med, UQ and FQ) and two new methods we propose: Med-pgQ2 and UQ-pgQ2 (per-gene normalization after per-sample median or upper-quartile global scaling). Our per-gene normalization approach allows for comparisons between conditions based on similar count levels. Using the benchmark Microarray Quality Control Project (MAQC) and simulated datasets, we performed differential gene expression analysis to evaluate these methods. When evaluating MAQC2 with two replicates, we observed that Med-pgQ2 and UQ-pgQ2 achieved a slightly higher area under the Receiver Operating Characteristic Curve (AUC), a specificity rate > 85%, a detection power > 92% and an actual false discovery rate (FDR) under 0.06 given the nominal FDR (≤0.05). Although the top commonly used methods (DESeq and TMM-edgeR) yield a higher power (>93%) for MAQC2 data, they trade this off against a reduced specificity (<70%) and a slightly higher actual FDR than our proposed methods. In addition, the results from an analysis based on the qualitative characteristics of sample distribution for MAQC2 and human breast cancer datasets show that only our gene-wise normalization methods corrected data skewed towards lower read counts. However, when we evaluated MAQC3 with less variation in five replicates, all methods performed similarly.
Thus, our proposed Med-pgQ2 and UQ-pgQ2 methods perform slightly better for differential gene analysis of RNA-seq data skewed towards lowly expressed read counts with high variation by improving specificity while maintaining a good detection power with a control of the nominal FDR level.
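The two-stage idea named in this abstract (per-sample upper-quartile global scaling followed by per-gene normalization) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of non-zero counts for the upper quartile, the rescaling by the mean scaling factor, the per-gene target value of 100, and the handling of genes with a zero median are all assumptions made here for illustration only.

```python
from statistics import median, quantiles


def upper_quartile(column):
    """Return the 75th percentile of the non-zero counts in one sample."""
    nonzero = [c for c in column if c > 0]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    return quantiles(nonzero, n=4, method="inclusive")[2]


def uq_then_per_gene(counts, target=100.0):
    """Illustrative two-stage normalization (hypothetical helper).

    counts: list of rows (genes), each a list of per-sample read counts.
    target: assumed value that each gene's median is rescaled to.
    """
    n_samples = len(counts[0])

    # Stage 1: per-sample global scaling by the upper quartile.
    columns = [[row[j] for row in counts] for j in range(n_samples)]
    factors = [upper_quartile(col) for col in columns]
    mean_factor = sum(factors) / n_samples
    scaled = [
        [row[j] / factors[j] * mean_factor for j in range(n_samples)]
        for row in counts
    ]

    # Stage 2: per-gene normalization by the gene's median across samples.
    normalized = []
    for row in scaled:
        gene_median = median(row)
        if gene_median == 0:
            # Assumption: genes with a zero median are left as scaled.
            normalized.append(row[:])
        else:
            normalized.append([v / gene_median * target for v in row])
    return normalized
```

Calling `uq_then_per_gene` on a matrix in which one sample was sequenced at exactly twice the depth of another returns identical values for both samples within each gene, which is the behaviour the global scaling stage is meant to provide.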
Affiliation(s)
- Xiaohong Li
  - Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, United States of America
  - Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, United States of America
- Guy N. Brock
  - Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, United States of America
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH, United States of America
- Eric C. Rouchka
  - Department of Computer Engineering Computer Science, University of Louisville, Louisville, KY, United States of America
- Nigel G. F. Cooper
  - Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, United States of America
- Dongfeng Wu
  - Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, United States of America
- Timothy E. O’Toole
  - Department of Cardiology, University of Louisville, Louisville, KY, United States of America
- Ryan S. Gill
  - Department of Mathematics, University of Louisville, Louisville, KY, United States of America
- Abdallah M. Eteleeb
  - Department of Internal Medicine, Oncology Division, Washington University, St. Louis, MO, United States of America
- Liz O’Brien
  - Department of Epidemiology, University of Louisville, Louisville, KY, United States of America
- Shesh N. Rai
  - Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, United States of America

9
Fu S, Liu X, Luo M, Xie K, Nice EC, Zhang H, Huang C. Proteogenomic studies on cancer drug resistance: towards biomarker discovery and target identification. Expert Rev Proteomics 2017; 14:351-362. [PMID: 28276747 DOI: 10.1080/14789450.2017.1299006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
Introduction: Chemoresistance is a major obstacle for current cancer treatment. Proteogenomics is a powerful multi-omics research field that uses customized protein sequence databases generated from genomic and transcriptomic information to identify novel genes (e.g. noncoding, mutation and fusion genes) from mass spectrometry-based proteomic data. By identifying aberrations that are differentially expressed between tumor and normal pairs, this approach can also be applied to validate protein variants in cancer, which may reveal the response to drug treatment. Areas covered: In this review, we present recent advances in proteogenomic investigations of cancer drug resistance, with an emphasis on integrative proteogenomic pipelines and on biomarker discovery, which contributes to the goal of precision/personalized medicine for cancer treatment. Expert commentary: The discovery and comprehensive understanding of potential biomarkers help identify the cohort of patients who may benefit from particular treatments, and will assist real-time clinical decision-making to maximize therapeutic efficacy and minimize adverse effects. With the development of MS-based proteomics and NGS-based sequencing, a growing number of proteogenomic tools are being developed specifically to investigate cancer drug resistance.
Affiliation(s)
- Shuyue Fu
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China
- Xiang Liu
  - Department of Pathology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
- Maochao Luo
  - West China School of Public Health, Sichuan University, Chengdu, P.R. China
- Ke Xie
  - Department of Oncology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
- Edouard C Nice
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
- Haiyuan Zhang
  - School of Medicine, Yangtze University, P.R. China
- Canhua Huang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China

10
Ma C, Xu S, Liu G, Liu X, Xu X, Wen B, Liu S. Improvement of peptide identification with considering the abundance of mRNA and peptide. BMC Bioinformatics 2017; 18:109. [PMID: 28201984 PMCID: PMC5311845 DOI: 10.1186/s12859-017-1491-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/07/2016] [Accepted: 01/20/2017] [Indexed: 01/15/2023] Open
Abstract
Background: Tandem mass spectrometry (MS/MS) followed by database search is a main approach to identifying peptides/proteins in proteomic studies. Much effort has been devoted to improving the identification accuracy and sensitivity for peptides/proteins, such as developing advanced algorithms and expanding protein databases. Results: Herein, we describe a new strategy for enhancing the sensitivity of protein/peptide identification by combining mRNA and peptide abundance in Percolator. In our strategy, a new workflow for peptide identification is established on the basis of the abundance of transcripts and potential novel transcripts derived from RNA-Seq, together with the abundance of peptides from the same species. We demonstrate the utility of this strategy on two MS/MS datasets, and the results indicate that an improvement of about 5% to 8% in peptide identification can be achieved at 1% FDR at the peptide level by integrating the peptide abundance, the transcript abundance and potential novel transcripts from RNA-Seq data. Meanwhile, 181 and 154 novel peptides were identified in the two datasets, respectively. Conclusions: We have demonstrated that this strategy could improve peptide/protein identification and enable the discovery of novel peptides, as compared with traditional search methods.
Affiliation(s)
- Geng Liu
  - BGI-Shenzhen, Shenzhen, 518083, China
- Xin Liu
  - BGI-Shenzhen, Shenzhen, 518083, China
- Xun Xu
  - BGI-Shenzhen, Shenzhen, 518083, China
- Bo Wen
  - BGI-Shenzhen, Shenzhen, 518083, China.
- Siqi Liu
  - BGI-Shenzhen, Shenzhen, 518083, China.

11
Prasad TSK, Mohanty AK, Kumar M, Sreenivasamurthy SK, Dey G, Nirujogi RS, Pinto SM, Madugundu AK, Patil AH, Advani J, Manda SS, Gupta MK, Dwivedi SB, Kelkar DS, Hall B, Jiang X, Peery A, Rajagopalan P, Yelamanchi SD, Solanki HS, Raja R, Sathe GJ, Chavan S, Verma R, Patel KM, Jain AP, Syed N, Datta KK, Khan AA, Dammalli M, Jayaram S, Radhakrishnan A, Mitchell CJ, Na CH, Kumar N, Sinnis P, Sharakhov IV, Wang C, Gowda H, Tu Z, Kumar A, Pandey A. Integrating transcriptomic and proteomic data for accurate assembly and annotation of genomes. Genome Res 2016; 27:133-144. [PMID: 28003436 PMCID: PMC5204337 DOI: 10.1101/gr.201368.115] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 10/28/2015] [Accepted: 11/10/2016] [Indexed: 01/05/2023]
Abstract
Complementing genome sequence with deep transcriptome and proteome data could enable more accurate assembly and annotation of newly sequenced genomes. Here, we provide a proof-of-concept of an integrated approach for analysis of the genome and proteome of Anopheles stephensi, which is one of the most important vectors of the malaria parasite. To achieve broad coverage of genes, we carried out transcriptome sequencing and deep proteome profiling of multiple anatomically distinct sites. Based on transcriptomic data alone, we identified and corrected 535 events of incomplete genome assembly involving 1196 scaffolds and 868 protein-coding gene models. This proteogenomic approach enabled us to add 365 genes that were missed during genome annotation and identify 917 gene correction events through discovery of 151 novel exons, 297 protein extensions, 231 exon extensions, 192 novel protein start sites, 19 novel translational frames, 28 events of joining of exons, and 76 events of joining of adjacent genes as a single gene. Incorporation of proteomic evidence allowed us to change the designation of more than 87 predicted “noncoding RNAs” to conventional mRNAs coded by protein-coding genes. Importantly, extension of the newly corrected genome assemblies and gene models to 15 other newly assembled Anopheline genomes led to the discovery of a large number of apparent discrepancies in assembly and annotation of these genomes. Our data provide a framework for how future genome sequencing efforts should incorporate transcriptomic and proteomic analysis in combination with simultaneous manual curation to achieve near complete assembly and accurate annotation of genomes.
Affiliation(s)
- T S Keshava Prasad
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore 575018, India; NIMHANS-IOB Proteomics and Bioinformatics Laboratory, Neurobiology Research Centre, National Institute of Mental Health and Neuro Sciences, Bangalore, Karnataka 560029, India
- Ajeet Kumar Mohanty
- National Institute of Malaria Research, Field Station, Goa 403001, India; Department of Zoology, Goa University, Taleigao Plateau, Goa 403206, India
- Manish Kumar
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Sreelakshmi K Sreenivasamurthy
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Gourav Dey
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Raja Sekhar Nirujogi
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Centre for Bioinformatics, Pondicherry University, Puducherry 605014, India
- Sneha M Pinto
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore 575018, India
- Anil K Madugundu
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Centre for Bioinformatics, Pondicherry University, Puducherry 605014, India
- Arun H Patil
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
- Jayshree Advani
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Srikanth S Manda
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Centre for Bioinformatics, Pondicherry University, Puducherry 605014, India
- Manoj Kumar Gupta
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Sutopa B Dwivedi
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India
- Dhanashree S Kelkar
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India
- Brantley Hall
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
- Xiaofang Jiang
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
- Ashley Peery
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
- Pavithra Rajagopalan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
- Soujanya D Yelamanchi
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
- Hitendra S Solanki
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
- Remya Raja
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India
- Gajanan J Sathe
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Sandip Chavan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Renu Verma
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
- Krishna M Patel
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India
- Ankit P Jain
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
- Nazia Syed
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Department of Biochemistry and Molecular Biology, Pondicherry University, Puducherry 605014, India
- Keshava K Datta
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
- Aafaque Ahmed Khan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
- Manjunath Dammalli
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Department of Biotechnology, Siddaganga Institute of Technology, Tumkur, Karnataka 572103, India
- Savita Jayaram
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Aneesha Radhakrishnan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; Department of Biochemistry and Molecular Biology, Pondicherry University, Puducherry 605014, India
- Christopher J Mitchell
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- Chan-Hyun Na
- Department of Neurology, Johns Hopkins University, Baltimore, Maryland 21205, USA
- Nirbhay Kumar
- Department of Tropical Medicine, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana 70112, USA
- Photini Sinnis
- Malaria Research Institute, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
- Igor V Sharakhov
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
- Charles Wang
- Center for Genomics and Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, California 92350, USA
- Harsha Gowda
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore 575018, India
- Zhijian Tu
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
- Ashwani Kumar
- National Institute of Malaria Research, Field Station, Goa 403001, India
- Akhilesh Pandey
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA; Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
12
Wen B, Xu S, Zhou R, Zhang B, Wang X, Liu X, Xu X, Liu S. PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq. BMC Bioinformatics 2016; 17:244. [PMID: 27316337 PMCID: PMC4912784 DOI: 10.1186/s12859-016-1133-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 06/27/2015] [Accepted: 06/09/2016] [Indexed: 11/27/2022] Open
Abstract
Background: Peptide identification based upon mass spectrometry (MS) is generally achieved by comparing the experimental mass spectra with theoretically digested peptides derived from a reference protein database. This strategy cannot identify peptide and protein sequences that are absent from the reference database. A customized protein database built from RNA-Seq data has thus been proposed to assist with and improve the identification of novel peptides. Correspondingly, a comprehensive pipeline that provides an end-to-end solution for novel peptide detection with the customized protein database is necessary. Results: A pipeline implemented as an R package, named PGA, was developed that automates the processing of tandem mass spectrometry (MS/MS) data acquired from different MS platforms and the construction of customized protein databases based on RNA-Seq data with or without a reference genome guide. PGA can then identify novel peptides and generate an HTML-based report with a visualized interface. On the basis of a published dataset, PGA was employed to identify peptides, resulting in 636 novel peptides, including 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and the example reports are available at http://wenbostar.github.io/PGA/. Conclusions: The PGA pipeline, designed to be platform-independent and easy to use, was successfully developed and shown to be capable of identifying novel peptides by searching the customized protein database derived from RNA-Seq data.
Affiliation(s)
- Bo Wen
- BGI-Shenzhen, Shenzhen, 518083, China
- Ruo Zhou
- BGI-Shenzhen, Shenzhen, 518083, China
- Bing Zhang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Xiaojing Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Xin Liu
- BGI-Shenzhen, Shenzhen, 518083, China
- Xun Xu
- BGI-Shenzhen, Shenzhen, 518083, China
- Siqi Liu
- BGI-Shenzhen, Shenzhen, 518083, China; Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
13
Sallou O, Duek PD, Darde TA, Collin O, Lane L, Chalmel F. PepPSy: a web server to prioritize gene products in experimental and biocuration workflows. Database: The Journal of Biological Databases and Curation 2016; 2016:baw070. [PMID: 27173522 PMCID: PMC4865363 DOI: 10.1093/database/baw070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2015] [Accepted: 04/13/2016] [Indexed: 12/03/2022]
Abstract
Among the 20 000 human gene products predicted from genome annotation, about 3000 still lack validation at protein level. We developed PepPSy, a user-friendly gene expression-based prioritization system, to help investigators determine in which human tissues they should look for an unseen protein. PepPSy can also be used by biocurators to revisit the annotation of specific categories of proteins based on the ‘omics’ data housed by the system. In this study, it was used to prioritize 21 dubious protein-coding genes among the 616 annotated in neXtProt for reannotation. PepPSy is freely available at http://peppsy.genouest.org. Database URL: http://peppsy.genouest.org.
Affiliation(s)
- Olivier Sallou
- Genouest Bioinformatics Platform, IRISA, Campus de Beaulieu, Rennes 35042, France
- Paula D Duek
- CALIPHO Group, SIB Swiss Institute of Bioinformatics, CMU, Michel Servet 1, Geneva 1211, Switzerland
- Thomas A Darde
- Genouest Bioinformatics Platform, IRISA, Campus de Beaulieu, Rennes 35042, France; IRSET, Inserm U1085, 9 avenue du Professeur Léon Bernard, Rennes 35000, France
- Olivier Collin
- Genouest Bioinformatics Platform, IRISA, Campus de Beaulieu, Rennes 35042, France
- Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics, CMU, Michel Servet 1, Geneva 1211, Switzerland; Department of Human Protein Sciences, Faculty of Medicine, University of Geneva, CMU, Michel Servet 1, Geneva 1211, Switzerland
- Frédéric Chalmel
- IRSET, Inserm U1085, 9 avenue du Professeur Léon Bernard, Rennes 35000, France
14
Xiong Y, Guo Y, Xiao W, Cao Q, Li S, Qi X, Zhang Z, Wang Q, Shui W. An NGS-Independent Strategy for Proteome-Wide Identification of Single Amino Acid Polymorphisms by Mass Spectrometry. Anal Chem 2016; 88:2784-91. [PMID: 26810586 DOI: 10.1021/acs.analchem.5b04417] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
Detection of proteins containing single amino acid polymorphisms (SAPs) encoded by nonsynonymous SNPs (nsSNPs) can aid researchers in studying the functional significance of protein variants. Most proteogenomic approaches for large-scale SAP mapping require construction of a sample-specific database containing protein variants predicted from next-generation sequencing (NGS) data. Searching shotgun proteomic data sets against these NGS-derived databases allows identification of SAP peptides, thus validating the proteome-level sequence variation. In contrast to these conventional approaches, our study presents a novel strategy for proteome-wide SAP detection without relying on sample-specific NGS data. By searching a deep-coverage proteomic data set from an industrial thermotolerant yeast strain using our strategy, we identified 337 putative SAPs compared to the reference genome. Among the SAP peptides identified with stringent criteria, 85.2% of SAP sites were validated using whole-genome sequencing data obtained for this organism, which indicates high accuracy of SAP identification with our strategy. More interestingly, for certain SAP peptides that cannot be predicted by genomic sequencing, we used synthetic peptide standards to verify expression of peptide variants in the proteome. Our study has provided a unique tool for proteogenomics to enable proteome-wide direct SAP identification and capture nongenetic protein variants not linked to nsSNPs.
Affiliation(s)
- Yun Xiong
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Yufeng Guo
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Weidi Xiao
- College of Life Sciences, Nankai University, Tianjin 300071, China
- Qichen Cao
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Shanshan Li
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Xianni Qi
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Zhidan Zhang
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Qinhong Wang
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Wenqing Shui
- Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
15
Santin I, Dos Santos RS, Eizirik DL. Pancreatic Beta Cell Survival and Signaling Pathways: Effects of Type 1 Diabetes-Associated Genetic Variants. Methods Mol Biol 2016; 1433:21-54. [PMID: 26936771 DOI: 10.1007/7651_2015_291] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 06/05/2023]
Abstract
Type 1 diabetes (T1D) is a complex autoimmune disease in which pancreatic beta cells are specifically destroyed by the immune system. The disease has an important genetic component and more than 50 loci across the genome have been associated with risk of developing T1D. The molecular mechanisms by which these putative T1D candidate genes modulate disease risk, however, remain poorly characterized and little is known about their effects in pancreatic beta cells. Functional studies in in vitro models of pancreatic beta cells, based on techniques to inhibit or overexpress T1D candidate genes, allow the functional characterization of several T1D candidate genes. This requires a multistage procedure comprising two major steps, namely accurate selection of genes of potential interest and then in vitro and/or in vivo mechanistic approaches to characterize their role in pancreatic beta cell dysfunction and death in T1D. This chapter details the methods and settings used by our groups to characterize the role of T1D candidate genes on pancreatic beta cell survival and signaling pathways, with particular focus on potentially relevant pathways in the pathogenesis of T1D, i.e., inflammation and innate immune responses, apoptosis, beta cell metabolism and function.
Affiliation(s)
- Izortze Santin
- ULB Center for Diabetes Research, Medical Faculty, Université Libre de Bruxelles (ULB), Brussels, Belgium; Endocrinology and Diabetes Research Group, BioCruces Health Research Institute, CIBERDEM, Spain
- Reinaldo S Dos Santos
- ULB Center for Diabetes Research, Medical Faculty, Université Libre de Bruxelles (ULB), Brussels, Belgium
- Decio L Eizirik
- ULB Center for Diabetes Research, Medical Faculty, Université Libre de Bruxelles (ULB), Brussels, Belgium
16
Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome. Sci Rep 2015; 5:18019. [PMID: 26658305 PMCID: PMC4676012 DOI: 10.1038/srep18019] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 05/05/2015] [Accepted: 11/10/2015] [Indexed: 01/24/2023] Open
Abstract
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced solely on the basis of short reads, but the genome annotation lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
17
Zhou T, Sha J, Guo X. The need to revisit published data: A concept and framework for complementary proteomics. Proteomics 2015; 16:6-11. [PMID: 26552962 DOI: 10.1002/pmic.201500170] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/06/2015] [Revised: 08/26/2015] [Accepted: 11/04/2015] [Indexed: 12/14/2022]
Abstract
Tandem proteomic strategies based on large-scale and high-resolution mass spectrometry have been widely applied in various biomedical studies. However, protein sequence databases and proteomic software are continuously updated, so proteomic studies should not end with a static list of proteins. It is necessary and beneficial to regularly revise the results. In addition, the original proteomic studies usually focused on a limited aspect of protein information, and valuable information may remain undiscovered in the raw spectra. Several studies have reported novel findings by reanalyzing previously published raw data. However, there are still no standard guidelines for comprehensive reanalysis. In the present study, we propose the concept and a draft framework for complementary proteomics, which aims to revise protein lists or mine new discoveries by revisiting published data.
Affiliation(s)
- Tao Zhou
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, P. R. China
- Jiahao Sha
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, P. R. China
- Xuejiang Guo
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, P. R. China
18
Xu X, Liu T, Ren X, Liu B, Yang J, Chen L, Wei C, Zheng J, Dong J, Sun L, Zhu Y, Jin Q. Proteogenomic Analysis of Trichophyton rubrum Aided by RNA Sequencing. J Proteome Res 2015; 14:2207-18. [PMID: 25868943 DOI: 10.1021/acs.jproteome.5b00009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Infections caused by dermatophytes, Trichophyton rubrum in particular, are among the most common diseases in humans. In this study, we present a proteogenomic analysis of T. rubrum based on whole-genome proteomics and RNA-Seq studies. We confirmed 4291 expressed proteins in T. rubrum and validated their annotated gene structures based on 35 874 supporting peptides. In addition, we identified 323 novel peptides (not present in the current annotated protein database of T. rubrum) that can be used to enhance current T. rubrum annotations. A total of 104 predicted genes supported by novel peptides were identified, and 127 gene models suggested by the novel peptides that conflicted with existing annotations were manually assigned based on transcriptomic evidence. RNA-Seq confirmed the validity of 95% of the total peptides. Our study provides evidence that confirms and improves the genome annotation of T. rubrum and represents the first survey of T. rubrum genome annotations based on experimental evidence. Additionally, our integrated proteomics and multisourced transcriptomics approach provides stronger evidence for annotation refinement than proteomic data alone, which helps to address the dilemma of one-hit wonders (uncertainties supported by only one peptide).
19
Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods 2014; 11:1114-25. [PMID: 25357241 DOI: 10.1038/nmeth.3144] [Citation(s) in RCA: 533] [Impact Index Per Article: 53.3] [Received: 05/21/2014] [Accepted: 09/22/2014] [Indexed: 12/19/2022]
Abstract
Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry-based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of new sequencing technologies such as RNA-seq and dramatic improvements in the depth and throughput of mass spectrometry-based proteomics, the pace of proteogenomic research has greatly accelerated. Here I review the current state of proteogenomic methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false positive identifications in proteogenomics and provide guidelines for analyzing the data and reporting the results of proteogenomic studies.
Affiliation(s)
- Alexey I Nesvizhskii
- Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
20
Wang X, Liu Q, Zhang B. Leveraging the complementary nature of RNA-Seq and shotgun proteomics data. Proteomics 2014; 14:2676-87. [PMID: 25266668 DOI: 10.1002/pmic.201400184] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 05/02/2014] [Revised: 08/22/2014] [Accepted: 09/25/2014] [Indexed: 12/22/2022]
Abstract
RNA sequencing (RNA-Seq) and MS-based shotgun proteomics are powerful high-throughput technologies for identifying and quantifying RNA transcripts and proteins, respectively. With the increasing affordability of these technologies, many projects have started to apply both to the same samples to achieve a more comprehensive understanding of biological systems. A major analytical challenge for such integrative projects is how to effectively leverage the complementary nature of RNA-Seq and shotgun proteomics data. RNA-Seq provides comprehensive information on mRNA abundance, alternative splicing, nucleotide variation, and structure alteration. Sample-specific protein databases derived from RNA-Seq data can better approximate the real protein pools in cell and tissue samples and thus improve protein identification. Meanwhile, proteomics data provide essential confirmation of the validity and functional relevance of novel findings from RNA-Seq data. At the quantitative level, mRNA and protein levels are only modestly correlated, suggesting strong involvement of posttranscriptional regulation in controlling gene expression. Here, we review recent studies at the interface of RNA-Seq and proteomics data. We discuss goals, accomplishments, and challenges in RNA-Seq-based proteogenomics. We also examine the current status and future potential of parallel transcriptome and proteome quantification in revealing posttranscriptional regulatory mechanisms.
Affiliation(s)
- Xiaojing Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN
21
Chocu S, Evrard B, Lavigne R, Rolland AD, Aubry F, Jégou B, Chalmel F, Pineau C. Forty-four novel protein-coding loci discovered using a proteomics informed by transcriptomics (PIT) approach in rat male germ cells. Biol Reprod 2014; 91:123. [PMID: 25210130 DOI: 10.1095/biolreprod.114.122416] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 01/08/2023] Open
Abstract
Spermatogenesis is a complex process, dependent upon the successive activation and/or repression of thousands of gene products, and ends with the production of haploid male gametes. RNA sequencing of male germ cells in the rat identified thousands of novel testicular unannotated transcripts (TUTs). Although such RNAs are usually annotated as long noncoding RNAs (lncRNAs), it is possible that some of these TUTs code for protein. To test this possibility, we used a "proteomics informed by transcriptomics" (PIT) strategy combining RNA sequencing data with shotgun proteomics analyses of spermatocytes and spermatids in the rat. Among 3559 TUTs and 506 lncRNAs found in meiotic and postmeiotic germ cells, 44 encoded at least one peptide. We showed that these novel high-confidence protein-coding loci exhibit several genomic features intermediate between those of lncRNAs and mRNAs. We experimentally validated the testicular expression pattern of two of these novel protein-coding gene candidates, both highly conserved in mammals: one for a vesicle-associated membrane protein we named VAMP-9, and the other for an enolase domain-containing protein. This study confirms the potential of PIT approaches for the discovery of protein-coding transcripts initially thought to be untranslated or unknown transcripts. Our results contribute to the understanding of spermatogenesis by characterizing two novel proteins, implicated by their strong expression in germ cells. The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium under the data set identifier PXD000872.
Affiliation(s)
- Sophie Chocu
- Proteomics Core Facility Biogenouest, Inserm U1085, IRSET, Campus de Beaulieu, Rennes, France; Inserm U1085, IRSET, Université de Rennes 1, Rennes, France
- Régis Lavigne
- Proteomics Core Facility Biogenouest, Inserm U1085, IRSET, Campus de Beaulieu, Rennes, France; Inserm U1085, IRSET, Université de Rennes 1, Rennes, France
- Florence Aubry
- Inserm U1085, IRSET, Université de Rennes 1, Rennes, France
- Bernard Jégou
- Inserm U1085, IRSET, Université de Rennes 1, Rennes, France
- Charles Pineau
- Proteomics Core Facility Biogenouest, Inserm U1085, IRSET, Campus de Beaulieu, Rennes, France; Inserm U1085, IRSET, Université de Rennes 1, Rennes, France