1. Zhao T, Cheng F, Zhan D, Li J, Zheng C, Lu Y, Qin W, Liu Z. The Glomerulus Multiomics Analysis Provides Deeper Insights into Diabetic Nephropathy. J Proteome Res 2023. [PMID: 37191251 DOI: 10.1021/acs.jproteome.2c00794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
Although diabetic nephropathy (DN) is the leading cause of end-stage renal disease, the exact regulation mechanisms remain unknown. In this study, we integrated the transcriptomics and proteomics profiles of glomeruli isolated from 50 biopsy-proven DN patients and 25 controls to investigate DN pathogenesis. First, 1152 genes exhibited differential expression at the mRNA or protein level, and 364 showed significant association. These strongly correlated genes were divided into four different functional modules. Moreover, a regulatory network of transcription factors (TFs) and target genes (TGs) was constructed, with 30 TFs upregulated at the protein level and 265 downstream TGs differentially expressed at the mRNA level. These TFs are the integration centers of several signal transduction pathways and have tremendous therapeutic potential for regulating the aberrant production of TGs and the pathological process of DN. Furthermore, 29 new DN-specific splice-junction peptides were discovered with high confidence; these peptides may play novel functions in the pathological course of DN. Our in-depth integrative transcriptomics-proteomics analysis thus provided deeper insights into the pathogenesis of DN and opened a potential avenue for finding new therapeutic interventions. MS raw files were deposited into ProteomeXchange with the dataset identifier PXD040617.
Affiliation(s)
- Tingting Zhao
  - National Clinical Research Center of Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, Jiangsu 210002, China
- Fang Cheng
  - Department of Bioinformatics, Beijing Pineal Diagnostics Co., Ltd., Beijing 102206, China
- Dongdong Zhan
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Jin'e Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Chunxia Zheng
  - National Clinical Research Center of Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, Jiangsu 210002, China
- Yinghui Lu
  - National Clinical Research Center of Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, Jiangsu 210002, China
- Weisong Qin
  - National Clinical Research Center of Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, Jiangsu 210002, China
- Zhihong Liu
  - National Clinical Research Center of Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, Jiangsu 210002, China
2. Qin G, Liu Z, Xie L. Multiple Omics Data Integration. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11508-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
3. Li Y, Wang G, Tan X, Ouyang J, Zhang M, Song X, Liu Q, Leng Q, Chen L, Xie L. ProGeo-neo: a customized proteogenomic workflow for neoantigen prediction and selection. BMC Med Genomics 2020; 13:52. [PMID: 32241270 PMCID: PMC7118832 DOI: 10.1186/s12920-020-0683-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/30/2022]
Abstract
BACKGROUND Neoantigens can be differentially recognized by the T cell receptor (TCR) as these sequences are derived from mutant proteins and are unique to the tumor. The discovery of neoantigens is the first key step for tumor-specific antigen (TSA) based immunotherapy. Based on high-throughput tumor genomic analysis, each missense mutation can potentially give rise to multiple neopeptides, resulting in a vast total number, but only a small percentage of these peptides may achieve immune-dominant status with a given major histocompatibility complex (MHC) class I allele. Specific identification of immunogenic candidate neoantigens is consequently a major challenge. Currently almost all neoantigen prediction tools are based on genomics data. RESULTS Here we report the construction of the proteogenomics prediction of neoantigen (ProGeo-neo) pipeline, which incorporates the following modules: mining tumor-specific antigens from next-generation sequencing genomic and mRNA expression data, predicting the binding of mutant peptides to class I MHC molecules with the latest netMHCpan (v.4.0), verifying MHC-peptides by MaxQuant with mass spectrometry proteomics data searched against a customized protein database, and checking the potential immunogenicity of T-cell recognition by additional screening methods. The ProGeo-neo pipeline implements a proteogenomics strategy, and the neopeptides identified were of much higher quality than those identified using genomic data only. CONCLUSIONS The pipeline was constructed based on the genomics and proteomics data of the Jurkat leukemia cell line but is generally applicable to other solid cancer research. With massively parallel sequencing and proteomics profiling increasing, this proteogenomics workflow should be useful for neoantigen-oriented research and immunotherapy.
Affiliation(s)
- Yuyu Li
  - Key Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation (Shanghai), China Ministry of Agriculture; College of Food Science and Technology, Shanghai Ocean University, 999 Hu Cheng Huan Road, Shanghai, 201306, China
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Guangzhi Wang
  - Key Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation (Shanghai), China Ministry of Agriculture; College of Food Science and Technology, Shanghai Ocean University, 999 Hu Cheng Huan Road, Shanghai, 201306, China
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Xiaoxiu Tan
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Jian Ouyang
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Menghuan Zhang
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
- Xiaofeng Song
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
- Qi Liu
  - Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 20009, China
- Qibin Leng
  - Affiliated Cancer Hospital & Institute of Guangzhou Medical University, 78 Heng Zhi Gang, Lu Hu Road, Guangzhou, 510095, China
- Lanming Chen
  - Key Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation (Shanghai), China Ministry of Agriculture; College of Food Science and Technology, Shanghai Ocean University, 999 Hu Cheng Huan Road, Shanghai, 201306, China
- Lu Xie
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Keyuan Road, Shanghai, 201203, China
4. Zhu Y, Engström PG, Tellgren-Roth C, Baudo CD, Kennell JC, Sun S, Billmyre RB, Schröder MS, Andersson A, Holm T, Sigurgeirsson B, Wu G, Sankaranarayanan SR, Siddharthan R, Sanyal K, Lundeberg J, Nystedt B, Boekhout T, Dawson TL, Heitman J, Scheynius A, Lehtiö J. Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis. Nucleic Acids Res 2017; 45:2629-2643. [PMID: 28100699 PMCID: PMC5389616 DOI: 10.1093/nar/gkx006] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 08/22/2016] [Accepted: 01/16/2017] [Indexed: 11/23/2022]
Abstract
Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies.
Affiliation(s)
- Yafeng Zhu
  - Science for Life Laboratory, Department of Oncology-Pathology, Karolinska Institutet, 17121 Solna, Sweden
- Pär G Engström
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 17121 Solna, Sweden
- Christian Tellgren-Roth
  - National Genomics Infrastructure, Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 75108 Uppsala, Sweden
- Charles D Baudo
  - Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
- John C Kennell
  - Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
- Sheng Sun
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- R Blake Billmyre
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Markus S Schröder
  - School of Biomedical and Biomolecular Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Anna Andersson
  - Department of Medicine Solna, Translational Immunology Unit, Karolinska Institutet and University Hospital, 17177 Stockholm, Sweden
- Tina Holm
  - Department of Medicine Solna, Translational Immunology Unit, Karolinska Institutet and University Hospital, 17177 Stockholm, Sweden
- Benjamin Sigurgeirsson
  - Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology, 17121 Solna, Sweden
- Guangxi Wu
  - Computational and Systems Biology, Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), 138672, Singapore
- Sundar Ram Sankaranarayanan
  - Molecular Mycology Laboratory, Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore 560 064, India
- Rahul Siddharthan
  - The Institute of Mathematical Sciences/HBNI, Taramani, Chennai 600 113, India
- Kaustuv Sanyal
  - Molecular Mycology Laboratory, Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore 560 064, India
- Joakim Lundeberg
  - Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology, 17121 Solna, Sweden
- Björn Nystedt
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, 75123 Uppsala, Sweden
- Teun Boekhout
  - CBS-Fungal Biodiversity Centre, Utrecht, 3508, The Netherlands
  - Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, 1012 WX Amsterdam, The Netherlands
- Thomas L Dawson
  - Institute of Medical Biology, Agency for Science, Technology and Research (A*STAR), 138648, Singapore
- Joseph Heitman
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Annika Scheynius
  - Science for Life Laboratory, Department of Clinical Science and Education, Karolinska Institutet, and Sachs' Children and Youth Hospital, Södersjukhuset, SE-118 83 Stockholm, Sweden
- Janne Lehtiö
  - Science for Life Laboratory, Department of Oncology-Pathology, Karolinska Institutet, 17121 Solna, Sweden
5. Liu Y, Gonzàlez-Porta M, Santos S, Brazma A, Marioni JC, Aebersold R, Venkitaraman AR, Wickramasinghe VO. Impact of Alternative Splicing on the Human Proteome. Cell Rep 2017; 20:1229-1241. [PMID: 28768205 PMCID: PMC5554779 DOI: 10.1016/j.celrep.2017.07.025] [Citation(s) in RCA: 123] [Impact Index Per Article: 15.4] [Received: 04/21/2017] [Revised: 06/02/2017] [Accepted: 07/12/2017] [Indexed: 02/02/2023]
Abstract
Alternative splicing is a critical determinant of genome complexity and, by implication, is assumed to engender proteomic diversity. This notion has not been experimentally tested in a targeted, quantitative manner. Here, we have developed an integrative approach to ask whether perturbations in mRNA splicing patterns alter the composition of the proteome. We integrate RNA sequencing (RNA-seq) (to comprehensively report intron retention, differential transcript usage, and gene expression) with a data-independent acquisition (DIA) method, SWATH-MS (sequential window acquisition of all theoretical spectra-mass spectrometry), to capture an unbiased, quantitative snapshot of the impact of constitutive and alternative splicing events on the proteome. Whereas intron retention is accompanied by decreased protein abundance, alterations in differential transcript usage and gene expression alter protein abundance proportionate to transcript levels. Our findings illustrate how RNA splicing links isoform expression in the human transcriptome with proteomic diversity and provides a foundation for studying perturbations associated with human diseases.
Affiliation(s)
- Yansheng Liu
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Mar Gonzàlez-Porta
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Sergio Santos
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Alvis Brazma
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- John C Marioni
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Ashok R Venkitaraman
  - The Medical Research Council Cancer Unit, University of Cambridge, Cambridge CB2 0XZ, UK
- Vihandha O Wickramasinghe
  - The Medical Research Council Cancer Unit, University of Cambridge, Cambridge CB2 0XZ, UK
  - RNA Biology and Cancer Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC 3000, Australia
6. Ruggles KV, Tang Z, Wang X, Grover H, Askenazi M, Teubl J, Cao S, McLellan MD, Clauser KR, Tabb DL, Mertins P, Slebos R, Erdmann-Gilmore P, Li S, Gunawardena HP, Xie L, Liu T, Zhou JY, Sun S, Hoadley KA, Perou CM, Chen X, Davies SR, Maher CA, Kinsinger CR, Rodland KD, Zhang H, Zhang Z, Ding L, Townsend RR, Rodriguez H, Chan D, Smith RD, Liebler DC, Carr SA, Payne S, Ellis MJ, Fenyő D. An Analysis of the Sensitivity of Proteogenomic Mapping of Somatic Mutations and Novel Splicing Events in Cancer. Mol Cell Proteomics 2015; 15:1060-71. [PMID: 26631509 DOI: 10.1074/mcp.m115.056226] [Citation(s) in RCA: 90] [Impact Index Per Article: 9.0] [Received: 10/16/2015] [Indexed: 11/06/2022]
Abstract
Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations, and splice variants identified in cancer cells are translated. Herein, we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome, and global proteome datasets generated from a pair of luminal and basal-like breast-cancer-patient-derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS sample process replicates, defined here as an independent tandem MS experiment using identical sample material. Despite analysis of over 30 sample process replicates, only about 10% of SNVs (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNVs without a detectable mRNA transcript were also observed, suggesting that transcriptome coverage was incomplete (∼80%). In contrast to germline variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than in the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, this large-scale proteogenomic integration allowed us to determine the degree to which mutations are translated and identify gaps in sequence coverage, thereby benchmarking current technology and progress toward whole cancer proteome and transcriptome analysis.
Affiliation(s)
- Kelly V Ruggles
  - New York University School of Medicine, New York, NY
- Zuojian Tang
  - New York University School of Medicine, New York, NY
- Xuya Wang
  - New York University School of Medicine, New York, NY
- Himanshu Grover
  - New York University School of Medicine, New York, NY
- Jennifer Teubl
  - New York University School of Medicine, New York, NY
- Song Cao
  - Washington University in St. Louis, St. Louis, MO
- David L Tabb
  - Vanderbilt University School of Medicine, Nashville, TN
- Robbert Slebos
  - Vanderbilt University School of Medicine, Nashville, TN
- Shunqiang Li
  - Washington University in St. Louis, St. Louis, MO
- Ling Xie
  - University of North Carolina School of Medicine, Chapel Hill, NC
- Tao Liu
  - Pacific Northwest National Laboratory, Richland, WA
- Charles M Perou
  - University of North Carolina School of Medicine, Chapel Hill, NC
- Xian Chen
  - University of North Carolina School of Medicine, Chapel Hill, NC
- Hui Zhang
  - Johns Hopkins University, Baltimore, MD
- Zhen Zhang
  - Johns Hopkins University, Baltimore, MD
- Li Ding
  - Washington University in St. Louis, St. Louis, MO
- Henry Rodriguez
  - Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD
- Samuel Payne
  - Pacific Northwest National Laboratory, Richland, WA
- David Fenyő
  - New York University School of Medicine, New York, NY
7. Sun H, Chen C, Shi M, Wang D, Liu M, Li D, Yang P, Li Y, Xie L. Integration of mass spectrometry and RNA-Seq data to confirm human ab initio predicted genes and lncRNAs. Proteomics 2015; 14:2760-8. [PMID: 25339270 DOI: 10.1002/pmic.201400174] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/29/2014] [Revised: 09/22/2014] [Accepted: 10/16/2014] [Indexed: 12/14/2022]
Abstract
MS/MS has been used to improve genome annotation in various organisms. The classical approach is to construct a comprehensive theoretical peptide database with a six-frame translation model from the whole ORF of a genome and search against this database with real MS/MS spectra. In this work we took a more focused approach: we constructed a database containing only peptides from the ab initio predicted genes from the current human genome annotation, and all theoretical peptides from currently annotated lncRNAs, and searched this database with MS/MS data from the human HeLa cell line. The purpose of this design is to find translation evidence for ab initio predicted genes and to rule out possible wrongly defined lncRNAs in a systematic proteogenomics effort. To validate the proteogenomics results, we integrated RNA-Seq data analysis for the same HeLa cell line that generated the MS/MS data, and performed an MRM experiment on self-cultured HeLa cell line samples. Six peptides were found to support ab initio predicted genes with both RNA-Seq and MRM validations, while none was found to support a translated lncRNA. This workflow could be flexibly applied to other human samples and datasets to help further improve human gene annotation.
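The classical six-frame translation mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`translate`, `reverse_complement`, `six_frame`) and the toy input are assumptions introduced here for illustration, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict:
    """Translate a DNA sequence in three forward and three reverse frames."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# Hypothetical toy input; '*' marks a stop codon in the translated frames.
frames = six_frame("ATGGCCTAA")
```

A search database would then be built by cutting each frame at stop codons and digesting the resulting open stretches in silico; that step is omitted here to keep the sketch short.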
Affiliation(s)
- Han Sun
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, Shanghai, China
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China
8. Xu X, Liu T, Ren X, Liu B, Yang J, Chen L, Wei C, Zheng J, Dong J, Sun L, Zhu Y, Jin Q. Proteogenomic Analysis of Trichophyton rubrum Aided by RNA Sequencing. J Proteome Res 2015; 14:2207-18. [PMID: 25868943 DOI: 10.1021/acs.jproteome.5b00009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Infections caused by dermatophytes, Trichophyton rubrum in particular, are among the most common diseases in humans. In this study, we present a proteogenomic analysis of T. rubrum based on whole-genome proteomics and RNA-Seq studies. We confirmed 4291 expressed proteins in T. rubrum and validated their annotated gene structures based on 35 874 supporting peptides. In addition, we identified 323 novel peptides (not present in the current annotated protein database of T. rubrum) that can be used to enhance current T. rubrum annotations. A total of 104 predicted genes supported by novel peptides were identified, and 127 gene models suggested by the novel peptides that conflicted with existing annotations were manually assigned based on transcriptomic evidence. RNA-Seq confirmed the validity of 95% of the total peptides. Our study provides evidence that confirms and improves the genome annotation of T. rubrum and represents the first survey of T. rubrum genome annotations based on experimental evidence. Additionally, our integrated proteomics and multisourced transcriptomics approach provides stronger evidence for annotation refinement than proteomic data alone, which helps to address the dilemma of one-hit wonders (uncertainties supported by only one peptide).
9. Sun H, Chen C, Lian B, Zhang M, Wang X, Zhang B, Li Y, Yang P, Xie L. Identification of HPV Integration and Gene Mutation in HeLa Cell Line by Integrated Analysis of RNA-Seq and MS/MS Data. J Proteome Res 2015; 14:1678-86. [PMID: 25698088 DOI: 10.1021/pr500944c] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 12/17/2022]
Affiliation(s)
- Han Sun
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Ke Yuan Road, Shanghai 201203, China
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China
- Chen Chen
  - Department of Chemistry, Institutes of Biomedical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200433, China
- Baofeng Lian
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Ke Yuan Road, Shanghai 201203, China
- Menghuan Zhang
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Ke Yuan Road, Shanghai 201203, China
- Xiaojing Wang
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Nashville, Tennessee 37232, United States
- Bing Zhang
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Nashville, Tennessee 37232, United States
- Yixue Li
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Ke Yuan Road, Shanghai 201203, China
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China
- Pengyuan Yang
  - Department of Chemistry, Institutes of Biomedical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200433, China
- Lu Xie
  - Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, 1278 Ke Yuan Road, Shanghai 201203, China
10. Onco-proteogenomics: cancer proteomics joins forces with genomics. Nat Methods 2015; 11:1107-13. [PMID: 25357240 DOI: 10.1038/nmeth.3138] [Citation(s) in RCA: 106] [Impact Index Per Article: 10.6] [Received: 10/09/2013] [Accepted: 06/26/2014] [Indexed: 12/21/2022]
Abstract
The complexities of tumor genomes are rapidly being uncovered, but how they are regulated into functional proteomes remains poorly understood. Standard proteomics workflows use databases of known proteins, but these databases do not capture the uniqueness of the cancer transcriptome, with its point mutations, unusual splice variants and gene fusions. Onco-proteogenomics integrates mass spectrometry-generated data with genomic information to identify tumor-specific peptides. Linking tumor-derived DNA, RNA and protein measurements into a central-dogma perspective has the potential to improve our understanding of cancer biology.
11. Rapid development of proteomics in China: from the perspective of the Human Liver Proteome Project and technology development. Science China Life Sciences 2014; 57:1162-71. [PMID: 25119674 DOI: 10.1007/s11427-014-4714-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/14/2014] [Accepted: 07/01/2014] [Indexed: 12/17/2022]
12. Wang X, Zhang B. Integrating genomic, transcriptomic, and interactome data to improve peptide and protein identification in shotgun proteomics. J Proteome Res 2014; 13:2715-23. [PMID: 24792918 PMCID: PMC4059263 DOI: 10.1021/pr500194t] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 12/26/2022]
Abstract
Mass spectrometry (MS)-based shotgun proteomics is an effective technology for global proteome profiling. The ultimate goal is to assign tandem MS spectra to peptides and subsequently infer proteins and their abundance. In addition to database searching and protein assembly algorithms, computational approaches have been developed to integrate genomic, transcriptomic, and interactome information to improve peptide and protein identification. Earlier efforts focus primarily on making databases more comprehensive using publicly available genomic and transcriptomic data. More recently, with the increasing affordability of the Next Generation Sequencing (NGS) technologies, personalized protein databases derived from sample-specific genomic and transcriptomic data have emerged as an attractive strategy. In addition, incorporating interactome data not only improves protein identification but also puts identified proteins into their functional context and thus facilitates data interpretation. In this paper, we survey the major integrative bioinformatics approaches that have been developed during the past decade and discuss their merits and demerits.
Affiliation(s)
- Xiaojing Wang
  - Department of Biomedical Informatics, Vanderbilt-Ingram Cancer Center, and Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States
13. Sun H, Xing X, Li J, Zhou F, Chen Y, He Y, Li W, Wei G, Chang X, Jia J, Li Y, Xie L. Identification of gene fusions from human lung cancer mass spectrometry data. BMC Genomics 2013; 14 Suppl 8:S5. [PMID: 24564548 PMCID: PMC4042237 DOI: 10.1186/1471-2164-14-s8-s5] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/08/2023]
Abstract
Background Tandem mass spectrometry (MS/MS) technology has been applied to identify proteins, as an ultimate approach to confirm the original genome annotation. To be able to identify gene fusion proteins, a special database containing peptides that cross over gene fusion breakpoints is needed. Methods It is impractical to construct a database that includes all possible fusion peptides originating from potential breakpoints. Focusing on 6259 reported and predicted gene fusion pairs from ChimerDB 2.0 and Cancer Gene Census, we created, for the first time, a database, CanProFu, that comprehensively annotates fusion peptides formed by exon-exon linkage between these pairing genes. Results Applying this database to mass spectrometry datasets of 40 human non-small cell lung cancer (NSCLC) samples and 39 normal lung samples with stringent searching criteria, we were able to identify 19 unique fusion peptides characterizing gene fusion events. Among them, 11 gene fusion events were found only in NSCLC samples. In addition, 4 alternative splicing events were characterized in cancerous or normal lung samples. Conclusions The database and workflow in this work can be flexibly applied to other MS/MS based human cancer experiments to detect gene fusions as potential disease biomarkers or drug targets.
14.
15. Sheynkman GM, Shortreed MR, Frey BL, Smith LM. Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq. Mol Cell Proteomics 2013; 12:2341-53. [PMID: 23629695 DOI: 10.1074/mcp.o113.028142] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.8] [Indexed: 01/22/2023]
Abstract
Human proteomic databases required for MS peptide identification are frequently updated and carefully curated, yet are still incomplete because it has been challenging to acquire every protein sequence from the diverse assemblage of proteoforms expressed in every tissue and cell type. In particular, alternative splicing has been shown to be a major source of this cell-specific proteomic variation. Many new alternative splice forms have been detected at the transcript level using next-generation sequencing methods, especially RNA-Seq, but it is not known how many of these transcripts are being translated. Leveraging the unprecedented capabilities of next-generation sequencing methods, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Eighty million paired-end Illumina reads and ∼500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice-junction sequences from the RNA data, translate these sequences into the analogous polypeptide sequences, and create a customized splice-junction database for MS searching. Based on the RefSeq gene models, we detected 136,123 annotated and 144,818 unannotated transcript junctions. Of those, 24,834 unannotated junctions passed various quality filters (e.g. minimum read depth), and these entries were translated into 33,589 polypeptide sequences and used for database searching. We discovered 57 splice-junction peptides not present in the UniProt-TrEMBL proteomic database, comprising an array of different splicing events, including skipped exons, alternative donors and acceptors, and noncanonical transcriptional start sites. To our knowledge, this is the first example of using sample-specific RNA-Seq data to create a splice-junction database and discover new peptides resulting from alternative splicing.
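The workflow summarized in this abstract (filter junctions by read depth, translate the sequence flanking each junction, and write the resulting polypeptides to a search database) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the input record layout (`id`, `reads`, `upstream`, `downstream`), the function names, the `min_reads` threshold, and the `min_len` cutoff are all assumptions made for illustration, and only forward-strand three-frame translation is shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a nucleotide string codon by codon; stops become '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def junction_peptides(upstream, downstream, min_len=7):
    """Return (frame, peptide) pairs whose stop-free translation spans the junction.

    `upstream` and `downstream` are the exon sequences on either side of the
    junction. `min_len` is a hypothetical cutoff, not a value from the source.
    """
    seq = (upstream + downstream).upper()
    junction = len(upstream)  # index of the first downstream nucleotide
    results = []
    for frame in range(3):
        protein = translate(seq[frame:])
        idx = (junction - frame) // 3  # residue whose codon covers the junction
        if idx < 0 or idx >= len(protein) or protein[idx] == "*":
            continue
        # Extend to the nearest stop codon on each side of the junction residue.
        start = protein.rfind("*", 0, idx) + 1
        end = protein.find("*", idx)
        if end == -1:
            end = len(protein)
        peptide = protein[start:end]
        spans = frame + 3 * start < junction < frame + 3 * end
        if spans and len(peptide) >= min_len:
            results.append((frame, peptide))
    return results


def build_database(junctions, min_reads=2):
    """Build FASTA text from junction records that pass the read-depth filter.

    `min_reads` is an assumed threshold; the abstract only says that a
    minimum read depth was one of several quality filters.
    """
    entries = []
    for j in junctions:
        if j["reads"] < min_reads:
            continue
        for frame, pep in junction_peptides(j["upstream"], j["downstream"]):
            entries.append(f">{j['id']}|frame{frame}\n{pep}")
    return "\n".join(entries)


if __name__ == "__main__":
    demo = [
        {"id": "J1", "reads": 5,
         "upstream": "ATGGCTGCTAAAGGT", "downstream": "GATGAACTGTTTTGG"},
        {"id": "J2", "reads": 1,
         "upstream": "ATGGCTGCTAAAGGT", "downstream": "GATGAACTGTTTTGG"},
    ]
    print(build_database(demo))
```

In this sketch the second record is dropped by the read-depth filter, and each surviving junction contributes one FASTA entry per reading frame that crosses the junction without a stop codon. A real pipeline would also need strand handling, the additional quality filters the abstract alludes to, and removal of sequences already present in the reference proteome.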
Affiliation(s)
- Gloria M Sheynkman
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave., Madison, Wisconsin 53706, USA
|
16
|
Abstract
A newcomer to the -omics era, proteomics, is a broad instrument-intensive research area that has advanced rapidly since its inception less than 20 years ago. Although the 'wet-bench' aspects of proteomics have undergone a renaissance with the improvement in protein and peptide separation techniques, including various improvements in two-dimensional gel electrophoresis and gel-free or off-gel protein focusing, it has been the seminal advances in MS that have led to the ascension of this field. Recent improvements in sensitivity, mass accuracy and fragmentation have led to achievements previously only dreamed of, including whole-proteome identification, and quantification and extensive mapping of specific PTMs (post-translational modifications). With such capabilities at present, one might conclude that proteomics has already reached its zenith; however, 'capability' indicates that the envisioned goals have not yet been achieved. In the present review we focus on what we perceive as the areas requiring more attention to achieve the improvements in workflow and instrumentation that will bridge the gap between capability and achievement for at least most proteomes and PTMs. Additionally, it is essential that we extend our ability to understand protein structures, interactions and localizations. Towards these ends, we briefly focus on selected methods and research areas where we anticipate the next wave of proteomic advances.
|
17
|
Translational plant proteomics: a perspective. J Proteomics 2012; 75:4588-601. [PMID: 22516432 DOI: 10.1016/j.jprot.2012.03.055] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Received: 11/13/2011] [Revised: 02/25/2012] [Accepted: 03/25/2012] [Indexed: 11/21/2022]
Abstract
Translational proteomics is an emerging sub-discipline of the proteomics field in the biological sciences. Translational plant proteomics aims to integrate knowledge from the basic sciences and translate it into field applications that address issues related to, but not limited to, the recreational and economic value of plants, food security and safety, and energy sustainability. In this review, we highlight the substantial progress made in plant proteomics during the past decade, which has paved the way for translational plant proteomics. Growing proteomics knowledge in plants is not limited to model and non-model plants, proteogenomics, crop improvement, and food analysis, safety, and nutrition, but extends to many more potential applications. Given the wealth of information generated and to some extent applied, there is a need for more efficient and broader channels to freely disseminate the information to the scientific community. This article is part of a Special Issue entitled: Translational Proteomics.
|