1
Raj A, Aggarwal S, Singh P, Yadav AK, Dash D. PgxSAVy: A tool for comprehensive evaluation of variant peptide quality in proteogenomics - catching the (un)usual suspects. Comput Struct Biotechnol J 2024; 23:711-722. [PMID: 38292474 PMCID: PMC10825656 DOI: 10.1016/j.csbj.2023.12.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/19/2023] [Accepted: 12/23/2023] [Indexed: 02/01/2024]
Abstract
Variant peptides resulting from single nucleotide polymorphisms (SNPs) can lead to aberrant protein functions and have translational potential for disease diagnosis and personalized therapy. Variant peptides detected by proteogenomics are fraught with a high number of false positives, but there is no uniform and comprehensive approach to assess variant quality across analysis pipelines. Despite class-specific FDR along with ad-hoc filters, the problem is far from solved. These protocols are typically manual and tedious, and thus not uniform across labs. We demonstrate that variant peptide rescoring, integrated with intensity, variant event information and search result features, allows better discrimination of correct variant peptides. Implemented in PgxSAVy, a tool for quality control of variant peptides, this method can tackle the high rate of false positives. PgxSAVy provides a rigorous framework for quality control and annotation of variant peptides on the basis of (i) variant quality, (ii) isobaric masses, and (iii) disease annotation. PgxSAVy identified true variants with 98.43% accuracy on simulated data. Large-scale proteogenomic reanalysis of ∼2.8 million spectra (PXD004010 and PXD001468) resulted in 12,705 variant peptide spectrum matches (PSMs), of which PgxSAVy evaluated 3028 (23.8%), 1409 (11.1%) and 8268 (65.1%) as confident, semi-confident and doubtful, respectively. PgxSAVy also annotates the variants based on their pathogenicity and provides support for assisted manual validation. The analysis of proteins carrying variants can provide fine granularity in discovering important pathways. PgxSAVy will advance personalized medicine by providing a comprehensive framework for quality control and prioritization of proteogenomics variants. PgxSAVy is freely available at https://pgxsavy.igib.res.in/ as a webserver and https://github.com/anuragraj/PgxSAVy as a stand-alone tool.
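As a quick consistency check, the three PgxSAVy categories partition the 12,705 variant PSMs, and the quoted percentages follow from the raw counts. A minimal Python sketch (the helper name `category_shares` is ours and is not part of PgxSAVy):

```python
def category_shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the total held by each category, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# PSM counts reported for the reanalysis of PXD004010 and PXD001468.
psms = {"confident": 3028, "semi-confident": 1409, "doubtful": 8268}
print(sum(psms.values()))     # 12705
print(category_shares(psms))  # {'confident': 23.8, 'semi-confident': 11.1, 'doubtful': 65.1}
```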
Affiliation(s)
- Anurag Raj
- G. N. Ramachandran Knowledge Centre for Genomics Informatics, CSIR – Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Suruchi Aggarwal
- Computational and Mathematical Biology Centre (CMBC), 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
- Centre for Drug Discovery (CDD), 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
- Centre for Microbial Research (CMR), Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
- Prateek Singh
- G. N. Ramachandran Knowledge Centre for Genomics Informatics, CSIR – Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Amit Kumar Yadav
- Computational and Mathematical Biology Centre (CMBC), 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
- Centre for Drug Discovery (CDD), 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
- Centre for Microbial Research (CMR), Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
- Debasis Dash
- G. N. Ramachandran Knowledge Centre for Genomics Informatics, CSIR – Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
2
Lin MS, Varunjikar MS, Lie KK, Søfteland L, Dellafiora L, Ørnsrud R, Sanden M, Berntssen MHG, Dorne JLCM, Bafna V, Rasinger JD. Multi-tissue proteogenomic analysis for mechanistic toxicology studies in non-model species. Environ Int 2023; 182:108309. [PMID: 37980879 DOI: 10.1016/j.envint.2023.108309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 08/15/2023] [Accepted: 11/04/2023] [Indexed: 11/21/2023]
Abstract
New approach methodologies (NAM), including omics and in vitro approaches, are contributing to the implementation of 3R (reduction, refinement and replacement) strategies in regulatory science and risk assessment. In this study, we present an integrative transcriptomics and proteomics analysis workflow for the validation and revision of complex fish genomes and demonstrate how proteogenomics expression matrices can be used to support multi-level omics data integration in non-model species in vivo and in vitro. Using Atlantic salmon as an example, we constructed proteogenomic databases from publicly available transcriptomic data and in-house generated RNA-Seq and LC-MS/MS data. Our analysis identified ∼80,000 peptides, providing direct evidence of translation for over 40,000 RefSeq structures. The data also highlighted 183 co-located peptide groups that supported a single transcript each, and in each case, either corrected a previous annotation, supported Ensembl annotations not present in RefSeq, or identified novel previously unannotated genes. Proteogenomics data-derived expression matrices revealed distinct profiles for the different tissue types analyzed. Focusing on proteins involved in defense against xenobiotics, we detected distinct expression patterns across different salmon tissues and observed homology in the expression of chemical defense proteins between in vivo and in vitro liver systems. Our study demonstrates the potential of proteogenomic analyses in extending our understanding of complex fish genomes and provides an advanced bioinformatic toolkit to support the further development of NAMs and their application in regulatory science and (eco)toxicological studies of non-model species.
Affiliation(s)
- M S Lin
- Bioinformatics and Systems Biology Program, UC San Diego, San Diego, CA, United States.
- K K Lie
- Institute of Marine Research, Bergen, Norway.
- L Søfteland
- Institute of Marine Research, Bergen, Norway.
- L Dellafiora
- Department of Food and Drug, University of Parma, Parco Area delle Scienze 27/A, 43124 Parma, Italy.
- R Ørnsrud
- Institute of Marine Research, Bergen, Norway.
- M Sanden
- Institute of Marine Research, Bergen, Norway.
- J L C M Dorne
- European Food Safety Authority, Methodological and Scientific Support Unit, Via Carlo Magno 1A, 43121 Parma, Italy.
- V Bafna
- Computer Science & Engineering and HDSI, UC San Diego, San Diego, CA, United States.
3
Wacholder A, Carvunis AR. Biological factors and statistical limitations prevent detection of most noncanonical proteins by mass spectrometry. PLoS Biol 2023; 21:e3002409. [PMID: 38048358 PMCID: PMC10721188 DOI: 10.1371/journal.pbio.3002409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 12/14/2023] [Accepted: 10/30/2023] [Indexed: 12/06/2023]
Abstract
Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry (MS) experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here, we leveraged recent advances in ribosome profiling and MS to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly expressed to be detected by shotgun MS at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for 4 noncanonical proteins in MS data, which were also supported by evolution and translation data. These results illustrate the power of MS to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly expressed proteins.
Affiliation(s)
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
4
Wacholder A, Carvunis AR. Biological Factors and Statistical Limitations Prevent Detection of Most Noncanonical Proteins by Mass Spectrometry. bioRxiv 2023:2023.03.09.531963. [PMID: 36945638 PMCID: PMC10028962 DOI: 10.1101/2023.03.09.531963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
Ribosome profiling experiments indicate pervasive translation of short open reading frames (ORFs) outside of annotated protein-coding genes. However, shotgun mass spectrometry experiments typically detect only a small fraction of the predicted protein products of this noncanonical translation. The rarity of detection could indicate that most predicted noncanonical proteins are rapidly degraded and not present in the cell; alternatively, it could reflect technical limitations. Here we leveraged recent advances in ribosome profiling and mass spectrometry to investigate the factors limiting detection of noncanonical proteins in yeast. We show that the low detection rate of noncanonical ORF products can largely be explained by small size and low translation levels and does not indicate that they are unstable or biologically insignificant. In particular, proteins encoded by evolutionarily young genes, including those with well-characterized biological roles, are too short and too lowly-expressed to be detected by shotgun mass spectrometry at current detection sensitivities. Additionally, we find that decoy biases can give misleading estimates of noncanonical protein false discovery rates, potentially leading to false detections. After accounting for these issues, we found strong evidence for four noncanonical proteins in mass spectrometry data, which were also supported by evolution and translation data. These results illustrate the power of mass spectrometry to validate unannotated genes predicted by ribosome profiling, but also its substantial limitations in finding many biologically relevant lowly-expressed proteins.
5
Dou Y, Liu Y, Yi X, Olsen LK, Zhu H, Gao Q, Zhou H, Zhang B. SEPepQuant enhances the detection of possible isoform regulations in shotgun proteomics. Nat Commun 2023; 14:5809. [PMID: 37726316 PMCID: PMC10509223 DOI: 10.1038/s41467-023-41558-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 09/06/2023] [Indexed: 09/21/2023]
Abstract
Shotgun proteomics is essential for protein identification and quantification in biomedical research, but protein isoform characterization is challenging due to the extensive number of peptides shared across proteins, hindering our understanding of protein isoform regulation and their roles in normal and disease biology. We systematically assess the challenge and opportunities of shotgun proteomics-based protein isoform characterization using in silico and experimental data, and then present SEPepQuant, a graph theory-based approach to maximize isoform characterization. Using published data from one induced pluripotent stem cell study and two human hepatocellular carcinoma studies, we demonstrate the ability of SEPepQuant in addressing the key limitations of existing methods, providing more comprehensive isoform-level characterization, identifying hundreds of isoform-level regulation events, and facilitating streamlined cross-study comparisons. Our analysis provides solid evidence to support a widespread role of protein isoform regulation in normal and disease processes, and SEPepQuant has broad applications to biological and translational research.
Affiliation(s)
- Yongchao Dou
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Yuejia Liu
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Avenue, 210023, Nanjing, Jiangsu, China
- Xinpei Yi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Lindsey K Olsen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Hongwen Zhu
- Department of Analytical Chemistry, State Key Laboratory of Drug Research and CAS Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, 201203, Shanghai, China
- Qiang Gao
- Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital, Fudan University, and Key Laboratory of Carcinogenesis and Cancer Invasion of Ministry of Education, 180 Fenglin Road, 200032, Shanghai, China
- Hu Zhou
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Avenue, 210023, Nanjing, Jiangsu, China
- Department of Analytical Chemistry, State Key Laboratory of Drug Research and CAS Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, 201203, Shanghai, China
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
6
Desai H, Ofori S, Boatner L, Yu F, Villanueva M, Ung N, Nesvizhskii AI, Backus K. Multi-omic stratification of the missense variant cysteinome. bioRxiv 2023:2023.08.12.553095. [PMID: 37645963 PMCID: PMC10461992 DOI: 10.1101/2023.08.12.553095] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/31/2023]
Abstract
Cancer genomes are rife with genetic variants; one key outcome of this variation is gain-of-cysteine, as cysteine is the most frequently acquired amino acid due to missense variants in COSMIC. Acquired cysteines are both driver mutations and sites targeted by precision therapies. However, despite their ubiquity, nearly all acquired cysteines remain uncharacterized. Here, we pair cysteine chemoproteomics (a technique that enables proteome-wide pinpointing of functional, redox-sensitive, and potentially druggable residues) with genomics to reveal the hidden landscape of cysteine acquisition. For both cancer and healthy genomes, we find that cysteine acquisition is a ubiquitous consequence of genetic variation that is further elevated in the context of decreased DNA repair. Our chemoproteogenomics platform integrates chemoproteomic, whole exome, and RNA-seq data with a customized, 2-stage false discovery rate (FDR)-controlled proteomic search, further enhanced with a user-friendly FragPipe interface. Integration of CADD predictions of deleteriousness revealed marked enrichment for likely damaging variants that result in acquisition of cysteine. By deploying chemoproteogenomics across eleven cell lines, we identify 116 gain-of-cysteines, of which 10 were liganded by electrophilic drug-like molecules. Reference cysteines proximal to missense variants were also found to be pervasive, 791 in total, supporting heretofore untapped opportunities for proteoform-specific chemical probe development campaigns. As chemoproteogenomics is further distinguished by sample-matched combinatorial variant databases and is compatible with redox proteomics and small molecule screening, we expect widespread utility in guiding proteoform-specific biology and therapeutic discovery.
Affiliation(s)
- Heta Desai
- Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
- Molecular Biology Institute, UCLA, Los Angeles, CA, 90095, USA
- Samuel Ofori
- Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
- Lisa Boatner
- Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
- Department of Chemistry and Biochemistry, UCLA, Los Angeles, CA, 90095, USA
- Fengchao Yu
- Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Miranda Villanueva
- Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
- Molecular Biology Institute, UCLA, Los Angeles, CA, 90095, USA
- Nicholas Ung
- Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
- Department of Chemistry and Biochemistry, UCLA, Los Angeles, CA, 90095, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Molecular Biology Institute, UCLA, Los Angeles, CA, 90095, USA
- DOE Institute for Genomics and Proteomics, UCLA, Los Angeles, CA, 90095, USA
- Jonsson Comprehensive Cancer Center, UCLA, Los Angeles, CA, 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, UCLA, Los Angeles, CA, 90095, USA
- Alexey I Nesvizhskii
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Keriann Backus
- Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
- Department of Chemistry and Biochemistry, UCLA, Los Angeles, CA, 90095, USA
- Molecular Biology Institute, UCLA, Los Angeles, CA, 90095, USA
- DOE Institute for Genomics and Proteomics, UCLA, Los Angeles, CA, 90095, USA
- Jonsson Comprehensive Cancer Center, UCLA, Los Angeles, CA, 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, UCLA, Los Angeles, CA, 90095, USA
7
Arad G, Geiger T. Functional impact of protein-RNA variation in clinical cancer analyses. Mol Cell Proteomics 2023:100587. [PMID: 37290530 PMCID: PMC10388586 DOI: 10.1016/j.mcpro.2023.100587] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/11/2022] [Revised: 04/08/2023] [Accepted: 05/25/2023] [Indexed: 06/10/2023]
Abstract
Comprehensive molecular characterization of tumors aims to uncover cancer vulnerabilities, drug resistance mechanisms and biomarkers. Identification of cancer drivers was suggested as the basis for patient-tailored therapy, and transcriptomic analyses were proposed to reveal the phenotypic outcome of cancer mutations. With the maturation of the proteomic field, studies of protein-RNA discrepancies suggested that RNA analyses are insufficient to predict cellular functions. In this manuscript we discuss the importance of direct mRNA-protein comparisons in clinical cancer studies. We make use of the large amount of data generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), which includes protein and mRNA expression analyses from the exact same samples. Analysis of protein-RNA correlations showed marked differences among cancer types, and highlighted the protein-RNA similarities and discrepancies among functional pathways and drug targets. Additionally, unsupervised clustering of the data based on protein or RNA showed substantial differences in tumor classification and the cellular processes that differentiate between clusters. These analyses show the difficulty of predicting protein levels from mRNA, and the critical role of protein analyses for phenotypic tumor characterization.
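A per-gene protein-RNA correlation of the kind described above can be sketched as a rank (Spearman) correlation across sample-matched measurements. This is a minimal illustration under our own assumptions: the choice of Spearman, the function names, and the toy values are ours and are not taken from the CPTAC analysis. Each gene's mRNA and protein lists are assumed to be equal-length and ordered by the same samples.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else float("nan")


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def gene_correlations(mrna: dict[str, list[float]],
                      protein: dict[str, list[float]]) -> dict[str, float]:
    """Per-gene mRNA-protein Spearman correlation for genes present in both tables."""
    return {g: spearman(mrna[g], protein[g]) for g in mrna.keys() & protein.keys()}


# Hypothetical toy values for two genes measured in the same five samples.
mrna = {"GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0], "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0]}
protein = {"GENE_A": [2.1, 1.9, 3.5, 4.0, 6.2], "GENE_B": [1.0, 2.0, 3.0, 4.0, 5.0]}
print(gene_correlations(mrna, protein))
```

For these toy values GENE_A comes out at about 0.9 and GENE_B at about -1.0 (up to floating-point rounding). Ranks are used rather than raw values so that a few extreme abundances do not dominate the per-gene estimate.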
Affiliation(s)
- Tamar Geiger
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel.
8
Lin A, Plubell DL, Keich U, Noble WS. Accurately Assigning Peptides to Spectra When Only a Subset of Peptides Are Relevant. J Proteome Res 2021; 20:4153-4164. [PMID: 34236864 PMCID: PMC8489664 DOI: 10.1021/acs.jproteome.1c00483] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/22/2022]
Abstract
The standard proteomics database search strategy involves searching spectra against a peptide database and estimating the false discovery rate (FDR) of the resulting set of peptide-spectrum matches. One assumption of this protocol is that all the peptides in the database are relevant to the hypothesis being investigated. However, in settings where researchers are interested in a subset of peptides, alternative search and FDR control strategies are needed. Recently, two methods were proposed to address this problem: subset-search and all-sub. We show that both methods fail to control the FDR. For subset-search, this failure is due to the presence of "neighbor" peptides, which are defined as irrelevant peptides with a similar precursor mass and fragmentation spectrum as a relevant peptide. Not considering neighbors compromises the FDR estimate because a spectrum generated by an irrelevant peptide can incorrectly match well to a relevant peptide. Therefore, we have developed a new method, "subset-neighbor search" (SNS), that accounts for neighbor peptides. We show evidence that SNS controls the FDR when neighbors are present and that SNS outperforms group-FDR, the only other method that appears to control the FDR relative to a subset of relevant peptides.
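The standard protocol described at the start of this abstract (search, then estimate the FDR of the resulting PSM set) is commonly implemented with target-decoy competition. The sketch below is a minimal Python illustration, assuming each PSM already carries a score and a decoy flag; the `(decoys + 1) / targets` estimator is one common conservative choice, and all names are ours. It is the baseline whole-database procedure, not subset-neighbor search (SNS), which must additionally account for neighbor peptides.

```python
from typing import NamedTuple, Optional


class PSM(NamedTuple):
    peptide: str
    score: float    # search-engine score; higher means a better match
    is_decoy: bool  # True if the matched sequence came from the decoy database


def fdr_threshold(psms: list[PSM], alpha: float = 0.01) -> Optional[float]:
    """Lowest score cutoff at which the target-decoy FDR estimate
    (decoys + 1) / targets is <= alpha, or None if no cutoff qualifies."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    decoys = targets = 0
    best = None
    for i, psm in enumerate(ranked):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate at the end of a run of tied scores, so that a cutoff
        # always includes every PSM scoring at or above it.
        end_of_tie = i + 1 == len(ranked) or ranked[i + 1].score != psm.score
        if end_of_tie and targets and (decoys + 1) / targets <= alpha:
            best = psm.score
    return best


def accepted_targets(psms: list[PSM], alpha: float = 0.01) -> list[PSM]:
    """Target PSMs scoring at or above the FDR-controlled cutoff."""
    cutoff = fdr_threshold(psms, alpha)
    if cutoff is None:
        return []
    return [p for p in psms if not p.is_decoy and p.score >= cutoff]
```

Running this only over the "relevant" subset of targets and decoys is the kind of shortcut the abstract argues can fail, because a spectrum from an irrelevant neighbor peptide may still match a relevant target well.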
Affiliation(s)
- Andy Lin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Deanna L. Plubell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Uri Keich
- School of Mathematics and Statistics, University of Sydney, NSW, Australia
- William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School for Computer Science and Engineering, University of Washington, Seattle, WA, USA
9
Macklin A, Khan S, Kislinger T. Recent advances in mass spectrometry based clinical proteomics: applications to cancer research. Clin Proteomics 2020; 17:17. [PMID: 32489335 PMCID: PMC7247207 DOI: 10.1186/s12014-020-09283-w] [Citation(s) in RCA: 150] [Impact Index Per Article: 37.5] [Received: 12/27/2019] [Accepted: 05/15/2020] [Indexed: 02/07/2023]
Abstract
Cancer biomarkers have transformed current practices in the oncology clinic. Continued discovery and validation are crucial for improving early diagnosis, risk stratification, and monitoring patient response to treatment. Profiling of the tumour genome and transcriptome is now an established tool for the discovery of novel biomarkers, but alterations in proteome expression are more likely to reflect changes in tumour pathophysiology. In the past, clinical diagnostics have strongly relied on antibody-based detection strategies, but these methods carry certain limitations. Mass spectrometry (MS) is a powerful method that enables increasingly comprehensive insights into changes of the proteome to advance personalized medicine. In this review, recent improvements in MS-based clinical proteomics are highlighted with a focus on oncology. We will provide a detailed overview of clinically relevant sample types, as well as considerations for sample preparation methods, protein quantitation strategies, MS configurations, and data analysis pipelines currently available to researchers. Critical consideration of each step is necessary to address the pressing clinical questions that advance cancer patient diagnosis and prognosis. While the majority of studies focus on the discovery of clinically relevant biomarkers, there is a growing demand for rigorous biomarker validation. These studies focus on high-throughput targeted MS assays and multi-centre studies with standardized protocols. Additionally, improvements in MS sensitivity are opening the door to new classes of tumour-specific proteoforms including post-translational modifications and variants originating from genomic aberrations. Overlaying proteomic data to complement genomic and transcriptomic datasets forges the growing field of proteogenomics, which shows great potential to improve our understanding of cancer biology.
Overall, these advancements not only solidify MS-based clinical proteomics' integral position in cancer research, but also accelerate the shift towards becoming a regular component of routine analysis and clinical practice.
Affiliation(s)
- Andrew Macklin
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Shahbaz Khan
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Thomas Kislinger
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
10
Kumar P, Johnson JE, Easterly C, Mehta S, Sajulga R, Nunn B, Jagtap PD, Griffin TJ. A Sectioning and Database Enrichment Approach for Improved Peptide Spectrum Matching in Large, Genome-Guided Protein Sequence Databases. J Proteome Res 2020; 19:2772-2785. [DOI: 10.1021/acs.jproteome.0c00260] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Affiliation(s)
- Praveen Kumar
- Bioinformatics and Computational Biology, University of Minnesota−Rochester, Rochester, Minnesota 55904, United States
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Caleb Easterly
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Subina Mehta
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Ray Sajulga
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Brook Nunn
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Pratik D. Jagtap
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Timothy J. Griffin
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
11
Wen B, Li K, Zhang Y, Zhang B. Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis. Nat Commun 2020; 11:1759. [PMID: 32273506 PMCID: PMC7145864 DOI: 10.1038/s41467-020-15456-w] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 07/29/2019] [Accepted: 03/10/2020] [Indexed: 01/01/2023]
Abstract
Genomics-based neoantigen discovery can be enhanced by proteomic evidence, but there remains a lack of consensus on the performance of different quality control methods for variant peptide identification in proteogenomics. We propose to use the difference between accurately predicted and observed retention times for each peptide as a metric to evaluate different quality control methods. To this end, we develop AutoRT, a deep learning algorithm with high accuracy in retention time prediction. Analysis of three cancer data sets with a total of 287 tumor samples using different quality control strategies results in substantially different numbers of identified variant peptides and putative neoantigens. Our systematic evaluation, using the proposed retention time metric, provides insights and practical guidance on the selection of quality control strategies. We implement the recommended strategy in a computational workflow named NeoFlow to support proteogenomics-based neoantigen prioritization, enabling more sensitive discovery of putative neoantigens. Identifying mutation-derived neoantigens by proteogenomics requires robust strategies for quality control. Here, the authors propose peptide retention time as an evaluation metric for proteogenomics quality control methods, and develop a deep learning algorithm for accurate retention time prediction.
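The retention-time metric can be illustrated as follows, assuming observed and predicted retention times (in minutes) are already available for each peptide, for example from a predictor such as AutoRT, which is not reimplemented here. Deriving a tolerance from the 95th percentile of the absolute RT error among canonical peptides, and scoring each QC strategy by the fraction of its variant peptides inside that tolerance, are our own illustrative choices rather than the exact procedure of the paper; all names and numbers are hypothetical.

```python
import math


def abs_delta_rt(observed: list[float], predicted: list[float]) -> list[float]:
    """Absolute difference between observed and predicted retention time per peptide."""
    return [abs(o - p) for o, p in zip(observed, predicted)]


def nearest_rank_percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    k = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[k - 1]


def fraction_within(deltas: list[float], tolerance: float) -> float:
    """Share of peptides whose absolute RT error does not exceed the tolerance."""
    return sum(d <= tolerance for d in deltas) / len(deltas)


# Hypothetical canonical peptides define what "normal" RT error looks like.
canonical = abs_delta_rt([10.0, 20.0, 30.0, 40.0], [10.5, 19.0, 31.5, 38.0])
tolerance = nearest_rank_percentile(canonical, 95)

# Hypothetical absolute RT errors of variant peptides accepted by two QC strategies.
strategy_deltas = {"strategy_1": [0.4, 1.2, 5.0, 0.9], "strategy_2": [3.5, 6.0, 1.1, 4.2]}
for name, deltas in strategy_deltas.items():
    print(name, fraction_within(deltas, tolerance))
```

Here the tolerance is 2.0 min, so strategy_1 keeps 75% of its variant peptides within the expected RT error and strategy_2 only 25%; under this metric, a QC strategy whose accepted variants deviate far more than canonical peptides is likely admitting more false identifications.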
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Kai Li
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Yun Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
12
Li S, Cha SW, Heffner K, Hizal DB, Bowen MA, Chaerkady R, Cole RN, Tejwani V, Kaushik P, Henry M, Meleady P, Sharfstein ST, Betenbaugh MJ, Bafna V, Lewis NE. Proteogenomic Annotation of Chinese Hamsters Reveals Extensive Novel Translation Events and Endogenous Retroviral Elements. J Proteome Res 2019; 18:2433-2445. [PMID: 31020842 DOI: 10.1021/acs.jproteome.8b00935] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/22/2022]
Abstract
A high-quality genome annotation greatly facilitates successful cell line engineering. Standard draft genome annotation pipelines are based largely on de novo gene prediction, homology, and RNA-Seq data. However, draft annotations can suffer from incorrect predictions of translated sequence, inaccurate splice isoforms, and missing genes. Here, we generated a draft annotation for the newly assembled Chinese hamster genome and used RNA-Seq, proteomics, and Ribo-Seq to experimentally annotate the genome. We identified 3529 new proteins compared to the hamster RefSeq protein annotation and 2256 novel translational events (e.g., alternative splices, mutations, and novel splices). Finally, we used this pipeline to identify the source of translated retroviruses contaminating recombinant products from Chinese hamster ovary (CHO) cell lines, including 119 type-C retroviruses, thus enabling future efforts to eliminate these retroviruses and reduce the costs of retroviral particle clearance. In summary, the improved annotation provides a more accurate resource for CHO cell line engineering by facilitating the interpretation of omics data, the definition of cellular pathways, and the engineering of complex phenotypes.
Affiliation(s)
- Deniz Baycin Hizal
- Antibody Discovery and Protein Engineering, AstraZeneca, Gaithersburg, Maryland, United States
- Michael A Bowen
- Antibody Discovery and Protein Engineering, AstraZeneca, Gaithersburg, Maryland, United States
- Raghothama Chaerkady
- Antibody Discovery and Protein Engineering, AstraZeneca, Gaithersburg, Maryland, United States
- Vijay Tejwani
- Colleges of Nanoscale Science and Engineering, SUNY Polytechnic Institute, Albany, New York 12203, United States
- Prashant Kaushik
- National Institute for Cellular Biotechnology, Dublin City University, Dublin 9, Ireland
- Michael Henry
- National Institute for Cellular Biotechnology, Dublin City University, Dublin 9, Ireland
- Paula Meleady
- National Institute for Cellular Biotechnology, Dublin City University, Dublin 9, Ireland
- Susan T Sharfstein
- Colleges of Nanoscale Science and Engineering, SUNY Polytechnic Institute, Albany, New York 12203, United States
13
Jayaram S, Balakrishnan L, Singh M, Zabihi A, Ganesh RA, Mangalaparthi KK, Sonpatki P, Gupta MK, Amaresha CB, Prasad K, Mariswamappa K, Pillai S, Lakshmikantha A, Shah N, Sirdeshmukh R. Identification of a Novel Splice Variant of Neural Cell Adhesion Molecule in Glioblastoma Through Proteogenomics Analysis. OMICS 2019; 22:437-448. [PMID: 29927716 DOI: 10.1089/omi.2017.0220] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
Splice variants are known to be important in the pathophysiology of tumors, including brain cancers. We applied a proteogenomics pipeline to identify splice variants in glioblastoma (GBM, grade IV glioma), a highly malignant brain tumor, using in-house generated mass spectrometric proteomic data and a public-domain RNA-Seq dataset. Our analysis led to the identification of a novel exon that maps to the long isoform of neural cell adhesion molecule 1 (NCAM1), a protein expressed on the surface of glial cells and neurons that is important for cell adhesion and cell signaling. The presence of the novel exon is supported by the identification of five peptides spanning it. Additional peptides were also detected in proteins from GBM patient tissue separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), underscoring the presence of the novel peptides in the intact brain protein. The novel exon was detected in the RNA-Seq dataset in 18 of 25 GBM samples and separately validated in an additional 10 GBM tumor tissues using quantitative real-time polymerase chain reaction (qRT-PCR). Both transcriptomic and proteomic data indicate downregulation of NCAM1, including the novel variant, in GBM. Domain analysis of the novel NCAM1 sequence indicates that the insertion of the novel exon adds an extra low-complexity region to the protein that may be important for protein-protein interactions, and hence for cell signaling associated with tumor development. Taken together, the novel NCAM1 variant reported in this study exemplifies the importance of future multiomics research and systems biology applications in GBM.
Affiliation(s)
- Savita Jayaram
- Institute of Bioinformatics, International Tech Park, Bangalore, India
- Manipal Academy of Higher Education, Manipal, India
- Lavanya Balakrishnan
- Mazumdar Shaw Center for Translational Research, Narayana Hrudayalaya Health City, Bangalore, India
- Manika Singh
- Institute of Bioinformatics, International Tech Park, Bangalore, India
- Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam, India
- Azin Zabihi
- Mazumdar Shaw Center for Translational Research, Narayana Hrudayalaya Health City, Bangalore, India
- Raksha A Ganesh
- Mazumdar Shaw Center for Translational Research, Narayana Hrudayalaya Health City, Bangalore, India
- Kiran K Mangalaparthi
- Institute of Bioinformatics, International Tech Park, Bangalore, India
- Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam, India
- Pranali Sonpatki
- Mazumdar Shaw Center for Translational Research, Narayana Hrudayalaya Health City, Bangalore, India
- Manoj Kumar Gupta
- Institute of Bioinformatics, International Tech Park, Bangalore, India
- Manipal Academy of Higher Education, Manipal, India
- Chaitra B Amaresha
- Mazumdar Shaw Center for Translational Research, Narayana Hrudayalaya Health City, Bangalore, India
- Komal Prasad
- Mazumdar Shaw Medical Center, Narayana Health City, Bangalore, India
- Shibu Pillai
- Mazumdar Shaw Medical Center, Narayana Health City, Bangalore, India
- Nameeta Shah
- Mazumdar Shaw Center for Translational Research, Narayana Hrudayalaya Health City, Bangalore, India
- Ravi Sirdeshmukh
- Institute of Bioinformatics, International Tech Park, Bangalore, India
- Manipal Academy of Higher Education, Manipal, India
- Mazumdar Shaw Center for Translational Research, Narayana Hrudayalaya Health City, Bangalore, India
14
Hattori E, Kondo T. Current status of cancer proteogenomics: a brief introduction. ACTA ACUST UNITED AC 2019. [DOI: 10.2198/jelectroph.63.33] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
Affiliation(s)
- Emi Hattori
- Division of Rare Cancer Research, National Cancer Center Research Institute
- Tadashi Kondo
- Division of Rare Cancer Research, National Cancer Center Research Institute
15
Pullman BS, Wertz J, Carver J, Bandeira N. ProteinExplorer: A Repository-Scale Resource for Exploration of Protein Detection in Public Mass Spectrometry Data Sets. J Proteome Res 2018; 17:4227-4234. [PMID: 30985146 DOI: 10.1021/acs.jproteome.8b00496] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/07/2023]
Abstract
High-throughput tandem mass spectrometry has enabled the detection and identification of over 75% of all proteins predicted to result in translated gene products in the human genome. In fact, the rapid pace of mass spectrometry data acquisition and sharing has led to the current availability of many tens of terabytes of public data in thousands of human data sets. The systematic reanalysis of these public data sets has been used to build a community-scale spectral library of 2.1 million precursors for over 1 million unique sequences from over 19,000 proteins (including spectra of synthetic peptides). However, it has remained challenging to find and inspect spectra of peptides covering functional protein regions or matching novel proteins. ProteinExplorer addresses these challenges with an intuitive interface mapping tens of millions of identifications to functional sites on nearly all human proteins while maintaining provenance for every identification back to the original data set and data file. Additionally, ProteinExplorer facilitates the selection and inspection of HPP-compliant peptides whose spectra can be matched to spectra of synthetic peptides and already includes HPP-compliant evidence for 107 missing (PE2, PE3, and PE4) and 23 dubious (PE5) proteins. Finally, ProteinExplorer allows users to rate spectra and to contribute to a community library of peptides, entitled PrEdict (Protein Existence dictionary), that map to novel proteins but whose preliminary identities have not yet been fully established with community-scale false discovery rates and synthetic peptide spectra. ProteinExplorer can now be accessed at https://massive.ucsd.edu/ProteoSAFe/protein_explorer_splash.jsp.
16
Lee PY, Chin SF, Low TY, Jamal R. Probing the colorectal cancer proteome for biomarkers: Current status and perspectives. J Proteomics 2018; 187:93-105. [PMID: 29953962 DOI: 10.1016/j.jprot.2018.06.014] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 04/09/2018] [Revised: 06/13/2018] [Accepted: 06/23/2018] [Indexed: 02/07/2023]
Abstract
Colorectal cancer (CRC) is one of the most prevalent malignancies worldwide. Biomarkers that can facilitate better clinical management of CRC are in high demand to improve patient outcomes and to reduce mortality. In this regard, proteomic analysis holds promise in the search for novel CRC biomarkers and in understanding the mechanisms underlying tumorigenesis. This review aims to provide an overview of current progress in proteomic research, focusing on the discovery and validation of diagnostic biomarkers for CRC. We will summarize the contributions of proteomic strategies to recent discoveries of protein biomarkers for CRC and also briefly discuss the potential and challenges of different proteomic approaches in biomarker discovery and translational applications.
Affiliation(s)
- Pey Yee Lee
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia.
- Siok-Fong Chin
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Teck Yew Low
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Rahman Jamal
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
17
Brunet MA, Levesque SA, Hunting DJ, Cohen AA, Roucou X. Recognition of the polycistronic nature of human genes is critical to understanding the genotype-phenotype relationship. Genome Res 2018; 28:609-624. [PMID: 29626081 PMCID: PMC5932603 DOI: 10.1101/gr.230938.117] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 11/06/2017] [Accepted: 03/27/2018] [Indexed: 12/12/2022]
Abstract
Technological advances promise unprecedented opportunities for whole exome sequencing and proteomic analyses of populations. Currently, data from genome and exome sequencing or proteomic studies are searched against reference genome annotations. This provides the foundation for research and clinical screening for genetic causes of pathologies. However, current genome annotations substantially underestimate the proteomic information encoded within a gene. Numerous studies have now demonstrated the expression and function of alternative (mainly small, sometimes overlapping) ORFs within mature gene transcripts. This has important consequences for the correlation of phenotypes and genotypes. Most alternative ORFs are not yet annotated because of a lack of evidence, and this absence from databases precludes their detection by standard proteomic methods, such as mass spectrometry. Here, we demonstrate how current approaches tend to overlook alternative ORFs, hindering the discovery of new genetic drivers and fundamental research. We discuss available tools and techniques to improve identification of proteins from alternative ORFs and finally suggest a novel annotation system to permit a more complete representation of the transcriptomic and proteomic information contained within a gene. Given the crucial challenge of distinguishing functional ORFs from random ones, the suggested pipeline emphasizes both experimental data and conservation signatures. The addition of alternative ORFs in databases will render identification less serendipitous and advance the pace of research and genomic knowledge. This review highlights the urgent medical and research need to incorporate alternative ORFs in current genome annotations and thus permit their inclusion in hypotheses and models, which relate phenotypes and genotypes.
Affiliation(s)
- Marie A Brunet
- Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada
- Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
- Sébastien A Levesque
- Pediatric Department, Centre Hospitalier de l'Université de Sherbrooke, Quebec J1H 5N4, Canada
- Darel J Hunting
- Department of Nuclear Medicine & Radiobiology, Université de Sherbrooke, Quebec J1H 5N4, Canada
- Alan A Cohen
- Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada
- Xavier Roucou
- Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
18
Ivanov MV, Lobas AA, Levitsky LI, Moshkovskii SA, Gorshkov MV. Brute-Force Approach for Mass Spectrometry-Based Variant Peptide Identification in Proteogenomics without Personalized Genomic Data. J Am Soc Mass Spectrom 2018; 29:435-438. [PMID: 29299837 DOI: 10.1007/s13361-017-1859-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/01/2017] [Revised: 10/05/2017] [Accepted: 11/22/2017] [Indexed: 06/07/2023]
Abstract
In a proteogenomic approach based on tandem mass spectrometry analysis of proteolytic peptide mixtures, customized exome or RNA-seq databases are employed for identifying protein sequence variants. However, identifying variant peptides without personalized genomic data is important for a variety of applications. Following the recent proposal by Chick et al. (Nat. Biotechnol. 33, 743-749, 2015) on the feasibility of such variant peptide searches, we evaluated two available approaches based on the previously suggested "open" search and the "brute-force" strategy. To improve the efficiency of these approaches, we propose an algorithm for excluding false variant identifications from the search results that involves analysis of modifications mimicking single amino acid substitutions. We also propose a de novo-based scoring scheme for assessing identified point mutations. In this scheme, the search engine analyzes y-type fragment ions in MS/MS spectra to confirm the location of the mutation in the variant peptide sequence.
Affiliation(s)
- Mark V Ivanov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow region, 141700, Russia
- Anna A Lobas
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow region, 141700, Russia
- Lev I Levitsky
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow region, 141700, Russia
- Sergei A Moshkovskii
- Institute of Biomedical Chemistry, Moscow, 119121, Russia
- Pirogov Russian National Research Medical University, Moscow, 117997, Russia
- Mikhail V Gorshkov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow region, 141700, Russia
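The abstract's idea of using y-type fragment ions to confirm where a point mutation sits can be illustrated with a simplified Python sketch. This is not the authors' de novo scoring scheme: it only counts how many y ions that span the substituted residue (and therefore shift in mass relative to the wild-type peptide) are matched in a peak list. The monoisotopic residue masses are standard values; the peptides, the 0.02 Da tolerance, the idealised peak lists, and all function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def y_ions(seq):
    """Singly charged y1..y(n-1) m/z values; element k-1 holds y_k."""
    ions = []
    total = WATER + PROTON
    for residue in reversed(seq[1:]):  # walk from the C-terminus towards the N-terminus
        total += RESIDUE[residue]
        ions.append(total)
    return ions


def diagnostic_y(n, pos):
    """Indices k of y ions that contain the residue at 0-based position pos.

    y_k covers the last k residues, so it contains pos when k >= n - pos.
    A substitution at the N-terminal residue (pos 0) is never covered by y ions.
    """
    return [k for k in range(1, n) if k >= n - pos]


def site_support(variant, pos, peaks, tol=0.02):
    """Number of mass-shifted (diagnostic) y ions of the variant matched in peaks."""
    ions = y_ions(variant)
    return sum(
        any(abs(ions[k - 1] - peak) <= tol for peak in peaks)
        for k in diagnostic_y(len(variant), pos)
    )


# Hypothetical I->V substitution at position 4 of PEPTIDEK.
variant_peaks = y_ions("PEPTVDEK")   # idealised spectrum of the variant
wildtype_peaks = y_ions("PEPTIDEK")  # idealised spectrum of the wild type
print(site_support("PEPTVDEK", 4, variant_peaks))   # 4 of 4 diagnostic ions
print(site_support("PEPTVDEK", 4, wildtype_peaks))  # 0
```

Only the ions that carry the substituted residue are informative about its location, which is why the sketch ignores y1 to y3. Note also that I and L have identical masses, so an I-to-L substitution produces no diagnostic shift at all and cannot be confirmed this way.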
19
Cha SW, Bonissone S, Na S, Pevzner PA, Bafna V. The Antibody Repertoire of Colorectal Cancer. Mol Cell Proteomics 2017; 16:2111-2124. [PMID: 29046389 PMCID: PMC5724175 DOI: 10.1074/mcp.ra117.000397] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/11/2017] [Indexed: 12/31/2022]
Abstract
Immunotherapy is becoming increasingly important in the fight against cancers, using and manipulating the body's immune response to treat tumors. Understanding the immune repertoire (the collection of immunological proteins) of treated and untreated cells is possible at the genomic level but technically difficult at the protein level. Standard protein databases do not include the highly divergent sequences of somatically rearranged immunoglobulin genes, which may lead to missed identifications in a mass spectrometry search. We introduce a novel proteogenomic approach, AbScan, to identify these highly variable antibody peptides, by developing a customized antibody database construction method using RNA-seq reads aligned to immunoglobulin (Ig) genes. AbScan starts by filtering transcript (RNA-seq) reads that match the template for Ig genes. The retained reads are used to construct a repertoire graph using the "split" de Bruijn graph: a graph structure that improves on the standard de Bruijn graph to capture the high diversity of Ig genes in a compact manner. AbScan corrects for sequencing errors and converts the graph to a format suitable for searching with MS/MS search tools. We used AbScan to create an antibody database from 90 RNA-seq colorectal tumor samples. Next, we used proteogenomic analysis to search MS/MS spectra of matched colorectal samples from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) against the AbScan-generated database. AbScan identified 1,940 distinct antibody peptides. By correlating these with previously identified single amino-acid variants (SAAVs) in the tumor samples, we identified 163 (antibody peptide, SAAV) pairs with a significant co-occurrence pattern across the 90 samples. The presence of coexpressed antibody and mutated peptides was correlated with the survival time of the individuals. Our results suggest that AbScan (https://github.com/csw407/AbScan.git) is an effective tool for proteomic exploration of the immune response in cancers.
Affiliation(s)
- Seong Won Cha
- Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, California
- Seungjin Na
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92037
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92037
- Vineet Bafna
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92037
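AbScan's "split" de Bruijn graph is described in the abstract above as an improvement on the standard de Bruijn graph. As a point of reference only, the Python sketch below builds the standard version from a few reads: nodes are (k-1)-mers, each k-mer occurrence adds weight to an edge, and low-multiplicity edges are pruned as a crude stand-in for sequencing-error correction. The split-graph construction, AbScan's actual error-correction procedure, and its export to search-engine input are not reproduced; the reads, k, and the pruning threshold are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Standard de Bruijn graph: (k-1)-mer -> {next (k-1)-mer: k-mer count}."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def prune(graph, min_count):
    """Drop edges seen fewer than min_count times (a simple error filter)."""
    pruned = {}
    for node, edges in graph.items():
        kept = {nxt: c for nxt, c in edges.items() if c >= min_count}
        if kept:
            pruned[node] = kept
    return pruned


# Hypothetical reads: the third read carries a likely sequencing error (final C -> G).
reads = ["ACGTAC", "ACGTAC", "ACGTAG"]
graph = de_bruijn(reads, k=3)
print(dict(graph["TA"]))  # {'AC': 2, 'AG': 1}
print(prune(graph, min_count=2))
```

Here the erroneous read creates a weak branch (TA to AG) that count-based pruning removes. Highly diverse Ig repertoires make such a simple threshold risky, because genuine rare clonotypes also have low counts, which is part of why a dedicated graph structure is useful for this problem.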
20
Dimitrakopoulos L, Prassas I, Diamandis EP, Charames GS. Onco-proteogenomics: Multi-omics level data integration for accurate phenotype prediction. Crit Rev Clin Lab Sci 2017; 54:414-432. [DOI: 10.1080/10408363.2017.1384446] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/10/2023]
Affiliation(s)
- Lampros Dimitrakopoulos
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Ioannis Prassas
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Eleftherios P. Diamandis
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Clinical Biochemistry, University Health Network, Toronto, ON, Canada
- George S. Charames
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
21
Li H, Park J, Kim H, Hwang KB, Paek E. Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments. J Proteome Res 2017; 16:2231-2239. [PMID: 28452485 DOI: 10.1021/acs.jproteome.7b00033] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]
Abstract
Proteogenomic searches are useful for novel peptide identification from tandem mass spectra. Usually, separate and multistage approaches are adopted to accurately control the false discovery rate (FDR) for proteogenomic search. Their performance on novel peptide identification has not been thoroughly evaluated, however, mainly due to the difficulty in confirming the existence of identified novel peptides. We simulated a proteogenomic search using a controlled, spike-in proteomic data set. After confirming that the results of the simulated proteogenomic search were similar to those of a real proteogenomic search using a human cell line data set, we evaluated the performance of six FDR control methods on novel peptide identification: global, separate, and multistage FDR estimation, each coupled to either a target-decoy search or a mixture model-based method. The multistage approach showed the highest accuracy for FDR estimation. However, global and separate FDR estimation with the mixture model-based method showed higher sensitivities than the others at the same true FDR. Furthermore, the mixture model-based method performed equally well when applied without or with a reduced set of decoy sequences. Considering the different prior probabilities for novel and known protein identification, we recommend using mixture model-based methods with separate FDR estimation for sensitive and reliable identification of novel peptides from proteogenomic searches.
Affiliation(s)
- Honglan Li
- School of Computer Science and Engineering, Soongsil University, Seoul 06978, Republic of Korea
- Jonghun Park
- Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Hyunwoo Kim
- Scientific Data Research Center, Korea Institute of Science and Technology Information, Daejeon 34141, Republic of Korea
- Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul 06978, Republic of Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
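The contrast between global and separate FDR estimation examined by Li et al. can be sketched for the target-decoy case in Python. The sketch estimates FDR as decoys/targets above each score threshold and converts it to q-values; it does not implement the multistage or mixture model-based methods evaluated in the paper. The `PSM` fields, scores, and the 10% threshold are hypothetical toy values chosen to make the effect visible.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    score: float
    is_decoy: bool
    is_novel: bool  # True if the match comes from the novel/variant part of the database


def accept_targets(psms, fdr):
    """Target PSMs whose target-decoy q-value (decoys/targets) is <= fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    decoys = targets = 0
    fdrs = []
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: lowest FDR at this rank or at any lower-scoring rank.
    qvals = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running
    return [p for p, q in zip(ranked, qvals) if not p.is_decoy and q <= fdr]


def global_novel(psms, fdr):
    """Global FDR: threshold all PSMs together, then keep the novel ones."""
    return [p for p in accept_targets(psms, fdr) if p.is_novel]


def separate_novel(psms, fdr):
    """Separate FDR: estimate FDR within the novel class only."""
    return accept_targets([p for p in psms if p.is_novel], fdr)


# Hypothetical toy search: ten confident known matches, a small novel class, one known decoy.
known = [PSM(float(s), is_decoy=False, is_novel=False) for s in range(100, 90, -1)]
novel = [
    PSM(90.0, False, True),
    PSM(89.0, True, True),
    PSM(88.0, False, True),
    PSM(87.0, False, True),
]
psms = known + novel + [PSM(50.0, is_decoy=True, is_novel=False)]

print([p.score for p in global_novel(psms, fdr=0.1)])    # [90.0, 88.0, 87.0]
print([p.score for p in separate_novel(psms, fdr=0.1)])  # [90.0]
```

In this toy set, the many confident known-peptide matches dilute the single novel decoy under global estimation, so all three novel targets pass; estimated within the novel class alone, only the top-scoring novel PSM survives. This is the usual motivation for class-specific (separate) FDR in proteogenomic searches.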
22
Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR. Methods, Tools and Current Perspectives in Proteogenomics. Mol Cell Proteomics 2017; 16:959-981. [PMID: 28456751 DOI: 10.1074/mcp.mr117.000024] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 04/13/2017] [Indexed: 12/20/2022]
Abstract
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches. In this article, we systematically classify published methods and tools into four major categories: (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications.
Affiliation(s)
- Kelly V Ruggles
- Department of Medicine, New York University School of Medicine, New York, New York 10016
- Karsten Krug
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Xiaojing Wang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- Karl R Clauser
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Jing Wang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- Samuel H Payne
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354
- David Fenyö
- Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York 10016
- Institute for Systems Genetics, New York University School of Medicine, New York, New York 10016
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- D R Mani
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
23
Ivanov MV, Lobas AA, Karpov DS, Moshkovskii SA, Gorshkov MV. Comparison of False Discovery Rate Control Strategies for Variant Peptide Identifications in Shotgun Proteogenomics. J Proteome Res 2017; 16:1936-1943. [DOI: 10.1021/acs.jproteome.6b01014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Affiliation(s)
- Mark V. Ivanov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Moscow Region, Dolgoprudny 141700, Russia
- Anna A. Lobas
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Moscow Region, Dolgoprudny 141700, Russia
- Dmitry S. Karpov
- Institute of Biomedical Chemistry, Moscow 119121, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Sergei A. Moshkovskii
- Institute of Biomedical Chemistry, Moscow 119121, Russia
- Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Mikhail V. Gorshkov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Moscow Region, Dolgoprudny 141700, Russia
24
Fu S, Liu X, Luo M, Xie K, Nice EC, Zhang H, Huang C. Proteogenomic studies on cancer drug resistance: towards biomarker discovery and target identification. Expert Rev Proteomics 2017; 14:351-362. [PMID: 28276747 DOI: 10.1080/14789450.2017.1299006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 02/05/2023]
Abstract
Introduction: Chemoresistance is a major obstacle for current cancer treatment. Proteogenomics is a powerful multi-omics research field that uses customized protein sequence databases generated from genomic and transcriptomic information to identify novel genes (e.g. noncoding, mutation and fusion genes) from mass spectrometry-based proteomic data. By identifying aberrations that are differentially expressed between tumor and normal pairs, this approach can also be applied to validate protein variants in cancer, which may reveal the response to drug treatment. Areas covered: In this review, we present recent advances in proteogenomic investigations of cancer drug resistance, with an emphasis on integrative proteogenomic pipelines and on biomarker discovery, which contributes to the goal of precision/personalized medicine for cancer treatment. Expert commentary: The discovery and comprehensive understanding of potential biomarkers help identify the cohort of patients who may benefit from particular treatments, and will assist real-time clinical decision-making to maximize therapeutic efficacy and minimize adverse effects. With the development of MS-based proteomics and NGS-based sequencing, a growing number of proteogenomic tools are being developed specifically to investigate cancer drug resistance.
Affiliation(s)
- Shuyue Fu
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China
- Xiang Liu
- Department of Pathology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
- Maochao Luo
- West China School of Public Health, Sichuan University, Chengdu, P.R. China
- Ke Xie
- Department of Oncology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
- Edouard C Nice
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
- Haiyuan Zhang
- School of Medicine, Yangtze University, P.R. China
- Canhua Huang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China
25
Guerrero CR, Jagtap PD, Johnson JE, Griffin TJ. Using Galaxy for Proteomics. Proteome Informatics 2016. [DOI: 10.1039/9781782626732-00289] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]
Abstract
The field of informatics for mass spectrometry (MS)-based proteomics data has grown steadily over the last two decades. Numerous effective software programs now exist for various aspects of proteomic informatics. However, many researchers still have difficulty using this software. These difficulties arise from problems with running and integrating disparate software programs, scalability issues when dealing with large data volumes, and a lack of ability to share and reproduce workflows composed of different software. The Galaxy framework for bioinformatics provides an attractive option for solving many of these current issues in proteomic informatics. Galaxy was originally developed as a workbench to enable genomic data analysis, and numerous researchers are now turning to it to implement software for MS-based proteomics applications. Here, we provide an introduction to Galaxy and its features, and describe how software tools are deployed, published and shared via this scalable framework. We also describe some of the existing tools in Galaxy for basic MS-based proteomics data analysis and informatics. Finally, we describe how proteomics tools in Galaxy can be combined with other existing tools for genomic and transcriptomic data analysis to enable powerful multi-omic data analysis applications.
Affiliation(s)
- Candace R. Guerrero
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
- Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 321 Church St SE/6-155 Jackson Hall, Minneapolis, MN 55455, USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota, 1479 Gortner Avenue, St. Paul, MN 55108, USA
- James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota, 512 Walter Library, 117 Pleasant Street SE, Minneapolis, MN 55455, USA
- Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 321 Church St SE/6-155 Jackson Hall, Minneapolis, MN 55455, USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota, 1479 Gortner Avenue, St. Paul, MN 55108, USA
26
Deutsch EW, Csordas A, Sun Z, Jarnuczak A, Perez-Riverol Y, Ternent T, Campbell DS, Bernal-Llinares M, Okuda S, Kawano S, Moritz RL, Carver JJ, Wang M, Ishihama Y, Bandeira N, Hermjakob H, Vizcaíno JA. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res 2016; 45:D1100-D1106. [PMID: 27924013 PMCID: PMC5210636 DOI: 10.1093/nar/gkw936] [Citation(s) in RCA: 662] [Impact Index Per Article: 82.8] [Received: 09/15/2016] [Accepted: 10/07/2016] [Indexed: 11/13/2022]
Abstract
The ProteomeXchange (PX) Consortium of proteomics resources (http://www.proteomexchange.org) was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide. We give an overview of the current consortium activities and describe the advances of the past few years. Augmenting the PX founding members (PRIDE and PeptideAtlas, including the PASSEL resource), two new members have joined the consortium: MassIVE and jPOST. ProteomeCentral remains the common data access portal, providing the ability to search for data sets in all participating PX resources, now with enhanced data visualization components. We describe the updated submission guidelines, now expanded to cover four members instead of two. As demonstrated by data submission statistics, PX is supporting a change in culture in the proteomics field: public data sharing is now an accepted standard, supported by journal submission requirements that are making public data release the norm. More than 4500 data sets have been submitted to the various PX resources since 2012. Human is the most represented species, with approximately half of the data sets, followed by some of the main model organisms and a growing list of more than 900 diverse species. Data reprocessing activities are becoming more prominent, with both MassIVE and PeptideAtlas releasing the results of reprocessed data sets. Finally, we outline the upcoming advances for ProteomeXchange.
Affiliation(s)
- Attila Csordas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Zhi Sun
- Institute for Systems Biology, Seattle, WA 98109, USA
- Andrew Jarnuczak
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Manuel Bernal-Llinares
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
- Shin Kawano
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa 277-0871, Japan
- Jeremy J Carver
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA; Department of Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Mingxun Wang
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA; Department of Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Yasushi Ishihama
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
- Nuno Bandeira
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA; Department of Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; National Center for Protein Sciences, Beijing, China
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
27
Li Y, Wang X, Cho JH, Shaw TI, Wu Z, Bai B, Wang H, Zhou S, Beach TG, Wu G, Zhang J, Peng J. JUMPg: An Integrative Proteogenomics Pipeline Identifying Unannotated Proteins in Human Brain and Cancer Cells. J Proteome Res 2016; 15:2309-20. [PMID: 27225868 DOI: 10.1021/acs.jproteome.6b00344] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Indexed: 01/18/2023]
Abstract
Proteogenomics is an emerging approach to improve gene annotation and the interpretation of proteomics data. Here we present JUMPg, an integrative proteogenomics pipeline including customized database construction, tag-based database search, peptide-spectrum match filtering, and data visualization. JUMPg creates multiple databases of DNA polymorphisms, mutations, splice junctions and partially tryptic peptides, as well as protein fragments translated in all six frames from the whole transcriptome after de novo assembly of RNA-seq data. We use a multistage strategy to search these databases sequentially, in which performance is optimized by re-searching only unmatched high-quality spectra and reusing amino acid tags generated by the JUMP search engine. The identified peptides/proteins are displayed with their gene loci in the UCSC genome browser. The JUMPg program was then applied to a label-free mass spectrometry data set of Alzheimer's disease postmortem brain, uncovering 496 new peptides arising from amino acid substitutions, alternative splicing, frame shifts, and "non-coding gene" translation. The novel protein PNMA6BL, which is specifically expressed in the brain, is highlighted. We also used JUMPg to analyze a stable-isotope labeled data set of multiple myeloma cells, revealing 991 sample-specific peptides that include protein sequences in the immunoglobulin light chain variable region. Thus, the JUMPg program is an effective proteogenomics tool for multiomics data integration.
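The multistage idea in the JUMPg abstract (search the databases in a fixed order and carry forward only high-quality spectra that are still unmatched) can be sketched in Python. This is a minimal, hypothetical illustration, not JUMPg's implementation: the `Spectrum` record, the quality score and `min_quality` threshold, the stage names, the masses, and the `toy_search` function (which simply matches precursor masses within a tolerance) are all assumptions, and the reuse of amino acid tags from the JUMP search engine is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    spectrum_id: str
    precursor_mass: float  # toy precursor mass (Da), hypothetical
    quality: float         # hypothetical spectrum-quality score in [0, 1]


def toy_search(spectrum, database, tol=0.01):
    """Hypothetical stand-in for a search engine: return the first peptide
    whose mass is within `tol` Da of the precursor mass, else None."""
    for peptide, mass in database.items():
        if abs(mass - spectrum.precursor_mass) <= tol:
            return peptide
    return None


def multistage_search(spectra, stages, min_quality=0.5, tol=0.01):
    """Search ordered (name, database) stages in sequence.

    The first stage sees every spectrum; after each stage, only spectra that
    are still unmatched AND pass the quality cut are searched in the next.
    Returns the matches and the IDs of spectra still unmatched at the end.
    """
    matches = {}
    remaining = list(spectra)
    for stage_name, database in stages:
        unmatched = []
        for spectrum in remaining:
            peptide = toy_search(spectrum, database, tol)
            if peptide is None:
                unmatched.append(spectrum)
            else:
                # Record which stage (database) explained this spectrum.
                matches[spectrum.spectrum_id] = (stage_name, peptide)
        # Drop low-quality spectra instead of re-searching them.
        remaining = [s for s in unmatched if s.quality >= min_quality]
    return matches, [s.spectrum_id for s in remaining]


# Usage with made-up masses: a reference stage followed by a variant stage.
stages = [
    ("reference", {"PEPTIDEA": 800.40}),
    ("variant", {"PEPTIDEB": 900.45}),
]
spectra = [
    Spectrum("s1", 800.40, 0.9),
    Spectrum("s2", 900.45, 0.8),
    Spectrum("s3", 900.45, 0.2),  # low quality: not carried past stage 1
    Spectrum("s4", 1000.00, 0.9),
]
matches, unresolved = multistage_search(spectra, stages)
print(matches)     # {'s1': ('reference', 'PEPTIDEA'), 's2': ('variant', 'PEPTIDEB')}
print(unresolved)  # ['s4']
```

Searching the reference stage first means spectra already explained by canonical sequences are never offered to the variant databases, and the quality cut keeps later stages from re-searching spectra that (under this sketch's assumptions) are unlikely to give confident matches.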
Affiliation(s)
- Hong Wang
- Integrated Biomedical Sciences Program, University of Tennessee Health Science Center, 920 Madison Avenue, Memphis, Tennessee 38163, United States
- Thomas G Beach
- Banner Sun Health Research Institute, Sun City, Arizona 85351, United States
28
Paik YK, Omenn GS, Overall CM, Deutsch EW, Hancock WS. Recent Advances in the Chromosome-Centric Human Proteome Project: Missing Proteins in the Spot Light. J Proteome Res 2015; 14:3409-14. [PMID: 26337862 DOI: 10.1021/acs.jproteome.5b00785] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Affiliation(s)
- Young-Ki Paik
- Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
- Gilbert S Omenn
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States; Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
- Christopher M Overall
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada; Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States; Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
- William S Hancock
- Department of Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States; Yonsei Proteome Research Center, Yonsei University, Seoul 120-749, Korea
29
Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow. Nat Commun 2016; 7:11778. [PMID: 27250503 PMCID: PMC4895710 DOI: 10.1038/ncomms11778] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 12/22/2015] [Accepted: 04/28/2016] [Indexed: 12/16/2022]
Abstract
Complete annotation of the human genome is indispensable for medical research. The GENCODE consortium strives to provide this, augmenting computational and experimental evidence with manual annotation. The rapidly developing field of proteogenomics provides evidence for the translation of genes into proteins and can be used to discover and refine gene models. However, for both the proteomics and annotation groups, there is a lack of guidelines for integrating these data. Here we report a stringent workflow for the interpretation of proteogenomic data that could be used by the annotation community to interpret novel proteogenomic evidence. Based on reprocessing of three large-scale publicly available human data sets, we show that a conservative approach, using stringent filtering, is required to generate valid identifications. Evidence was found supporting the addition of 16 novel protein-coding genes to GENCODE. Despite this, many peptide identifications in pseudogenes cannot be annotated due to the absence of orthogonal supporting evidence.
Identifying and annotating functional elements in the human genome remains a challenging but important task. Here the authors propose a priority annotation score to rank identifications and suggest how proteogenomics evidence can be interpreted and what additional information substantiates protein-coding potential for annotation.
30
Hanash S, Taguchi A, Wang H, Ostrin EJ. Deciphering the complexity of the cancer proteome for diagnostic applications. Expert Rev Mol Diagn 2016; 16:399-405. [PMID: 26694525 DOI: 10.1586/14737159.2016.1135738] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 02/03/2023]
Abstract
The proteome is the most functional component encoded in the genome, yet most features of the proteome that are deregulated in cancer cannot be predicted from genomic analysis alone. These include post-translational modifications (PTMs), sub-cellular localization, networks and circuitry, formation of complexes, and functional activity, all of which could play a role in, or be affected by, tumorigenesis. Thus, there is a substantial opportunity to elucidate protein alterations in cancer and to translate this knowledge into diagnostics and therapeutics. Here we review the progress made in mining the cancer proteome for diagnostic applications and the path forward.
Affiliation(s)
- Samir Hanash
- a Department of Clinical Cancer Prevention, University of Texas MD Anderson Cancer Center, Houston, Texas, US
- Ayumu Taguchi
- b Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, Texas, US
- Hong Wang
- a Department of Clinical Cancer Prevention, University of Texas MD Anderson Cancer Center, Houston, Texas, US
- Edwin J Ostrin
- c Department of Pulmonary Medicine, University of Texas MD Anderson Cancer Center, Houston, Texas, US