1. Raj A, Aggarwal S, Singh P, Yadav AK, Dash D. PgxSAVy: A tool for comprehensive evaluation of variant peptide quality in proteogenomics - catching the (un)usual suspects. Comput Struct Biotechnol J 2024; 23:711-722. [PMID: 38292474; PMCID: PMC10825656; DOI: 10.1016/j.csbj.2023.12.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/19/2023] [Accepted: 12/23/2023] [Indexed: 02/01/2024] [Open access]
Abstract
Variant peptides resulting from single nucleotide polymorphisms (SNPs) can lead to aberrant protein functions and have translational potential for disease diagnosis and personalized therapy. Variant peptides detected by proteogenomics are fraught with a high number of false positives, but there is no uniform and comprehensive approach to assess variant quality across analysis pipelines. Despite class-specific FDR along with ad-hoc filters, the problem is far from solved. These protocols are typically manual and tedious, and thus not uniform across labs. We demonstrate that variant peptide rescoring, integrated with intensity, variant event information and search result features, allows better discrimination of correct variant peptides. Implemented in PgxSAVy, a tool for quality control of variant peptides, this method can tackle the high rate of false positives. PgxSAVy provides a rigorous framework for quality control and annotation of variant peptides on the basis of (i) variant quality, (ii) isobaric masses, and (iii) disease annotation. PgxSAVy identified true variants with 98.43% accuracy on simulated data. Large-scale proteogenomic reanalysis of ∼2.8 million spectra (PXD004010 and PXD001468) resulted in 12,705 variant peptide spectrum matches (PSMs), of which PgxSAVy evaluated 3028 (23.8%), 1409 (11.1%) and 8268 (65.1%) as confident, semi-confident and doubtful, respectively. PgxSAVy also annotates the variants based on their pathogenicity and provides support for assisted manual validation. The analysis of proteins carrying variants can provide fine granularity in discovering important pathways. PgxSAVy will advance personalized medicine by providing a comprehensive framework for quality control and prioritization of proteogenomics variants. PgxSAVy is freely available at https://pgxsavy.igib.res.in/ as a webserver and https://github.com/anuragraj/PgxSAVy as a stand-alone tool.
Affiliation(s)
- Anurag Raj
  - G. N. Ramachandran Knowledge Centre for Genomics Informatics, CSIR – Institute of Genomics and Integrative Biology, New Delhi, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Suruchi Aggarwal
  - Computational and Mathematical Biology Centre (CMBC), 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
  - Centre for Drug Discovery (CDD), 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
  - Centre for Microbial Research (CMR), Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
- Prateek Singh
  - G. N. Ramachandran Knowledge Centre for Genomics Informatics, CSIR – Institute of Genomics and Integrative Biology, New Delhi, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Amit Kumar Yadav
  - Computational and Mathematical Biology Centre (CMBC), 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
  - Centre for Drug Discovery (CDD), 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
  - Centre for Microbial Research (CMR), Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
- Debasis Dash
  - G. N. Ramachandran Knowledge Centre for Genomics Informatics, CSIR – Institute of Genomics and Integrative Biology, New Delhi, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
2. Liang Y, Lv D, Liu K, Yang L, Shu H, Wen L, Lv C, Sun Q, Yin J, Liu H, Xu J, Liu Z, Ding N. MicroProteinDB: A database to provide knowledge on sequences, structures and function of ncRNA-derived microproteins. Comput Biol Med 2024; 177:108660. [PMID: 38820774; DOI: 10.1016/j.compbiomed.2024.108660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 05/08/2024] [Accepted: 05/26/2024] [Indexed: 06/02/2024]
Abstract
Omics-based technologies have revolutionized our comprehension of microproteins encoded by ncRNAs, revealing their abundant presence and pivotal roles within complex functional landscapes. Here, we developed MicroProteinDB (http://bio-bigdata.hrbmu.edu.cn/MicroProteinDB), which presents and visualizes extensive knowledge to aid the retrieval and analysis of computationally predicted and experimentally validated microproteins originating from various ncRNA types. Employing prediction algorithms grounded in diverse deep learning approaches, MicroProteinDB comprehensively documents the fundamental physicochemical properties, secondary and tertiary structures, interactions with functional proteins, family domains, and inter-species conservation of microproteins. With five major analytical modules, it will serve as a valuable knowledge resource for investigating ncRNA-derived microproteins.
Affiliation(s)
- Yinan Liang
  - The First Affiliated Hospital, Harbin Medical University, Harbin, 150001, China
- Dezhong Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Kefan Liu
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150081, China
- Liting Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Huan Shu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Luan Wen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Chongwen Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Qisen Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Jiaqi Yin
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Hui Liu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Juan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Zhigang Liu
  - Affiliated Foshan Maternity & Child Healthcare Hospital, Southern Medical University, Guangzhou, 510000, China
- Na Ding
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
3. Zhou W, Su M, Jiang T, Yang Q, Sun Q, Xu K, Shi J, Yang C, Ding N, Li Y, Xu J. SORC: an integrated spatial omics resource in cancer. Nucleic Acids Res 2024; 52:D1429-D1437. [PMID: 37811897; PMCID: PMC10768140; DOI: 10.1093/nar/gkad820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 08/31/2023] [Accepted: 09/20/2023] [Indexed: 10/10/2023] [Open access]
Abstract
The interactions between tumor cells and the microenvironment play pivotal roles in the initiation, progression and metastasis of cancer. The advent of spatial transcriptomics data offers an opportunity to unravel the intricate dynamics of cellular states and cell-cell interactions in cancer. Herein, we have developed an integrated spatial omics resource in cancer (SORC, http://bio-bigdata.hrbmu.edu.cn/SORC), which interactively visualizes and analyzes the spatial transcriptomics data in cancer. We manually curated currently available spatial transcriptomics datasets for 17 types of cancer, comprising 722 899 spots across 269 slices. Furthermore, we matched reference single-cell RNA sequencing data in the majority of spatial transcriptomics datasets, involving 334 379 cells and 46 distinct cell types. SORC offers five major analytical modules that address the primary requirements of spatial transcriptomics analysis, including slice annotation, identification of spatially variable genes, co-occurrence of immune cells and tumor cells, functional analysis and cell-cell communications. All these spatial transcriptomics data and in-depth analyses have been integrated into easy-to-browse and explore pages, visualized through intuitive tables and various image formats. In summary, SORC serves as a valuable resource for providing an unprecedented spatially resolved cellular map of cancer and identifying specific genes and functional pathways to enhance our understanding of the tumor microenvironment.
Affiliation(s)
- Weiwei Zhou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Minghai Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Tiantongfei Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Qingyi Yang
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Qisen Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Kang Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Jingyi Shi
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Changbo Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Na Ding
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Yongsheng Li
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
- Juan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, Heilongjiang Province, China
4. Lv D, Li D, Cai Y, Guo J, Chu S, Yu J, Liu K, Jiang T, Ding N, Jin X, Li Y, Xu J. CancerProteome: a resource to functionally decipher the proteome landscape in cancer. Nucleic Acids Res 2024; 52:D1155-D1162. [PMID: 37823596; PMCID: PMC10767844; DOI: 10.1093/nar/gkad824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 09/07/2023] [Accepted: 09/20/2023] [Indexed: 10/13/2023] [Open access]
Abstract
Advancements in mass spectrometry (MS)-based proteomics have greatly facilitated the large-scale quantification of proteins and microproteins, thereby revealing altered signalling pathways across many different cancer types. However, specialized and comprehensive resources are lacking for cancer proteomics. Here, we describe CancerProteome (http://bio-bigdata.hrbmu.edu.cn/CancerProteome), which functionally deciphers and visualizes the proteome landscape in cancer. We manually curated and re-analyzed publicly available MS-based quantification and post-translational modification (PTM) proteomes, including 7406 samples from 21 different cancer types, and also examined protein abundances and PTM levels in 31 120 proteins and 4111 microproteins. Six major analytical modules were developed to describe protein contributions to carcinogenesis using proteome analysis, including conventional analyses of the quantitative and PTM proteomes, functional enrichment, protein-protein associations by integrating known interactions with co-expression signatures, drug sensitivity and clinical relevance analyses. Moreover, protein abundances, which correlated with corresponding transcript or PTM levels, were evaluated. CancerProteome is convenient as it allows users to access specific proteins/microproteins of interest using quick searches or query options to generate multiple visualization results. In summary, CancerProteome is an important resource that functionally deciphers the cancer proteome landscape and provides novel insight for the identification of tumor protein markers in cancer.
Affiliation(s)
- Dezhong Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Donghao Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Yangyang Cai
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Jiyu Guo
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Sen Chu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Jiaxin Yu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Kefan Liu
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Tiantongfei Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Na Ding
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Xiyun Jin
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang Province 150000, China
- Yongsheng Li
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
- Juan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China
5. Cai Y, Lv D, Li D, Yin J, Ma Y, Luo Y, Fu L, Ding N, Li Y, Pan Z, Li X, Xu J. IEAtlas: an atlas of HLA-presented immune epitopes derived from non-coding regions. Nucleic Acids Res 2022; 51:D409-D417. [PMID: 36099422; PMCID: PMC9825419; DOI: 10.1093/nar/gkac776] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/30/2022] [Revised: 08/21/2022] [Accepted: 08/29/2022] [Indexed: 01/29/2023] [Open access]
Abstract
Cancer-related epitopes can engage the immune system against tumor cells; thus, exploring epitopes derived from non-coding regions is emerging as a fascinating field in cancer immunotherapies. Here, we describe a database, IEAtlas (http://bio-bigdata.hrbmu.edu.cn/IEAtlas), which aims to provide and visualize a comprehensive atlas of human leukocyte antigen (HLA)-presented immunogenic epitopes derived from non-coding regions. IEAtlas reanalyzed publicly available mass spectrometry-based HLA immunopeptidome datasets against our integrated benchmarked non-canonical open reading frame information. The current IEAtlas identified 245 870 non-canonical epitopes binding to HLA-I/II allotypes across 15 cancer types and 30 non-cancerous tissues, greatly expanding the cancer immunopeptidome. IEAtlas further evaluates the immunogenicity via several commonly used immunogenic features, including HLA binding affinity, stability and T-cell receptor recognition. In addition, IEAtlas provides the biochemical properties of epitopes as well as the clinical relevance of corresponding genes across major cancer types and normal tissues. Several flexible tools were also developed to aid retrieval and to analyze the epitopes derived from non-coding regions. Overall, IEAtlas will serve as a valuable resource for investigating the immunogenic capacity of non-canonical epitopes and their potential as therapeutic cancer vaccines.
Affiliation(s)
- Jiaqi Yin
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Yingying Ma
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Ya Luo
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Limei Fu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Na Ding
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Yongsheng Li
  - Correspondence may also be addressed to Yongsheng Li.
- Zhenwei Pan
  - Correspondence may also be addressed to Zhenwei Pan.
- Xia Li
  - Correspondence may also be addressed to Xia Li.
- Juan Xu
  - To whom correspondence should be addressed. Tel: +86 13654559904;