1
|
Zhu C, Liu LY, Ha A, Yamaguchi TN, Zhu H, Hugh-White R, Livingstone J, Patel Y, Kislinger T, Boutros PC. moPepGen: Rapid and Comprehensive Identification of Non-canonical Peptides. bioRxiv: the preprint server for biology 2024:2024.03.28.587261. [PMID: 38585946 PMCID: PMC10996593 DOI: 10.1101/2024.03.28.587261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Gene expression is a multi-step transformation of biological information from its storage form (DNA) into functional forms (protein and some RNAs). Regulatory activities at each step of this transformation multiply a single gene into a myriad of proteoforms. Proteogenomics is the study of how genomic and transcriptomic variation creates this proteomic diversity, and is limited by the challenges of modeling the complexities of gene expression. We therefore created moPepGen, a graph-based algorithm that comprehensively generates non-canonical peptides in linear time. moPepGen works with multiple technologies, in multiple species and on all types of genetic and transcriptomic data. In human cancer proteomes, it enumerates previously unobservable non-canonical peptides arising from germline and somatic genomic variants, noncoding open reading frames, RNA fusions and RNA circularization. By enabling efficient detection and quantitation of previously hidden proteins in both existing and new proteomic data, moPepGen facilitates all proteogenomics applications. It is available at: https://github.com/uclahs-cds/package-moPepGen.
Affiliation(s)
- Chenghao Zhu
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Department of Urology, University of California, Los Angeles, CA, USA
| | - Lydia Y. Liu
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
| | - Annie Ha
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
| | - Takafumi N. Yamaguchi
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
| | - Helen Zhu
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
| | - Rupert Hugh-White
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
| | - Julie Livingstone
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
| | - Yash Patel
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
| | - Thomas Kislinger
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
| | - Paul C. Boutros
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Department of Urology, University of California, Los Angeles, CA, USA
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
| |
|
2
|
Kwok N, Aretz Z, Takao S, Ser Z, Cifani P, Kentsis A. Integrative Proteogenomics Using ProteomeGenerator2. J Proteome Res 2023; 22:2750-2764. [PMID: 37418425 PMCID: PMC10783198 DOI: 10.1021/acs.jproteome.3c00005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/09/2023]
Abstract
Recent advances in nucleic acid sequencing now permit rapid and genome-scale analysis of genetic variation and transcription, enabling population-scale studies of human biology, disease, and diverse organisms. Likewise, advances in mass spectrometry proteomics now permit highly sensitive and accurate studies of protein expression at the whole proteome-scale. However, most proteomic studies rely on consensus databases to match spectra to peptide and protein sequences, and thus remain limited to the analysis of canonical protein sequences. Here, we develop ProteomeGenerator2 (PG2), based on the scalable and modular ProteomeGenerator framework. PG2 integrates genome and transcriptome sequencing to incorporate protein variants containing amino acid substitutions, insertions, and deletions, as well as noncanonical reading frames, exons, and other variants caused by genomic and transcriptomic variation. We benchmarked PG2 using synthetic data and genomic, transcriptomic, and proteomic analysis of human leukemia cells. PG2 can be integrated with current and emerging sequencing technologies, assemblers, variant callers, and mass spectral analysis algorithms, and is available open-source from https://github.com/kentsisresearchgroup/ProteomeGenerator2.
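As a rough illustration of how a genomic variant becomes an amino acid substitution in a protein database, the following minimal sketch applies a single-nucleotide change to a toy coding sequence and translates both versions. It is not ProteomeGenerator2's implementation; the helper names (`translate`, `apply_snv`) and the example sequence are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code, built compactly from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; incomplete trailing codons are ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def apply_snv(cds: str, pos: int, alt: str) -> str:
    """Return a copy of cds with the base at 0-based position pos replaced by alt."""
    return cds[:pos] + alt + cds[pos + 1:]


# Hypothetical toy example: an A>T change in the second codon (GAG -> GTG).
reference = "ATGGAGCTG"
variant = apply_snv(reference, 4, "T")

print(translate(reference))  # reference protein
print(translate(variant))    # variant protein with one substituted residue
```

In a real workflow the variant positions would come from a variant caller and be mapped onto transcript coordinates first; that mapping is omitted here.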
Affiliation(s)
- Nathaniel Kwok
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY
- Doctor of Medicine Program, Weill Cornell Medicine, New York, NY
- Department of Graduate Medical Education, HCA TriStar-Centennial Medical Center, Nashville, TN
| | - Zita Aretz
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY
- Physiology Biophysics and Systems Biology Program, Weill Cornell Graduate School, Cornell University, New York, NY
| | - Sumiko Takao
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY
- Tow Center for Developmental Oncology, Department of Pediatrics, Memorial Sloan Kettering Cancer Center New York, NY
| | - Zheng Ser
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY
| | - Paolo Cifani
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY
| | - Alex Kentsis
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY
- Tow Center for Developmental Oncology, Department of Pediatrics, Memorial Sloan Kettering Cancer Center New York, NY
- Departments of Pediatrics, Pharmacology, and Physiology & Biophysics, Weill Cornell Medical College, Cornell University, New York, NY
| |
|
3
|
Kwok N, Aretz Z, Takao S, Ser Z, Cifani P, Kentsis A. Integrative proteogenomics using ProteomeGenerator2. bioRxiv: the preprint server for biology 2023:2023.01.04.522774. [PMID: 36711693 PMCID: PMC9882001 DOI: 10.1101/2023.01.04.522774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Recent advances in nucleic acid sequencing now permit rapid and genome-scale analysis of genetic variation and transcription, enabling population-scale studies of human biology, disease, and diverse organisms. Likewise, advances in mass spectrometry proteomics now permit highly sensitive and accurate studies of protein expression at the whole proteome-scale. However, most proteomic studies rely on consensus databases to match spectra to peptide and protein sequences, and thus remain limited to the analysis of canonical protein sequences. Here, we develop ProteomeGenerator2 (PG2), based on the scalable and modular ProteomeGenerator framework. PG2 integrates genome and transcriptome sequencing to incorporate protein variants containing amino acid substitutions, insertions, and deletions, as well as non-canonical reading frames, exons, and other variants caused by genomic and transcriptomic variation. We benchmarked PG2 using synthetic data and genomic, transcriptomic, and proteomic analysis of human leukemia cells. PG2 can be integrated with current and emerging sequencing technologies, assemblers, variant callers, and mass spectral analysis algorithms, and is available open-source from https://github.com/kentsisresearchgroup/ProteomeGenerator2.
|
4
|
Mokou M, Narayanasamy S, Stroggilos R, Balaur IA, Vlahou A, Mischak H, Frantzi M. A Drug Repurposing Pipeline Based on Bladder Cancer Integrated Proteotranscriptomics Signatures. Methods Mol Biol 2023; 2684:59-99. [PMID: 37410228 DOI: 10.1007/978-1-0716-3291-8_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
Abstract
Delivering better care for patients with bladder cancer (BC) necessitates the development of novel therapeutic strategies that address both the high disease heterogeneity and the limitations of the current therapeutic modalities, such as low drug efficacy and acquired patient resistance. Drug repurposing is a cost-effective strategy that targets the reuse of existing drugs for new therapeutic purposes. Such a strategy could open new avenues toward more effective BC treatment. BC patients' multi-omics signatures can be used to guide the investigation of existing drugs that show an effective therapeutic potential through drug repurposing. In this book chapter, we present an integrated multilayer approach that includes cross-omics analyses from publicly available transcriptomics and proteomics data derived from BC tissues and cell lines that were investigated for the development of disease-specific signatures. These signatures are subsequently used as input for a signature-based repurposing approach using the Connectivity Map (CMap) tool. We further explain the steps that may be followed to identify and select existing drugs of increased potential for repurposing in BC patients.
Affiliation(s)
- Marika Mokou
- Department of Biomarker Research, Mosaiques Diagnostics, Hannover, Germany.
| | - Shaman Narayanasamy
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Rafael Stroggilos
- Systems Biology Center, Biomedical Research Foundation, Academy of Athens, Athens, Greece
| | - Irina-Afrodita Balaur
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Antonia Vlahou
- Systems Biology Center, Biomedical Research Foundation, Academy of Athens, Athens, Greece
| | - Harald Mischak
- Department of Biomarker Research, Mosaiques Diagnostics, Hannover, Germany
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, UK
| | - Maria Frantzi
- Department of Biomarker Research, Mosaiques Diagnostics, Hannover, Germany
| |
|
5
|
Proteotranscriptomics - A facilitator in omics research. Comput Struct Biotechnol J 2022; 20:3667-3675. [PMID: 35891789 PMCID: PMC9293588 DOI: 10.1016/j.csbj.2022.07.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2022] [Revised: 07/04/2022] [Accepted: 07/04/2022] [Indexed: 11/26/2022]
Abstract
Applications in omics research, such as comparative transcriptomics and proteomics, require the knowledge of the species-specific gene sequence and benefit from a comprehensive high-quality annotation of the coding genes to achieve high coverage. While protein-coding genes can in simple cases be detected by scanning the genome for open reading frames, in more complex genomes exonic sequences are separated by introns. Despite advances in sequencing technologies that allow for ever-growing numbers of genomes, many of the provided genome assemblies do not reach reference quality. These non-contiguous assemblies with gaps and the necessity to predict splice sites limit accurate gene annotation from solely genomic data. In contrast, the transcriptome only contains transcribed gene regions, is devoid of introns and thus provides the optimal basis for the identification of open reading frames. The additional integration of proteomics data to validate predicted protein-coding genes further enriches for accurate gene models. This review outlines the principles of the proteotranscriptomics approach, discusses common challenges and suggests methods for improvement.
|
6
|
Na S, Choi H, Paek E. Deephos: Predicted spectral database search for TMT-labeled phosphopeptides and its false discovery rate estimation. Bioinformatics 2022; 38:2980-2987. [PMID: 35441674 DOI: 10.1093/bioinformatics/btac280] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/12/2021] [Revised: 03/26/2022] [Accepted: 04/14/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION Tandem mass tag (TMT)-based tandem mass spectrometry (MS/MS) has become the method of choice for the quantification of post-translational modifications in complex mixtures. Many cancer proteogenomic studies have highlighted the importance of large-scale phosphopeptide quantification coupled with TMT labeling. Herein, we propose a predicted Spectral DataBase (pSDB) search strategy called Deephos that can improve both sensitivity and specificity in identifying MS/MS spectra of TMT-labeled phosphopeptides. RESULTS With deep learning-based fragment ion prediction, we compiled a pSDB of TMT-labeled phosphopeptides generated from ∼8,000 human phosphoproteins annotated in UniProt. Deep learning could successfully recognize the fragmentation patterns altered by both TMT labeling and phosphorylation. In addition, we discuss the decoy spectra for false discovery rate (FDR) estimation in the pSDB search. We show that FDR could be inaccurately estimated by the existing decoy spectra generation methods and propose an innovative method to generate decoy spectra for more accurate FDR estimation. The utilities of Deephos were demonstrated in multi-stage analyses (coupled with database searches) of glioblastoma, acute myeloid leukemia, and breast cancer phosphoproteomes. AVAILABILITY Deephos pSDB and the search software are available at https://github.com/seungjinna/deephos.
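For readers unfamiliar with decoy-based FDR estimation, the basic target-decoy calculation can be sketched as follows. This is the generic textbook estimate (decoy hits divided by target hits above a score threshold), not the decoy spectra generation method proposed for Deephos; the function name and the toy scores are assumptions made for illustration.

```python
from typing import List, Tuple


def fdr_at_threshold(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR among matches scoring >= threshold as (#decoy hits) / (#target hits).

    Each match is a (score, is_decoy) pair. Returns 0.0 when no target passes.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


# Hypothetical peptide-spectrum matches: (search score, matched a decoy?)
psms = [
    (0.99, False),
    (0.95, False),
    (0.90, True),
    (0.85, False),
    (0.80, False),
    (0.40, True),
]

print(fdr_at_threshold(psms, 0.80))
print(fdr_at_threshold(psms, 0.92))
```

The estimate is only as good as the decoys: if decoy spectra are systematically easier or harder to match than false targets, the ratio above misstates the true error rate, which is the concern the abstract raises.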
Affiliation(s)
- Seungjin Na
- Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
| | - Hyunjin Choi
- Department of Automotive Engineering, Hanyang University, Seoul, 04763, Republic of Korea
| | - Eunok Paek
- Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
| |
|
7
|
Hari PS, Balakrishnan L, Kotyada C, Everad John A, Tiwary S, Shah N, Sirdeshmukh R. Proteogenomic Analysis of Breast Cancer Transcriptomic and Proteomic Data, Using De Novo Transcript Assembly: Genome-Wide Identification of Novel Peptides and Clinical Implications. Mol Cell Proteomics 2022; 21:100220. [PMID: 35227895 PMCID: PMC9020135 DOI: 10.1016/j.mcpro.2022.100220] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/09/2021] [Revised: 01/16/2022] [Accepted: 02/24/2022] [Indexed: 11/30/2022]
Abstract
We have carried out proteogenomic analysis of the breast cancer transcriptomic and proteomic data, available at The Clinical Proteomic Tumor Analysis Consortium resource, to identify novel peptides arising from alternatively spliced events as well as other noncanonical expressions. We used a pipeline that consisted of de novo transcript assembly, a six-frame-translated custom database, and a combination of search engines to identify novel peptides. A portfolio of 4,387 novel peptide sequences initially identified was further screened through the PepQuery validation tool (Clinical Proteomic Tumor Analysis Consortium), which yielded 1,558 novel peptides. We considered the 1,558 peptides validated through PepQuery to understand their functional and clinical significance, leaving the rest to be further verified using other validation tools and approaches. The novel peptides mapped to known gene sequences as well as to genomic regions yet undefined for translation: 580 mapped to known protein-coding genes, 147 to non–protein-coding genes, and 831 belonged to novel translational sequences. The novel peptides belonging to protein-coding genes represented alternatively spliced events or 5′ or 3′ extensions, whereas others represented translation from pseudogenes, long noncoding RNAs, or novel peptides originating from uncharacterized protein-coding sequences, mostly from the intronic regions of known genes. Seventy-six of the 580 protein-coding genes were associated with cancer hallmark genes, which included key oncogenes, transcription factors, kinases, and cell surface receptors. Survival association analysis of the 76 novel peptide sequences revealed 10 of them to be significant, and we present a panel of six novel peptides, whose high expression was found to be strongly associated with poor survival of patients with human epidermal growth factor receptor 2–enriched subtype.
Our analysis represents a landscape of novel peptides of different types that may be expressed in breast cancer tissues, whereas their presence in full-length functional proteins needs further investigations. Novel protein variants and peptides from noncoding sequences are rapidly emerging. Mining of mass spectrometry data using proteogenomic analysis reveals such entities. Novel peptides from coding and noncoding sequences identified in breast cancer. Novel peptides mapped to cancer hallmark genes in breast cancer. Panel of novel peptides with prognostic potential found for HER2-enriched subtype.
Affiliation(s)
- P S Hari
- Mazumdar Shaw Center for Translational Research, Narayana Health, Bangalore, India
| | - Lavanya Balakrishnan
- Mazumdar Shaw Center for Translational Research, Narayana Health, Bangalore, India
| | - Chaithanya Kotyada
- Mazumdar Shaw Center for Translational Research, Narayana Health, Bangalore, India
| | | | - Shivani Tiwary
- Simulation and Modeling Sciences, Pfizer Pharma GmBH, Berlin, Germany
| | - Nameeta Shah
- Mazumdar Shaw Center for Translational Research, Narayana Health, Bangalore, India.
| | - Ravi Sirdeshmukh
- Mazumdar Shaw Center for Translational Research, Narayana Health, Bangalore, India
- Institute of Bioinformatics, International Tech Park, Bangalore, India
- Health Sciences, Manipal Academy of Higher Education, Manipal, India
| |
|
8
|
Melani RD, Gerbasi VR, Anderson LC, Sikora JW, Toby TK, Hutton JE, Butcher DS, Negrão F, Seckler HS, Srzentić K, Fornelli L, Camarillo JM, LeDuc RD, Cesnik AJ, Lundberg E, Greer JB, Fellers RT, Robey MT, DeHart CJ, Forte E, Hendrickson CL, Abbatiello SE, Thomas PM, Kokaji AI, Levitsky J, Kelleher NL. The Blood Proteoform Atlas: A reference map of proteoforms in human hematopoietic cells. Science 2022; 375:411-418. [PMID: 35084980 PMCID: PMC9097315 DOI: 10.1126/science.aaz5284] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Indexed: 01/30/2023]
Abstract
Human biology is tightly linked to proteins, yet most measurements do not precisely determine alternatively spliced sequences or posttranslational modifications. Here, we present the primary structures of ~30,000 unique proteoforms, nearly 10 times more than in previous studies, expressed from 1690 human genes across 21 cell types and plasma from human blood and bone marrow. The results, compiled in the Blood Proteoform Atlas (BPA), indicate that proteoforms better describe protein-level biology and are more specific indicators of differentiation than their corresponding proteins, which are more broadly expressed across cell types. We demonstrate the potential for clinical application, by interrogating the BPA in the context of liver transplantation and identifying cell and proteoform signatures that distinguish normal graft function from acute rejection and other causes of graft dysfunction.
Affiliation(s)
- Rafael D. Melani
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Vincent R. Gerbasi
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Lissa C. Anderson
- National High Magnetic Field Laboratory, Florida State University, Tallahassee, FL, USA
| | - Jacek W. Sikora
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Timothy K. Toby
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Josiah E. Hutton
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - David S. Butcher
- National High Magnetic Field Laboratory, Florida State University, Tallahassee, FL, USA
| | - Fernanda Negrão
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Henrique S. Seckler
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Kristina Srzentić
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Luca Fornelli
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Jeannie M. Camarillo
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Richard D. LeDuc
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Anthony J. Cesnik
- Department of Genetics, Stanford University, Stanford, CA, USA
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH-Royal Institute of Technology, Stockholm, Sweden
| | - Emma Lundberg
- Department of Genetics, Stanford University, Stanford, CA, USA
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH-Royal Institute of Technology, Stockholm, Sweden
| | - Joseph B. Greer
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Ryan T. Fellers
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Matthew T. Robey
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Caroline J. DeHart
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Eleonora Forte
- Proteomics Center of Excellence, Northwestern University, Evanston, IL, USA
- Department of Surgery, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | | | | | - Paul M. Thomas
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | | | - Josh Levitsky
- Comprehensive Transplant Center, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Neil L. Kelleher
- Department of Molecular Biosciences, Department of Chemistry, and the Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| |
|
9
|
Halder A, Verma A, Biswas D, Srivastava S. Recent advances in mass-spectrometry based proteomics software, tools and databases. Drug Discovery Today: Technologies 2021; 39:69-79. [PMID: 34906327 DOI: 10.1016/j.ddtec.2021.06.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/10/2021] [Revised: 05/08/2021] [Accepted: 06/21/2021] [Indexed: 01/12/2023]
Abstract
The field of proteomics depends heavily on data generation and data analysis, both of which are supported by software and databases. Mass spectrometry-based proteomics has advanced considerably over the last 10 years, which has compelled the scientific community to upgrade or develop algorithms, tools, and repository databases. Several standalone software packages and comprehensive databases have aided the establishment of integrated omics pipelines and meta-analysis workflows, which have contributed to understanding disease pathobiology, discovering biomarkers and predicting new therapeutic modalities. For shotgun proteomics, where Data Dependent Acquisition is performed, several user-friendly software packages have been developed that analyse the pre-processed data to provide mechanistic insights into the disease. Likewise, for Data Independent Acquisition, pipelines have emerged that cover the tasks from building the spectral library to identifying therapeutic targets. Furthermore, in the age of big data analysis, machine learning and cloud computing are making proteomics data analysis more robust, faster and more in-depth. The current review discusses recent advances in, and the development of, software, tools, and databases in the field of mass-spectrometry based proteomics.
Affiliation(s)
- Ankit Halder
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Ayushi Verma
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Deeptarup Biswas
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Sanjeeva Srivastava
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India.
| |
|
10
|
Umer HM, Audain E, Zhu Y, Pfeuffer J, Sachsenberg T, Lehtiö J, Branca RM, Perez-Riverol Y. Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides. Bioinformatics 2021; 38:1470-1472. [PMID: 34904638 PMCID: PMC8825679 DOI: 10.1093/bioinformatics/btab838] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/28/2021] [Revised: 12/07/2021] [Accepted: 12/10/2021] [Indexed: 01/06/2023]
Abstract
SUMMARY We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. It also includes exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tool enables the generation of variant protein sequences from multiple sources of genomic variants including COSMIC, cBioportal, gnomAD and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling including optimized target/decoy generation by the algorithm DecoyPyrat. Finally, we have reanalyzed six public datasets in PRIDE by generating cell-type specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to >5% of the total number of peptides identified. AVAILABILITY AND IMPLEMENTATION The software is freely available. pypgatk: https://github.com/bigbio/py-pgatk/ and pgdb: https://nf-co.re/pgdb. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
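The three-frame translation mentioned above can be sketched in a few lines. This is a minimal illustration of the general technique, assuming the standard genetic code and a forward-strand transcript; it does not use or reproduce the pypgatk API, and the function names and input sequence are hypothetical.

```python
from itertools import product
from typing import List

# Standard genetic code, built compactly from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; '*' marks a stop, 'X' an unknown codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def three_frame_translation(transcript: str) -> List[str]:
    """Translate a transcript in the three forward reading frames (offsets 0, 1 and 2)."""
    return [translate(transcript[offset:]) for offset in range(3)]


# Hypothetical toy transcript.
for frame, protein in enumerate(three_frame_translation("ATGGCCTGAAT")):
    print(frame, protein)
```

A six-frame translation of genomic sequence would additionally translate the reverse complement in its three frames; for transcripts, whose strand is known, the three forward frames suffice.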
Affiliation(s)
- Husen M Umer
- Department of Oncology‐Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm 17165, Sweden
| | - Enrique Audain
- Department of Congenital Heart Disease and Pediatric Cardiology, Universitätsklinikum Schleswig-Holstein Kiel, Kiel 24105, Germany
| | - Yafeng Zhu
- Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
| | - Julianus Pfeuffer
- Algorithmic Bioinformatics, Freie Universität Berlin, Berlin 14195, Germany
- Visualization and Data Analysis, Zuse Institute Berlin, Berlin 14195, Germany
| | - Timo Sachsenberg
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
| | - Janne Lehtiö
- Department of Oncology‐Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm 17165, Sweden
| | - Rui M Branca
- Department of Oncology‐Pathology, Science for Life Laboratory, Karolinska Institutet, Stockholm 17165, Sweden
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
11
|
Scull KE, Pandey K, Ramarathinam SH, Purcell AW. Immunopeptidogenomics: Harnessing RNA-Seq to Illuminate the Dark Immunopeptidome. Mol Cell Proteomics 2021; 20:100143. [PMID: 34509645 PMCID: PMC8724885 DOI: 10.1016/j.mcpro.2021.100143] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/01/2021] [Revised: 08/10/2021] [Accepted: 08/24/2021] [Indexed: 01/08/2023]
Abstract
Human leukocyte antigen (HLA) molecules are cell-surface glycoproteins that present peptide antigens on the cell surface for surveillance by T lymphocytes, which contemporaneously seek signs of disease. Mass spectrometric analysis allows us to identify large numbers of these peptides (the immunopeptidome) following affinity purification of solubilized HLA-peptide complexes. However, in recent years, there has been a growing awareness of the "dark side" of the immunopeptidome: unconventional peptide epitopes, including neoepitopes, which elude detection by conventional search methods because their sequences are not present in reference protein databases (DBs). Here, we establish a bioinformatics workflow to aid identification of peptides generated by noncanonical translation of mRNA or by genome variants. The workflow incorporates both standard transcriptomics software and novel computer programs to produce cell line-specific protein DBs based on three-frame translation of the transcriptome. The final protein DB also includes sequences resulting from variants determined by variant calling on the same RNA-Seq data. We then searched our experimental data against both transcriptome-based and standard DBs using PEAKS Studio (Bioinformatics Solutions, Inc). Finally, further novel software helps to compare the various result sets arising for each sample, pinpoint putative genomic origins for unconventional sequences, and highlight potential neoepitopes. We applied the workflow to study the immunopeptidome of the acute myeloid leukemia cell line THP-1, using RNA-Seq and immunopeptidome data. We confidently identified over 14,000 peptides from three replicates of purified HLA peptides derived from THP-1 cells using the conventional UniProt human proteome. Using the transcriptome-based DB generated using our workflow, we recapitulated >85% of these and also identified 1029 unconventional peptides not explained by UniProt, including 16 sequences caused by nonsynonymous variants. Our workflow, which we term "immunopeptidogenomics," can provide DBs, which include pertinent unconventional sequences and allow neoepitope discovery, without becoming too large to search. Immunopeptidogenomics is a step toward unbiased search approaches that are needed to illuminate the dark side of the immunopeptidome.
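The three-frame translation step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the function names `translate` and `three_frame_orfs`, the `min_len` cutoff, and the choice to split translations at stop codons are assumptions made only for illustration. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate a transcript in one forward frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons containing ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_orfs(seq, min_len=8):
    """Return (frame, peptide) pairs for stop-delimited segments of all three frames.

    min_len is a hypothetical length filter, not a value taken from the paper.
    """
    entries = []
    for frame in range(3):
        for segment in translate(seq, frame).split("*"):
            if len(segment) >= min_len:
                entries.append((frame, segment))
    return entries


if __name__ == "__main__":
    for frame, peptide in three_frame_orfs("ATGGCCAAGTAA", min_len=3):
        print(frame, peptide)
```

Only the forward strand is translated here because the input is a transcript; a six-frame translation of genomic sequence would also process the reverse complement and roughly double the search space.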
Affiliation(s)
- Katherine E Scull
- Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Monash Biomedicine Discovery Institute, Monash University, Clayton, Victoria, Australia
| | - Kirti Pandey
- Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Monash Biomedicine Discovery Institute, Monash University, Clayton, Victoria, Australia
| | - Sri H Ramarathinam
- Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Monash Biomedicine Discovery Institute, Monash University, Clayton, Victoria, Australia.
| | - Anthony W Purcell
- Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Monash Biomedicine Discovery Institute, Monash University, Clayton, Victoria, Australia.
|
12
|
Cifani P, Li Z, Luo D, Grivainis M, Intlekofer AM, Fenyö D, Kentsis A. Discovery of Protein Modifications Using Differential Tandem Mass Spectrometry Proteomics. J Proteome Res 2021; 20:1835-1848. [PMID: 33749263 PMCID: PMC8341206 DOI: 10.1021/acs.jproteome.0c00638] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/26/2023]
Abstract
Recent studies have revealed diverse amino acid, post-translational, and noncanonical modifications of proteins in diverse organisms and tissues. However, their unbiased detection and analysis remain hindered by technical limitations. Here, we present a spectral alignment method for the identification of protein modifications using high-resolution mass spectrometry proteomics. Termed SAMPEI for spectral alignment-based modified peptide identification, this open-source algorithm is designed for the discovery of functional protein and peptide signaling modifications, without prior knowledge of their identities. Using synthetic standards and controlled chemical labeling experiments, we demonstrate its high specificity and sensitivity for the discovery of substoichiometric protein modifications in complex cellular extracts. SAMPEI mapping of mouse macrophage differentiation revealed diverse post-translational protein modifications, including distinct forms of cysteine itaconatylation. SAMPEI's robust parametrization and versatility are expected to facilitate the discovery of biological modifications of diverse macromolecules. SAMPEI is implemented as a Python package and is available open-source from BioConda and GitHub (https://github.com/FenyoLab/SAMPEI).
Affiliation(s)
- Paolo Cifani
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10021, United States
| | - Zhi Li
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, New York 10016, United States
- Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, New York 10016, United States
| | - Danmeng Luo
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10021, United States
| | - Mark Grivainis
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, New York 10016, United States
| | - Andrew M Intlekofer
- Human Oncology & Pathogenesis Program and Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York 10021, United States
| | - David Fenyö
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, New York 10016, United States
- Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, New York 10016, United States
| | - Alex Kentsis
- Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10021, United States
- Tow Center for Developmental Oncology, Department of Pediatrics, Memorial Sloan Kettering Cancer Center, and Departments of Pediatrics, Pharmacology, and Physiology & Biophysics, Weill Medical College of Cornell University, New York, New York 10021, United States
|
13
|
Cao X, Xing J. PrecisionProDB: improving the proteomics performance for precision medicine. Bioinformatics 2021; 37:3361-3363. [PMID: 33787868 DOI: 10.1093/bioinformatics/btab218] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/04/2021] [Revised: 03/06/2021] [Accepted: 03/30/2021] [Indexed: 01/03/2023]
Abstract
SUMMARY: As next-generation sequencing technology becomes broadly applied, genomics and transcriptomics are becoming more commonly used in both research and clinical settings. However, proteomics is still an obstacle to be conquered. For most peptide search programs in proteomics, a standard reference protein database is used. Because of the thousands of coding DNA variants in each individual, a standard reference database does not provide a perfect match for many proteins/peptides of an individual. A personalized reference database can improve the detection power and accuracy for individual proteomics data. To connect genomics and proteomics, we designed a Python package, PrecisionProDB, that is specialized for generating a personalized protein database for proteomics applications. PrecisionProDB supports multiple popular file formats and reference databases, and can generate a personalized database in minutes. To demonstrate the application of PrecisionProDB, we generated human population-specific reference protein databases, which improve the number of identified peptides by 0.34% on average. In addition, by incorporating cell line-specific variants into the protein database, we demonstrated a 0.71% improvement for peptide identification in the Jurkat cell line. With PrecisionProDB and these datasets, researchers and clinicians can improve their peptide search performance by adopting a more representative protein database or by adding population- and individual-specific proteins to the search database with minimal additional effort. AVAILABILITY: PrecisionProDB and pre-calculated protein databases are freely available at https://github.com/ATPs/PrecisionProDB and https://github.com/ATPs/PrecisionProDB_references. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
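Why a reference database misses variant peptides can be shown with a small Python sketch. This does not use PrecisionProDB or its interface: the helper names, the toy protein sequence, the protein-level substitution (the package itself starts from DNA variants), and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are all assumptions for illustration.

```python
import re


def tryptic_peptides(protein, min_len=6):
    """In silico tryptic digest: cleave after K/R unless the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {piece for piece in pieces if len(piece) >= min_len}


def apply_substitution(protein, position, ref, alt):
    """Apply a single amino-acid substitution at a 1-based position."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return protein[:position - 1] + alt + protein[position:]


def variant_specific_peptides(protein, position, ref, alt, min_len=6):
    """Peptides produced by the variant protein but absent from the reference digest."""
    variant = apply_substitution(protein, position, ref, alt)
    return tryptic_peptides(variant, min_len) - tryptic_peptides(protein, min_len)


if __name__ == "__main__":
    reference = "MKTAYIAKQRQISFVKSHFSR"  # hypothetical toy sequence
    print(variant_specific_peptides(reference, 13, "S", "L"))
```

A search against the reference digest alone cannot match a spectrum from the variant peptide, which is the gap a personalized database is meant to close.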
Affiliation(s)
- Xiaolong Cao
- Department of Genetics, Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Jinchuan Xing
- Department of Genetics, Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
|
14
|
Fasih Ramandi N, Faranoush M, Ghassempour A, Aboul-Enein HY. Mass Spectrometry: A Powerful Method for Monitoring Various Type of Leukemia, Especially MALDI-TOF in Leukemia's Proteomics Studies Review. Crit Rev Anal Chem 2021; 52:1259-1286. [PMID: 33499652 DOI: 10.1080/10408347.2021.1871844] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/22/2022]
Abstract
Recent success in studying the proteome, as a source of biomarkers, has completely changed our understanding of leukemia (blood cancer). The identification of differentially expressed proteins, such as relapse and drug resistance proteins involved in leukemia, by using various ionization sources and mass analyzers of mass spectrometry techniques has helped scientists find better diagnosis, prognosis, and treatment strategies. With the aid of this powerful analytical technique, we can investigate the qualification/quantification of proteins, protein-protein interactions, and post-translational modifications, and find the correlation between proteins and their genes with the hope of finding the missing parts of the successful therapy puzzle. In this review, we survey the different MS sources and analyzers that have been used for monitoring various types of leukemia, then focus on MALDI-TOF MS as a quick and reliable method for studying proteins. Although several reviews have been published for other techniques, the present review is the first work in this field. Also, by classifying more than 400 proteins, we have found that 42 proteins are involved in two or three different stages of leukemia. Finally, we have suggested six specific biomarkers for AML, one for ALL, three biomarkers with a role in the etiology of leukemia, and 13 markers with the potential for further studies.
Affiliation(s)
- Negin Fasih Ramandi
- Medicinal Plants and Drugs Research Institute, Shahid Beheshti University, Tehran, Iran
| | - Mohammad Faranoush
- Pediatric Growth and Development Research Center, Institute of Endocrinology, Iran University of Medical Sciences, Tehran, Iran
| | - Alireza Ghassempour
- Medicinal Plants and Drugs Research Institute, Shahid Beheshti University, Tehran, Iran
| | - Hassan Y Aboul-Enein
- Pharmaceutical and Medicinal Chemistry Department, Pharmaceutical and Drug Industries Research Division, National Research Center, Cairo, Egypt
|
15
|
Vitorino R, Guedes S, Trindade F, Correia I, Moura G, Carvalho P, Santos MAS, Amado F. De novo sequencing of proteins by mass spectrometry. Expert Rev Proteomics 2020; 17:595-607. [PMID: 33016158 DOI: 10.1080/14789450.2020.1831387] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/24/2022]
Abstract
INTRODUCTION: Proteins are crucial for every cellular activity, and unraveling their sequence and structure is a key step toward fully understanding their biology. Early methods of protein sequencing were mainly based on enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and the expansion of the information available for each protein, various databases containing this sequence information were formed. AREAS COVERED: De novo protein sequencing, shotgun proteomics, and other mass-spectrometric techniques, along with the various software currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with the potential and shortcomings of using databases for interpretation of protein sequence data. EXPERT OPINION: As mass-spectrometry sequencing performance is improving with better software and hardware optimizations, combined with user-friendly interfaces, de novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, it should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which then would be able to provide improved identification.
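The target/decoy strategy mentioned in this abstract can be sketched in Python as follows. This is a generic illustration, not a method from the review: the function names are hypothetical, the estimate used is the simple ratio of decoy to target matches above a score threshold, and the scores are made-up values.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (#decoy matches) / (#target matches) scoring >= threshold.

    psms is an iterable of (score, is_decoy) pairs; higher scores are better.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR does not exceed max_fdr, or None."""
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, score) <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    # Hypothetical peptide-spectrum matches: (score, matched a decoy sequence?)
    psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
    print(fdr_at_threshold(psms, 6), score_cutoff(psms, max_fdr=0.25))
```

Larger search spaces, such as those containing mutated or modified sequences, tend to let more decoys score well, so fewer target matches survive a fixed FDR cutoff.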
Affiliation(s)
- Rui Vitorino
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal; iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal; Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
| | - Sofia Guedes
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
| | - Fabio Trindade
- Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
| | - Inês Correia
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
| | - Gabriela Moura
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
| | - Paulo Carvalho
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, FIOCRUZ, Laboratory for Proteomics and Protein Engineering, Brazil
| | - Manuel A S Santos
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
| | - Francisco Amado
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
|
16
|
Cesnik AJ, Miller RM, Ibrahim K, Lu L, Millikin RJ, Shortreed MR, Frey BL, Smith LM. Spritz: A Proteogenomic Database Engine. J Proteome Res 2020; 20:1826-1834. [PMID: 32967423 DOI: 10.1021/acs.jproteome.0c00407] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 02/07/2023]
Abstract
Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz (https://smith-chem-wisc.github.io/Spritz/), an open-source software tool for generating protein databases annotated with sequence variations and PTMs. We provide a simple graphical user interface for Windows and scripts that can be run on any operating system. Spritz automatically sets up and executes approximately 20 tools, which enable the construction of a proteogenomic database from only raw RNA sequencing data. Sequence variations that are discovered in RNA sequencing data upon comparison to the Ensembl reference genome are annotated on proteins in these databases, and PTM annotations are transferred from UniProt. Modifications can also be discovered and added to the database using bottom-up mass spectrometry data and global PTM discovery in MetaMorpheus. We demonstrate that such sample-specific databases allow the identification of variant peptides, modified variant peptides, and variant proteoforms by searching bottom-up and top-down proteomic data from the Jurkat human T lymphocyte cell line and demonstrate the identification of phosphorylated variant sites with phosphoproteomic data from the U2OS human osteosarcoma cell line.
Affiliation(s)
- Anthony J Cesnik
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm 17121, Sweden; Department of Genetics, Stanford University, Stanford, California 94305, United States; Chan Zuckerberg Biohub, San Francisco, California 94158, United States
| | - Rachel M Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Khairina Ibrahim
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lei Lu
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Robert J Millikin
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Brian L Frey
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
|
17
|
Lau E, Han Y, Williams DR, Thomas CT, Shrestha R, Wu JC, Lam MPY. Splice-Junction-Based Mapping of Alternative Isoforms in the Human Proteome. Cell Rep 2020; 29:3751-3765.e5. [PMID: 31825849 PMCID: PMC6961840 DOI: 10.1016/j.celrep.2019.11.026] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 04/18/2019] [Revised: 09/24/2019] [Accepted: 11/06/2019] [Indexed: 12/18/2022]
Abstract
The protein-level translational status and function of many alternative splicing events remain poorly understood. We use an RNA sequencing (RNA-seq)-guided proteomics method to identify protein alternative splicing isoforms in the human proteome by constructing tissue-specific protein databases that prioritize transcript splice junction pairs with high translational potential. Using the custom databases to reanalyze ~80 million mass spectra in public proteomics datasets, we identify more than 1,500 noncanonical protein isoforms across 12 human tissues, including ~400 sequences undocumented on TrEMBL and RefSeq databases. We apply the method to original quantitative mass spectrometry experiments and observe widespread isoform regulation during human induced pluripotent stem cell cardiomyocyte differentiation. On a proteome scale, alternative isoform regions overlap frequently with disordered sequences and post-translational modification sites, suggesting that alternative splicing may regulate protein function through modulating intrinsically disordered regions. The described approach may help elucidate functional consequences of alternative splicing and expand the scope of proteomics investigations in various systems. The translation and function of many alternative splicing events await confirmation at the protein level. Lau et al. use an integrated proteotranscriptomics approach to identify non-canonical and undocumented isoforms from 12 organs in the human proteome. Alternative isoforms interfere with functional sequence features and are differentially regulated during iPSC cardiomyocyte differentiation.
Affiliation(s)
- Edward Lau
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
| | - Yu Han
- Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA; Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
| | - Damon R Williams
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
| | - Cody T Thomas
- Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
| | - Rajani Shrestha
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA
| | - Joseph C Wu
- Stanford Cardiovascular Institute, Department of Medicine, Stanford University, Palo Alto, CA, USA; Department of Radiology, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Maggie P Y Lam
- Consortium for Fibrosis Research and Translation, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA; Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA.
|
18
|
Guo Q, Li D, Zhai Y, Gu Z. CCPRD: A Novel Analytical Framework for the Comprehensive Proteomic Reference Database Construction of NonModel Organisms. ACS OMEGA 2020; 5:15370-15384. [PMID: 32637811 PMCID: PMC7331046 DOI: 10.1021/acsomega.0c01278] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/01/2020] [Accepted: 06/09/2020] [Indexed: 06/11/2023]
Abstract
Protein reference databases are a critical part of producing efficient proteomic analyses. However, a method for constructing clean, efficient, and comprehensive protein reference databases for nonmodel organisms is lacking. Existing methods either do not have contamination control procedures or rely on a three-frame and/or six-frame translation that sharply increases the search space and the need for computational resources. Herein, we propose a framework for constructing a customized comprehensive proteomic reference database (CCPRD) from draft genomes and deep sequencing transcriptomes. Its effectiveness is demonstrated by incorporating the proteomes of nematocysts from endoparasitic cnidarians (myxozoans). By applying customized contamination removal procedures, contaminations in omic data were successfully identified and removed. This is an effective method that does not result in overdecontamination, as shown by comparing the CCPRD MS results with an artificially contaminated database and with another database in which the contaminations removed from the genomes and transcriptomes were added back. CCPRD outperformed traditional frame-based methods by identifying 35.2-50.7% more peptides and 35.8-43.8% more proteins, with a maximum of 84.6% in size reduction. A BUSCO analysis showed that the CCPRD maintained a relatively high level of completeness compared to traditional methods. These results confirm the superiority of the CCPRD over existing methods in peptide and protein identification numbers, database size, and completeness. By providing a general framework for generating the reference database, the CCPRD, which does not need a high-quality genome, can potentially be applied to nonmodel organisms and significantly contribute to proteomic research.
Affiliation(s)
- Qingxiang Guo
- Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, Hubei Province 430070, PR China
- Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan 430070, PR China
| | - Dan Li
- Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, Hubei Province 430070, PR China
- Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan 430070, PR China
| | - Yanhua Zhai
- Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, Hubei Province 430070, PR China
- Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan 430070, PR China
| | - Zemao Gu
- Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, Hubei Province 430070, PR China
- Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan 430070, PR China
|
19
|
Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis. Int J Mol Sci 2020; 21:ijms21082873. [PMID: 32326049 PMCID: PMC7216093 DOI: 10.3390/ijms21082873] [Citation(s) in RCA: 130] [Impact Index Per Article: 26.0] [Received: 03/18/2020] [Revised: 04/16/2020] [Accepted: 04/18/2020] [Indexed: 01/15/2023]
Abstract
Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
|
20
|
Vitrinel B, Koh HWL, Mujgan Kar F, Maity S, Rendleman J, Choi H, Vogel C. Exploiting Interdata Relationships in Next-generation Proteomics Analysis. Mol Cell Proteomics 2019; 18:S5-S14. [PMID: 31126983 PMCID: PMC6692783 DOI: 10.1074/mcp.mr118.001246] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 11/28/2018] [Revised: 05/01/2019] [Indexed: 12/11/2022]
Abstract
Mass spectrometry based proteomics and other technologies have matured to enable routine quantitative, system-wide analysis of concentrations, modifications, and interactions of proteins, mRNAs, and other molecules. These studies have allowed us to move toward a new field concerned with mining information from the combination of these orthogonal data sets, perhaps called "integromics." We highlight examples of recent studies and tools that aim at relating proteomic information to mRNAs, genetic associations, and changes in small molecules and lipids. We argue that productive data integration differs from parallel acquisition and interpretation and should move toward quantitative modeling of the relationships between the data. These relationships might be expressed by temporal information retrieved from time series experiments, rate equations to model synthesis and degradation, or networks of causal, evolutionary, physical, and other interactions. We outline steps and considerations toward such integromic studies to exploit the synergy between data sets.
Affiliation(s)
- Burcu Vitrinel
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY
| | - Hiromi W L Koh
- Department of Medicine, Yong Loo Lin School of Medicine, National University Singapore, Singapore; Institute of Molecular and Cell Biology, Agency for Science, Technology, and Research, Singapore
| | - Funda Mujgan Kar
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY
| | - Shuvadeep Maity
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY
| | - Justin Rendleman
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY
| | - Hyungwon Choi
- Department of Medicine, Yong Loo Lin School of Medicine, National University Singapore, Singapore; Institute of Molecular and Cell Biology, Agency for Science, Technology, and Research, Singapore
| | - Christine Vogel
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY.
|
21
|
Schaffer LV, Millikin RJ, Miller RM, Anderson LC, Fellers RT, Ge Y, Kelleher NL, LeDuc RD, Liu X, Payne SH, Sun L, Thomas PM, Tucholski T, Wang Z, Wu S, Wu Z, Yu D, Shortreed MR, Smith LM. Identification and Quantification of Proteoforms by Mass Spectrometry. Proteomics 2019; 19:e1800361. [PMID: 31050378 PMCID: PMC6602557 DOI: 10.1002/pmic.201800361] [Citation(s) in RCA: 135] [Impact Index Per Article: 22.5] [Received: 01/28/2019] [Revised: 04/07/2019] [Indexed: 12/29/2022]
Abstract
A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post-translational modifications. In top-down proteomic analyses, proteoforms are identified and quantified through mass spectrometric analysis of intact proteins. Recent technological developments have enabled comprehensive proteoform analyses in complex samples, and an increasing number of laboratories are adopting top-down proteomic workflows. In this review, some recent advances are outlined and current challenges and future directions for the field are discussed.
Affiliation(s)
- Leah V Schaffer
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Robert J Millikin
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Rachel M Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Lissa C Anderson
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Tallahassee, FL, 32310, USA
| | - Ryan T Fellers
- Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
| | - Ying Ge
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Department of Cell and Regenerative Biology and Human Proteomics Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Neil L Kelleher
- Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
- Department of Chemistry and Molecular Biosciences and the Division of Hematology and Oncology, Northwestern University, Evanston, IL, 60208, USA
| | - Richard D LeDuc
- Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN, 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Samuel H Payne
- Department of Biology, Brigham Young University, Provo, UT, 84602
| | - Liangliang Sun
- Department of Chemistry, Michigan State University, East Lansing, MI, 48824, USA
| | - Paul M Thomas
- Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
| | - Trisha Tucholski
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Zhe Wang
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, 73019, USA
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, 73019, USA
| | - Zhijie Wu
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Dahang Yu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, 73019, USA
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
|