101. Noor Z, Ahn SB, Baker MS, Ranganathan S, Mohamedali A. Mass spectrometry-based protein identification in proteomics-a review. Brief Bioinform 2020; 22:1620-1638. [PMID: 32047889 DOI: 10.1093/bib/bbz163] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 09/17/2019] [Revised: 11/05/2019] [Accepted: 11/21/2019] [Indexed: 12/21/2022]
Abstract
Statistically, accurate protein identification is a fundamental cornerstone of proteomics and underpins the understanding and application of this technology across all elements of medicine and biology. Proteomics, as a branch of biochemistry, has in recent years played a pivotal role in extending and developing the science of accurately identifying the biology and interactions of groups of proteins or proteomes. Proteomics has primarily used mass spectrometry (MS)-based techniques for identifying proteins, although other techniques including affinity-based identifications still play significant roles. Here, we outline the basics of MS to understand how data are generated and parameters used to inform computational tools used in protein identification. We then outline a comprehensive analysis of the bioinformatics and computational methodologies used in protein identification in proteomics including discussing the most current communally acceptable metrics to validate any identification.
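Two basic quantities link raw MS data to the parameters passed to identification tools: the precursor m/z and the mass error in parts per million (ppm). The Python below is an illustrative sketch, not code from the review. It assumes a neutral monoisotopic mass, charging by protons only, and a hypothetical 10 ppm tolerance. The function names are made up for this example.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion for a neutral monoisotopic mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


if __name__ == "__main__":
    mz = precursor_mz(1000.0, 2)
    print(round(mz, 4))
    print(within_tolerance(1000.005, 1000.0))  # 5 ppm off, inside a 10 ppm window
```

In a database search, a tolerance check like this decides which candidate peptides are scored against a spectrum at all. Tightening the tolerance therefore shrinks the candidate list.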
102. Loo LSW, Vethe H, Soetedjo AAP, Paulo JA, Jasmen J, Jackson N, Bjørlykke Y, Valdez IA, Vaudel M, Barsnes H, Gygi SP, Raeder H, Teo AKK, Kulkarni RN. Dynamic proteome profiling of human pluripotent stem cell-derived pancreatic progenitors. Stem Cells 2020; 38:542-555. [PMID: 31828876 DOI: 10.1002/stem.3135] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/05/2019] [Accepted: 11/15/2019] [Indexed: 12/25/2022]
Abstract
A comprehensive characterization of the molecular processes controlling cell fate decisions is essential to derive stable progenitors and terminally differentiated cells that are functional from human pluripotent stem cells (hPSCs). Here, we report the use of quantitative proteomics to describe early proteome adaptations during hPSC differentiation toward pancreatic progenitors. We report that the use of unbiased quantitative proteomics allows the simultaneous profiling of numerous proteins at multiple time points, and is a valuable tool to guide the discovery of signaling events and molecular signatures underlying cellular differentiation. We also monitored the activity level of pathways whose roles are pivotal in the early pancreas differentiation, including the Hippo signaling pathway. The quantitative proteomics data set provides insights into the dynamics of the global proteome during the transition of hPSCs from a pluripotent state toward pancreatic differentiation.
Affiliation(s)
- Larry Sai Weng Loo
- Stem Cells and Diabetes Laboratory, Institute of Molecular and Cell Biology (IMCB), A*STAR, Singapore; School of Biological Sciences, Nanyang Technological University (NTU), Singapore
- Heidrun Vethe
- Section of Islet Cell and Regenerative Biology, Joslin Diabetes Center, Harvard Medical School, Boston, Massachusetts; KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway
- Joao A Paulo
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts
- Joanita Jasmen
- Stem Cells and Diabetes Laboratory, Institute of Molecular and Cell Biology (IMCB), A*STAR, Singapore
- Nicholas Jackson
- Section of Islet Cell and Regenerative Biology, Joslin Diabetes Center, Harvard Medical School, Boston, Massachusetts
- Yngvild Bjørlykke
- KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway
- Ivan A Valdez
- Section of Islet Cell and Regenerative Biology, Joslin Diabetes Center, Harvard Medical School, Boston, Massachusetts
- Marc Vaudel
- Proteomics Unit (PROBE), Department of Biomedicine, University of Bergen, Bergen, Norway
- Harald Barsnes
- KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway; Proteomics Unit (PROBE), Department of Biomedicine, University of Bergen, Bergen, Norway
- Steven P Gygi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts
- Helge Raeder
- KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway; Department of Pediatrics, Haukeland University Hospital, Bergen, Norway
- Adrian Kee Keong Teo
- Stem Cells and Diabetes Laboratory, Institute of Molecular and Cell Biology (IMCB), A*STAR, Singapore; School of Biological Sciences, Nanyang Technological University (NTU), Singapore; Departments of Biochemistry and Medicine, Yong Loo Lin School of Medicine, National University of Singapore (NUS), Singapore
- Rohit N Kulkarni
- Section of Islet Cell and Regenerative Biology, Joslin Diabetes Center, Harvard Medical School, Boston, Massachusetts
103.
Abstract
Shotgun proteomics is the method of choice for large-scale protein identification. However, the use of a robust statistical workflow to validate such identification is mandatory to minimize false matches, ambiguities, and amplification of error rates from spectra to proteins. In this chapter we emphasize the key concepts to take into account when processing the output of a search engine to obtain reliable peptide or protein identifications. We assume that the reader is already familiar with tandem mass spectrometry so we can focus on the use of statistical confidence methods. After introducing the key concepts we present different software tools and how to use them with an example dataset.
Affiliation(s)
- Gorka Prieto
- Department of Communications Engineering, Faculty of Engineering of Bilbao, University of the Basque Country (UPV/EHU), Bilbao, Spain.
- Jesús Vázquez
- Laboratory of Cardiovascular Proteomics, Centro Nacional de Investigaciones Cardiovasculares (CNIC) and CIBER de Enfermedades Cardiovasculares (CIBERCV), Madrid, Spain
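The target-decoy reasoning that these statistical workflows build on can be shown in a few lines. The Python below is a minimal sketch and is not one of the tools the chapter presents. It rests on these assumptions:

- a concatenated target-decoy search with one best PSM per spectrum;
- higher scores indicate better matches;
- FDR at a threshold is estimated as decoys divided by targets scoring at or above it;
- ties are broken arbitrarily.

`q_values` is a hypothetical helper name.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """Return q-values for PSMs given as (score, is_decoy), in input order.

    FDR at threshold t is estimated as (#decoys >= t) / (#targets >= t);
    a PSM's q-value is the minimum estimated FDR at which it is still accepted.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:  # best score first
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    # Walk from the worst score upward, keeping the running minimum FDR.
    q = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


if __name__ == "__main__":
    psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False), (5.0, True)]
    qs = q_values(psms)
    accepted = [s for (s, is_decoy), q in zip(psms, qs) if not is_decoy and q <= 0.01]
    print(qs, accepted)
```

Converting raw FDR estimates to q-values makes the acceptance rule monotone, so a PSM is never rejected while a lower-scoring PSM is accepted. In practice, target PSMs with q ≤ 0.01 form the 1% FDR list.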
104. Deb B, George IA, Sharma J, Kumar P. Phosphoproteomics Profiling to Identify Altered Signaling Pathways and Kinase-Targeted Cancer Therapies. Methods Mol Biol 2020; 2051:241-264. [PMID: 31552632 DOI: 10.1007/978-1-4939-9744-2_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022]
Abstract
Phosphorylation is one of the most extensively studied posttranslational modifications (PTMs); it regulates cellular functions such as cell growth, differentiation, apoptosis, and cell signaling. Kinase families include a large number of oncoproteins and are strongly associated with cancer. Identification of driver kinases is an intense area of cancer research, and kinases therefore serve as potential targets to improve the efficacy of targeted therapies. Mass spectrometry-based phosphoproteomic approaches have paved the way to the identification of a large number of altered phosphorylation events in proteins and signaling cascades that may lead to oncogenic processes in a cell. Alterations in signaling pathways result in the activation of oncogenic processes predominantly regulated by kinases and phosphatases. Therefore, drugs such as kinase inhibitors, which target dysregulated pathways, represent a promising area for cancer therapy.
Affiliation(s)
- Barnali Deb
- Institute of Bioinformatics, International Technology Park, Bangalore, India; Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India
- Irene A George
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- Jyoti Sharma
- Institute of Bioinformatics, International Technology Park, Bangalore, India; Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India
- Prashant Kumar
- Institute of Bioinformatics, International Technology Park, Bangalore, India; Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India
105. Hubler SL, Kumar P, Mehta S, Easterly C, Johnson JE, Jagtap PD, Griffin TJ. Challenges in Peptide-Spectrum Matching: A Robust and Reproducible Statistical Framework for Removing Low-Accuracy, High-Scoring Hits. J Proteome Res 2019; 19:161-173. [DOI: 10.1021/acs.jproteome.9b00478] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
106. Wang X, Shen S, Rasam SS, Qu J. MS1 ion current-based quantitative proteomics: A promising solution for reliable analysis of large biological cohorts. Mass Spectrom Rev 2019; 38:461-482. [PMID: 30920002 PMCID: PMC6849792 DOI: 10.1002/mas.21595] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 10/19/2018] [Accepted: 02/28/2019] [Indexed: 05/04/2023]
Abstract
The rapidly advancing field of pharmaceutical and clinical research calls for systematic, molecular-level characterization of complex biological systems. To this end, quantitative proteomics represents a powerful tool, but an optimal solution for reliable large-cohort proteomics analysis, as frequently involved in pharmaceutical/clinical investigations, is urgently needed. Large-cohort analysis remains challenging owing to the deteriorating quantitative quality and snowballing missing data and false-positive discovery of altered proteins when sample size increases. MS1 ion current-based methods, which have become an important class of label-free quantification techniques during the past decade, show considerable potential to achieve reproducible protein measurements in large cohorts with high quantitative accuracy/precision. Nonetheless, in order to fully unleash this potential, several critical prerequisites should be met. Here we provide an overview of the rationale of MS1-based strategies and then important considerations for experimental and data processing techniques, with the emphasis on (i) efficient and reproducible sample preparation and LC separation; (ii) sensitive, selective and high-resolution MS detection; (iii) accurate chromatographic alignment; (iv) sensitive and selective generation of quantitative features; and (v) optimal post-feature-generation data quality control. Prominent technical developments in these aspects are discussed. Finally, we review applications of MS1-based strategies in disease mechanism studies, biomarker discovery, and pharmaceutical investigations.
Affiliation(s)
- Xue Wang
- Department of Cell Stress Biology, Roswell Park Cancer Institute, Buffalo, New York
- Shichen Shen
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, New York, New York
- Sailee Suryakant Rasam
- Department of Biochemistry, University at Buffalo, State University of New York, New York, New York
- Jun Qu
- Department of Cell Stress Biology, Roswell Park Cancer Institute, Buffalo, New York
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, New York, New York
- Department of Biochemistry, University at Buffalo, State University of New York, New York, New York
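MS1 ion current-based quantification ultimately reduces each peptide feature to an integrated signal. The Python below is a minimal sketch, not a method from the review. It assumes the extracted-ion chromatogram (retention time, intensity pairs) for one precursor has already been extracted and aligned. It then integrates the peak area with the trapezoidal rule between hypothetical retention-time bounds. Real pipelines add feature detection, alignment, and the quality control steps the review lists.

```python
from typing import Sequence


def xic_area(rt: Sequence[float], intensity: Sequence[float],
             rt_start: float, rt_end: float) -> float:
    """Trapezoidal area of an extracted-ion chromatogram within [rt_start, rt_end].

    Only points whose retention time falls inside the window are integrated;
    no interpolation is done at the window edges.
    """
    if len(rt) != len(intensity):
        raise ValueError("rt and intensity must have the same length")
    pts = sorted((t, y) for t, y in zip(rt, intensity) if rt_start <= t <= rt_end)
    area = 0.0
    for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
        area += (t1 - t0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    rt = [0.0, 1.0, 2.0, 3.0, 4.0]           # retention time (arbitrary units)
    signal = [0.0, 100.0, 200.0, 100.0, 0.0]  # ion current for one precursor
    print(xic_area(rt, signal, 0.0, 4.0))
```

Comparing the areas of the same aligned feature across runs is the basic label-free comparison. Normalization and missing-value handling are deliberately left out of this sketch.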
107. Valli M, Russo HM, Pilon AC, Pinto MEF, Dias NB, Freire RT, Castro-Gamboa I, Bolzani VDS. Computational methods for NMR and MS for structure elucidation I: software for basic NMR. Phys Sci Rev 2019. [DOI: 10.1515/psr-2018-0108] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/30/2023]
Abstract
Structure elucidation is an important and sometimes time-consuming step for natural products research. This step has evolved in the past few years to a faster and more automated process due to the development of several computational programs and analytical techniques. In this paper, the topics of NMR prediction and CASE programs are addressed. Furthermore, the elucidation of natural peptides is discussed.
108. Géron A, Werner J, Wattiez R, Lebaron P, Matallana-Surget S. Deciphering the Functioning of Microbial Communities: Shedding Light on the Critical Steps in Metaproteomics. Front Microbiol 2019; 10:2395. [PMID: 31708885 PMCID: PMC6821674 DOI: 10.3389/fmicb.2019.02395] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/15/2019] [Accepted: 10/03/2019] [Indexed: 11/13/2022]
Abstract
Unraveling the complex structure and functioning of microbial communities is essential to accurately predict the impact of perturbations and/or environmental changes. Of all the molecular tools available today to resolve the dynamics of microbial communities, metaproteomics stands out, allowing the establishment of phenotype-genotype linkages. Despite its rapid development, this technology has faced many technical challenges that still hamper its potential power. How to maximize the number of protein identifications, improve the quality of protein annotation, and provide reliable ecological interpretation are questions of immediate urgency. In our study, we used a robust metaproteomic workflow combining two protein fractionation approaches (gel-based versus gel-free) and four protein search databases derived from the same metagenome to analyze the same seawater sample. The resulting eight metaproteomes provided different outcomes in terms of (i) total protein numbers, (ii) taxonomic structures, and (iii) protein functions. The characterization and/or representativeness of numerous proteins from ecologically relevant taxa such as Pelagibacterales, Rhodobacterales, and Synechococcales, as well as crucial environmental processes, such as nutrient uptake, nitrogen assimilation, light harvesting, and oxidative stress response, were found to be particularly affected by the methodology. Our results provide clear evidence that the use of different protein search databases significantly alters the biological conclusions in both gel-free and gel-based approaches. Our findings emphasize the importance of diversifying the experimental workflow for a comprehensive metaproteomic study.
Affiliation(s)
- Augustin Géron
- Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom
- Department of Proteomic and Microbiology, University of Mons, Mons, Belgium
- Johannes Werner
- Department of Biological Oceanography, Leibniz Institute for Baltic Sea Research, Rostock, Germany
- Ruddy Wattiez
- Department of Proteomic and Microbiology, University of Mons, Mons, Belgium
- Philippe Lebaron
- Sorbonne Universités, UPMC Université Paris 06, USR 3579, LBBM, Observatoire Océanologique, Banyuls-sur-Mer, France
- Sabine Matallana-Surget
- Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom
109. Ignjatovic V, Geyer PE, Palaniappan KK, Chaaban JE, Omenn GS, Baker MS, Deutsch EW, Schwenk JM. Mass Spectrometry-Based Plasma Proteomics: Considerations from Sample Collection to Achieving Translational Data. J Proteome Res 2019; 18:4085-4097. [PMID: 31573204 DOI: 10.1021/acs.jproteome.9b00503] [Citation(s) in RCA: 104] [Impact Index Per Article: 20.8] [Indexed: 12/15/2022]
Abstract
The proteomic analysis of human blood and blood-derived products (e.g., plasma) offers an attractive avenue to translate research progress from the laboratory into the clinic. However, due to its unique protein composition, performing proteomics assays with plasma is challenging. Plasma proteomics has regained interest due to recent technological advances, but challenges imposed by both complications inherent to studying human biology (e.g., interindividual variability) and analysis of biospecimens (e.g., sample variability), as well as technological limitations, remain. As part of the Human Proteome Project (HPP), the Human Plasma Proteome Project (HPPP) brings together key aspects of the plasma proteomics pipeline. Here, we provide considerations and recommendations concerning study design, plasma collection, quality metrics, plasma processing workflows, mass spectrometry (MS) data acquisition, data processing, and bioinformatic analysis. With exciting opportunities in studying human health and disease through this plasma proteomics pipeline, a more informed analysis of human plasma will accelerate interest while enhancing possibilities for the incorporation of proteomics-scaled assays into clinical practice.
Affiliation(s)
- Vera Ignjatovic
- Haematology Research, Murdoch Children's Research Institute, Parkville, VIC 3052, Australia; Department of Paediatrics, The University of Melbourne, Parkville, VIC 3052, Australia
- Philipp E Geyer
- NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen, Denmark; Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Krishnan K Palaniappan
- Freenome, 259 East Grand Avenue, South San Francisco, California 94080, United States
- Jessica E Chaaban
- Haematology Research, Murdoch Children's Research Institute, Parkville, VIC 3052, Australia
- Gilbert S Omenn
- Departments of Computational Medicine & Bioinformatics, Human Genetics, and Internal Medicine and School of Public Health, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
- Mark S Baker
- Department of Biomedical Sciences, Faculty of Medicine & Health Sciences, Macquarie University, 75 Talavera Road, North Ryde, NSW 2109, Australia
- Eric W Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
- Jochen M Schwenk
- Affinity Proteomics, SciLifeLab, KTH Royal Institute of Technology, 171 65 Stockholm, Sweden
110. Heller NC, Garrett AM, Merkley ED, Cendrowski SR, Melville AM, Arce JS, Jenson SC, Wahl KL, Jarman KH. Probabilistic Limit of Detection for Ricin Identification Using a Shotgun Proteomics Assay. Anal Chem 2019; 91:12399-12406. [PMID: 31490662 DOI: 10.1021/acs.analchem.9b02721] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
Robust and highly specific methods for the detection of the protein toxin ricin are of interest to the law enforcement community. In previous studies, methods based on liquid chromatography-tandem mass spectrometry shotgun proteomics have been proposed. The successful implementation of this approach relies on specific data evaluation criteria addressing (1) the quality of the mass spectrometric data, (2) the confidence of peptide identifications (peptide-spectrum matches), and (3) the number and sequence specificity of peptides detected. We present such data evaluation criteria and use a novel approach to establish the limit of detection for this ricin assay. Specifically, we use logistic regression to determine the probability of detection for individual ricin peptides at different concentrations. We then apply basic rules from probability theory, combining these individual peptide probabilities into an overall assay limit of detection. This procedure yields an assay limit of detection for ricin at 42.5 ng on column or 21.25 ng/μL for a 2-μL injection. We also show that, despite the conventional wisdom that detergents are deleterious to mass spectrometric analyses, the presence of Tween-20 did not prevent detection of ricin peptides, and indeed assays performed in buffers that included Tween-20 gave better results than assays performed using other buffer formulations with or without detergent removal.
Affiliation(s)
- Alaine M Garrett
- National Biodefense Analysis and Countermeasures Center, operated by BNBI for the U.S. Department of Homeland Security Science and Technology Directorate, Frederick, Maryland, United States
- Stephen R Cendrowski
- National Biodefense Analysis and Countermeasures Center, operated by BNBI for the U.S. Department of Homeland Security Science and Technology Directorate, Frederick, Maryland, United States
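The ricin abstract describes a two-step procedure. First, logistic regression gives a per-peptide probability of detection as a function of concentration. Second, basic probability rules combine those peptides into an assay-level limit of detection. The Python below is a minimal sketch of that general idea only, and it rests on these assumptions:

- the logistic coefficients are already fitted, and any values used with it are hypothetical rather than the paper's;
- peptides are treated as independent;
- the assay counts as positive when at least `k` peptides are detected, which is a hypothetical rule;
- the target probability is hypothetical.

The paper's actual combination rule and thresholds may differ.

```python
import math
from typing import List, Sequence, Tuple


def detection_prob(log10_conc: float, intercept: float, slope: float) -> float:
    """Logistic model: P(detect) = 1 / (1 + exp(-(b0 + b1 * log10(concentration))))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * log10_conc)))


def prob_at_least_k(probs: Sequence[float], k: int) -> float:
    """P(at least k of n independent peptides are detected), via the Poisson-binomial distribution."""
    dist = [1.0]  # dist[j] = P(exactly j detections among peptides processed so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            new[j] += pj * (1.0 - p)  # this peptide is missed
            new[j + 1] += pj * p      # this peptide is detected
        dist = new
    return sum(dist[k:])


def assay_lod(models: List[Tuple[float, float]], k: int, target: float,
              lo: float = -3.0, hi: float = 3.0, iters: int = 60) -> float:
    """Smallest log10 concentration at which P(>= k peptides detected) >= target.

    Bisection is valid here only because every slope is assumed positive, so the
    assay detection probability rises monotonically with concentration.
    """
    def assay_p(x: float) -> float:
        return prob_at_least_k([detection_prob(x, b0, b1) for b0, b1 in models], k)

    if assay_p(hi) < target:
        raise ValueError("target probability not reached within the search interval")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if assay_p(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical (intercept, slope) pairs for three peptides.
    models = [(-2.0, 1.5), (-1.0, 1.2), (-0.5, 1.0)]
    lod = assay_lod(models, k=2, target=0.95)
    print(10 ** lod)  # concentration, in whatever units the models were fitted on
```

Requiring more peptides (a larger `k`) makes the assay more specific but raises the limit of detection. That trade-off is what a probabilistic LOD makes explicit.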
111. Kim H, Lee S, Park H. Target-small decoy search strategy for false discovery rate estimation. BMC Bioinformatics 2019; 20:438. [PMID: 31443634 PMCID: PMC6708216 DOI: 10.1186/s12859-019-3034-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/15/2019] [Accepted: 08/19/2019] [Indexed: 11/24/2022]
Abstract
Background: One of the most important steps in peptide identification is to estimate the false discovery rate (FDR). The most commonly used method for estimating FDR is the target-decoy search strategy (TDS). While this method is simple and effective, it is time- and space-inefficient because it searches a database that is twice as large as the original protein database. This inefficiency becomes more evident as protein databases grow. We propose a target-small decoy search strategy and present a rigorous verification that it reduces the database size and search time while retaining the accuracy of TDS. Results: We show that peptide-spectrum matches (PSMs) obtained at 1% FDR in TDS overlap ~99% with those in our method. (Considering that 1% FDR is used, 99% overlap means our method is very accurate.) Moreover, our method is more time- and space-efficient than TDS. The search time of our method is reduced to only 1/4 of that of TDS when UniProt and its 1/8 decoy database are used. Conclusions: We demonstrate that our method is almost as accurate as TDS and more time- and space-efficient. Since the efficiency of our method becomes more evident as the database size increases, our method is expected to be useful for identifying peptides in proteogenomics databases constructed from inflated databases using genomic data.
Affiliation(s)
- Hyunwoo Kim
- Research Data Sharing Center, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea
- Sangjeong Lee
- Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea
- Heejin Park
- Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea.
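The efficiency gain in this abstract comes from estimating false target matches with a decoy database that is only a fraction of the target's size, such as the 1/8 decoy database mentioned above. The Python below sketches only that scaling idea and is not the authors' published procedure. It assumes a separate (non-competitive) search. Under that assumption, decoy hits above a score threshold are divided by the decoy-to-target size ratio `decoy_ratio` to estimate false target hits. Both helper names are hypothetical.

```python
from typing import Optional, Sequence


def scaled_decoy_fdr(target_scores: Sequence[float], decoy_scores: Sequence[float],
                     threshold: float, decoy_ratio: float) -> float:
    """Estimated FDR among target PSMs scoring >= threshold.

    decoy_ratio = decoy database size / target database size
    (1.0 for a conventional full-size decoy, 0.125 for a 1/8 decoy).
    Expected false targets are approximated as decoy hits / decoy_ratio.
    """
    if not 0.0 < decoy_ratio <= 1.0:
        raise ValueError("decoy_ratio must be in (0, 1]")
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0
    return min(1.0, (n_decoy / decoy_ratio) / n_target)


def threshold_at_fdr(target_scores: Sequence[float], decoy_scores: Sequence[float],
                     decoy_ratio: float, max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed target score whose estimated FDR is <= max_fdr (None if none qualifies)."""
    best = None
    for t in sorted(set(target_scores), reverse=True):
        if scaled_decoy_fdr(target_scores, decoy_scores, t, decoy_ratio) <= max_fdr:
            best = t
    return best


if __name__ == "__main__":
    targets = [float(s) for s in range(1, 201)]  # toy target PSM scores
    decoys = [5.0, 3.0]                          # toy hits from a 1/8-size decoy database
    print(threshold_at_fdr(targets, decoys, decoy_ratio=0.125))
```

Because every decoy hit stands for 1/`decoy_ratio` expected false targets, a small decoy database gives a noisier estimate when decoy hits are few. This is why the agreement with full-size TDS reported in the abstract matters.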
112. Nishimura T, Nakamura H, Végvári Á, Marko-Varga G, Furuya N, Saji H. Current status of clinical proteogenomics in lung cancer. Expert Rev Proteomics 2019; 16:761-772. [PMID: 31402712 DOI: 10.1080/14789450.2019.1654861] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/11/2022]
Abstract
Introduction: Lung cancer is the leading cause of cancer death worldwide. Proteogenomics, a way to integrate genomics, transcriptomics, and proteomics, has emerged as a way to understand the molecular causes of cancer tumorigenesis. This understanding will help identify therapeutic targets that are urgently needed to improve individual patient outcomes. Areas covered: To explore underlying molecular mechanisms of lung cancer subtypes, several efforts have used proteogenomic approaches that integrate next generation sequencing (NGS) and mass spectrometry (MS)-based technologies. Expert opinion: A large-scale, MS-based, proteomic analysis, together with both NGS-based genomic data and clinicopathological information, will facilitate establishing extensive databases for lung cancer subtypes that can be used for further proteogenomic analyses. Proteogenomic strategies will further our understanding of how major driver mutations affect downstream molecular networks, resulting in lung cancer progression and malignancy, and of how therapy-resistant cancers are molecularly structured. These strategies require advanced bioinformatics based on a dynamic theory of network systems, rather than statistics, to accurately identify mutant proteins and their affected key networks.
Affiliation(s)
- Toshihide Nishimura
- Department of Translational Medicine Informatics, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
- Haruhiko Nakamura
- Department of Translational Medicine Informatics, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan; Department of Chest Surgery, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
- Ákos Végvári
- Proteomics Biomedicum, Division of Physiological Chemistry I, Department of Medical Biochemistry & Biophysics (MBB), Karolinska Institutet, Solna, Sweden
- György Marko-Varga
- Clinical Protein Science & Imaging, Biomedical Centre, Department of Biomedical Engineering, Lund University, Lund, Sweden; Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Skåne University Hospital Malmö, Malmö, Sweden
- Naoki Furuya
- Department of Internal Medicine, Division of Respiratory Medicine, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
- Hisashi Saji
- Department of Chest Surgery, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
113. Na S, Kim J, Paek E. MODplus: Robust and Unrestrictive Identification of Post-Translational Modifications Using Mass Spectrometry. Anal Chem 2019; 91:11324-11333. [PMID: 31365238 DOI: 10.1021/acs.analchem.9b02445] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/02/2023]
Abstract
Post-translational modifications regulate various cellular processes and are of great biological interest. Unrestrictive searches of mass spectrometry data enable the detection of any type of modification. Here we propose MODplus, which makes practical unrestrictive searches possible by allowing (1) hundreds of modifications, (2) multiple modifications per peptide, (3) the whole proteome database, and (4) any tolerance values in search parameters. The utility of MODplus was demonstrated in large human data sets of HEK293 cells and TMT-labeled phosphorylation enrichment. Notably, MODplus supports identifying different modification types at multiple sites and reports real chemical and biological modifications; linking unrestrictive search results to real modifications has been very labor intensive. We also confirmed the presence of Missing Precursor (MP) spectra that were not identifiable using targeted precursor masses. The MP spectra mostly resulted in identifications of wrong modifications and negatively affected the overall performance, often by as much as 10%. MODplus can rapidly recognize MP spectra and correct their identifications, resulting in an increased identification rate, up to 70% in the HEK293 data set, as well as improved reliability.
Affiliation(s)
- Seungjin Na
- Department of Computer Science, Hanyang University, Seoul 04763, South Korea
- Jihyung Kim
- Department of Computer Science, Hanyang University, Seoul 04763, South Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, South Korea
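An unrestrictive search reports an arbitrary mass shift between the observed precursor and the unmodified peptide. That shift must then be linked to a known chemical or biological modification. The Python below illustrates only this last annotation step, under simple assumptions:

- standard monoisotopic residue masses rounded to five decimals;
- a small hypothetical table of common modifications;
- a single modification per peptide;
- a fixed tolerance in Da.

It is not MODplus's algorithm, which also handles multiple modifications, site localization, and Missing Precursor spectra.

```python
from typing import Dict, List

# Monoisotopic residue masses (Da) of the 20 standard amino acids, rounded to 5 decimals.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

# Illustrative subset of known modification mass shifts (monoisotopic, Da).
KNOWN_SHIFTS: Dict[str, float] = {
    "Oxidation": 15.99491,
    "Acetyl": 42.01057,
    "Carbamidomethyl": 57.02146,
    "Phospho": 79.96633,
}


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def annotate_shift(observed_mass: float, sequence: str, tol: float = 0.01) -> List[str]:
    """Known modifications whose mass matches (observed - unmodified) within tol Da."""
    delta = observed_mass - peptide_mass(sequence)
    return [name for name, shift in KNOWN_SHIFTS.items() if abs(delta - shift) <= tol]


if __name__ == "__main__":
    observed = peptide_mass("PEPTIDE") + 79.96633  # simulate a phosphorylated precursor
    print(annotate_shift(observed, "PEPTIDE"))
```

A shift that matches nothing in the table is exactly the case an unrestrictive search exists for. In that case the sketch returns an empty list rather than forcing an annotation.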
114. Chen ZL, Meng JM, Cao Y, Yin JL, Fang RQ, Fan SB, Liu C, Zeng WF, Ding YH, Tan D, Wu L, Zhou WJ, Chi H, Sun RX, Dong MQ, He SM. A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides. Nat Commun 2019; 10:3404. [PMID: 31363125 PMCID: PMC6667459 DOI: 10.1038/s41467-019-11337-z] [Citation(s) in RCA: 238] [Impact Index Per Article: 47.6] [Received: 12/28/2018] [Accepted: 06/20/2019] [Indexed: 01/05/2023]
Abstract
We describe pLink 2, a search engine with higher speed and reliability for proteome-scale identification of cross-linked peptides. With a two-stage open search strategy facilitated by fragment indexing, pLink 2 is ~40 times faster than pLink 1 and 3 to 10 times faster than Kojak. Furthermore, using simulated datasets, synthetic datasets, 15N metabolically labeled datasets, and entrapment databases, four analysis methods were designed to evaluate the credibility of ten state-of-the-art search engines. This systematic evaluation shows that pLink 2 outperforms these methods in precision and sensitivity, especially at proteome scales. Lastly, re-analysis of four published proteome-scale cross-linking datasets with pLink 2 required only a fraction of the time used by pLink 1, with up to 27% more cross-linked residue pairs identified. pLink 2 is therefore an efficient and reliable tool for cross-linking mass spectrometry analysis, and the systematic evaluation methods described here will be useful for future software development. The identification of cross-linked peptides at a proteome scale for interactome analyses represents a complex challenge. Here the authors report an efficient and reliable search engine pLink 2 for proteome-scale cross-linking mass spectrometry analyses, and demonstrate how to systematically evaluate the credibility of search engines.
Affiliation(s)
- Zhen-Lin Chen
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Jia-Ming Meng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Yong Cao
- National Institute of Biological Sciences, Beijing, 102206, China
- Ji-Li Yin
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Run-Qian Fang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Sheng-Bo Fan
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Chao Liu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Wen-Feng Zeng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Yue-He Ding
- National Institute of Biological Sciences, Beijing, 102206, China
- Dan Tan
- National Institute of Biological Sciences, Beijing, 102206, China
- Long Wu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Wen-Jing Zhou
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Rui-Xiang Sun
- National Institute of Biological Sciences, Beijing, 102206, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, 102206, China
- Si-Min He
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
115
De Cicco M, Mamone G, Di Stasio L, Ferranti P, Addeo F, Picariello G. Hidden "Digestome": Current Analytical Approaches Provide Incomplete Peptide Inventories of Food Digests. J Agric Food Chem 2019; 67:7775-7782. [PMID: 31088053 DOI: 10.1021/acs.jafc.9b02342] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 06/09/2023]
Abstract
Analyzing an in vitro gastroduodenal digest of whey proteins by high-performance liquid chromatography (HPLC) coupled to high-resolution/high-sensitivity tandem mass spectrometry (MS/MS), we sought to evaluate whether state-of-the-art peptidomics provides comprehensive peptide coverage of food "digestomes". A multitude of small-sized peptides derived from both α-lactalbumin and β-lactoglobulin, as well as disulfide cross-linked hetero-oligomers, remained unassigned, even when the digests were compared before and after S-S reduction. Precipitation with 12% trichloroacetic acid revealed large-sized polypeptides that escaped bioinformatic identification. Analyzing an HPLC-MS/MS run with different proteomic search engines generated dissimilar peptide subsets, emphasizing the need for refined search algorithms. Although MS/MS fragmentation of singly charged ions, with exclusion of interfering non-peptide compounds, enlarged the inventory of short peptides, the overall picture of the "digestome" was still incomplete. These findings have relevant implications for the identification of possible food-derived bioactive peptides or allergenic determinants.
Affiliation(s)
- Maristella De Cicco: Institute of Food Sciences, National Research Council (CNR), Via Roma 64, 83100 Avellino, Italy
- Gianfranco Mamone: Institute of Food Sciences, National Research Council (CNR), Via Roma 64, 83100 Avellino, Italy
- Luigia Di Stasio: Institute of Food Sciences, National Research Council (CNR), Via Roma 64, 83100 Avellino, Italy; Department of Agriculture, University of Naples "Federico II", Parco Gussone, Via Università 100, 80055 Portici, Naples, Italy
- Pasquale Ferranti: Department of Agriculture, University of Naples "Federico II", Parco Gussone, Via Università 100, 80055 Portici, Naples, Italy
- Francesco Addeo: Department of Agriculture, University of Naples "Federico II", Parco Gussone, Via Università 100, 80055 Portici, Naples, Italy
- Gianluca Picariello: Institute of Food Sciences, National Research Council (CNR), Via Roma 64, 83100 Avellino, Italy
116
Bai W, Bilmes J, Noble WS. Submodular Generalized Matching for Peptide Identification in Tandem Mass Spectrometry. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1168-1181. [PMID: 29993658 PMCID: PMC8641787 DOI: 10.1109/tcbb.2018.2822280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
MOTIVATION Identification of spectra produced by a shotgun proteomics mass spectrometry experiment is commonly performed by searching the observed spectra against a peptide database. The heart of this search procedure is a score function that evaluates the quality of a hypothesized match between an observed spectrum and a theoretical spectrum corresponding to a particular peptide sequence. Accordingly, the success of a spectrum analysis pipeline depends critically upon this peptide-spectrum score function. We develop peptide-spectrum score functions that compute the maximum value of a submodular function under $m$ matroid constraints. We call this procedure a submodular generalized matching (SGM) since it generalizes bipartite matching. We use a greedy algorithm to compute the maximization, which is guaranteed to achieve a solution whose objective value is at least $\frac{1}{1+m}$ of the true optimum. The advantage of the SGM framework is that known long-range properties of experimental spectra can be modeled by designing suitable submodular functions and matroid constraints. Experiments on four data sets from various organisms and mass spectrometry platforms show that the SGM approach leads to significantly improved performance compared to several state-of-the-art methods. Supplementary information, C++ source code, and data sets can be found at https://melodi-lab.github.io/SGM.
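The greedy step can be illustrated on the simplest case the abstract mentions, a bipartite matching between observed peaks and theoretical fragments. That is the intersection of $m = 2$ partition matroids (each peak used at most once, each fragment used at most once), so greedy is guaranteed at least 1/3 of the optimum for a monotone submodular objective. The score below, the square root of matched intensity summed per ion series, is a hypothetical monotone submodular function chosen for illustration; it is not one of the SGM score functions from the paper.

```python
import math
from typing import Dict, List, Set, Tuple

# Candidate pairing: (observed peak id, theoretical fragment id, ion series, peak intensity)
Match = Tuple[int, int, str, float]

def series_sqrt_score(selected: List[Match]) -> float:
    """Hypothetical monotone submodular objective: sqrt of matched intensity, summed per ion series."""
    per_series: Dict[str, float] = {}
    for _, _, series, intensity in selected:
        per_series[series] = per_series.get(series, 0.0) + intensity
    return sum(math.sqrt(total) for total in per_series.values())

def greedy_matching(candidates: List[Match]) -> Tuple[List[Match], float]:
    """Greedy maximization under two partition matroids (a bipartite matching, m = 2)."""
    selected: List[Match] = []
    used_peaks: Set[int] = set()  # matroid 1: each observed peak matched at most once
    used_frags: Set[int] = set()  # matroid 2: each theoretical fragment matched at most once
    value = 0.0
    while True:
        best, best_gain = None, 0.0
        for cand in candidates:
            peak, frag = cand[0], cand[1]
            if peak in used_peaks or frag in used_frags:
                continue  # adding cand would break independence in some matroid
            gain = series_sqrt_score(selected + [cand]) - value
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # no feasible element improves the objective
            break
        selected.append(best)
        used_peaks.add(best[0])
        used_frags.add(best[1])
        value += best_gain
    return selected, value
```

The concave square root is what makes the objective submodular: a second match in an already well-covered ion series adds less than the same intensity in an empty series, which is one way to encode a long-range spectrum property in the score.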
117
Machado KCT, Fortuin S, Tomazella GG, Fonseca AF, Warren RM, Wiker HG, de Souza SJ, de Souza GA. On the Impact of the Pangenome and Annotation Discrepancies While Building Protein Sequence Databases for Bacteria Proteogenomics. Front Microbiol 2019; 10:1410. [PMID: 31281302 PMCID: PMC6596428 DOI: 10.3389/fmicb.2019.01410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/26/2019] [Accepted: 06/05/2019] [Indexed: 01/19/2023] Open
Abstract
In proteomics, peptide information within mass spectrometry (MS) data from a sample of a specific organism is routinely matched against a protein sequence database that best represents that organism. However, if the species/strain in the sample is unknown or genetically poorly characterized, it becomes challenging to choose a database that represents the sample. Building customized protein sequence databases that merge multiple strains of a given species has become a strategy to overcome such restrictions. However, as more genetic information becomes publicly available and interesting genetic features such as the existence of pan- and core genes within a species are revealed, we asked how efficiently such merging strategies report relevant information. To test this, we constructed databases containing conserved and unique sequences for 10 different species. Features that are relevant for probability-based protein identification by proteomics were then monitored. As expected, database complexity increased with pangenomic complexity. However, Mycobacterium tuberculosis and Bordetella pertussis generated very complex databases despite having low pangenomic complexity. We further tested database performance using MS data from eight clinical strains of M. tuberculosis and from two published datasets from Staphylococcus aureus. We show that when database size is controlled by removing identical tryptic sequences repeated across strains/species, computational time can be reduced drastically as database complexity increases.
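Removing repeated identical tryptic sequences can be sketched as building a nonredundant peptide index across strains. The Python below assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and a hypothetical minimum peptide length of 6; it illustrates the idea rather than the authors' actual pipeline.

```python
import re
from typing import Dict, Iterable, List, Set

def tryptic_digest(sequence: str, min_len: int = 6) -> List[str]:
    """Simplified trypsin rule: cut after K or R unless the next residue is P; no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)  # zero-width split (Python 3.7+)
    return [p for p in pieces if len(p) >= min_len]

def nonredundant_peptides(proteomes: Dict[str, Iterable[str]],
                          min_len: int = 6) -> Dict[str, Set[str]]:
    """Index each distinct tryptic peptide once, recording which strains contain it."""
    index: Dict[str, Set[str]] = {}
    for strain, proteins in proteomes.items():
        for protein in proteins:
            for peptide in tryptic_digest(protein, min_len):
                index.setdefault(peptide, set()).add(strain)
    return index
```

A search space built from the keys of this index holds each shared peptide once, however many strains carry it, which is where the size reduction comes from; the strain sets keep the information needed to map identifications back.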
Affiliation(s)
- Karla C T Machado: Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- Suereta Fortuin: DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Gisele Guicardi Tomazella: Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil; The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, Bergen, Norway; The Institute of Bioinformatics and Biotechnology, Natal, Brazil
- Andre F Fonseca: Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- Robin Mark Warren: DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Harald G Wiker: The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, Bergen, Norway
- Sandro Jose de Souza: Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil; The Brain Institute, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- Gustavo Antonio de Souza: Bioinformatics Multidisciplinary Environment, Universidade Federal do Rio Grande do Norte, Natal, Brazil; Department of Biochemistry, Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
118
Binz PA, Shofstahl J, Vizcaíno JA, Barsnes H, Chalkley RJ, Menschaert G, Alpi E, Clauser K, Eng JK, Lane L, Seymour SL, Sánchez LFH, Mayer G, Eisenacher M, Perez-Riverol Y, Kapp EA, Mendoza L, Baker PR, Collins A, Van Den Bossche T, Deutsch EW. Proteomics Standards Initiative Extended FASTA Format. J Proteome Res 2019; 18:2686-2692. [PMID: 31081335 DOI: 10.1021/acs.jproteome.9b00064] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/29/2022]
Abstract
Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no change to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
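Backward compatibility follows from PEFF keeping FASTA's layout: the extra metadata sits on the `>` header line as backslash-prefixed key=value pairs, so a plain FASTA reader can treat the whole line as an opaque description. The sketch below parses such a header under a simplified reading of that layout; the keys `\PName`, `\GName`, and `\VariantSimple`, the `(position|residue)` variant notation, and all function names are assumptions for illustration, so check the PSI specification at http://www.psidev.info/peff before relying on any of these details.

```python
import re
from typing import Dict, List, Tuple

# Assumed, simplified PEFF header layout: ">prefix:accession \Key=value \Key=value ..."
_KEY_VALUE = re.compile(r"\\(\w+)=(.*?)(?=\s\\\w+=|$)")

def parse_peff_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a header into its identifier and a dict of backslash-prefixed key/value pairs."""
    line = header.lstrip(">").strip()
    identifier = line.split()[0]  # a plain FASTA reader would typically stop here
    fields = {key: value.strip() for key, value in _KEY_VALUE.findall(line)}
    return identifier, fields

def simple_variants(value: str) -> List[Tuple[int, str]]:
    """Parse '(position|residue)' groups; extra '|'-separated subfields are ignored here."""
    return [(int(pos), aa) for pos, aa in re.findall(r"\((\d+)\|([A-Z*])[^)]*\)", value)]

def apply_variant(sequence: str, position: int, residue: str) -> str:
    """Return sequence with the 1-based position replaced by residue."""
    return sequence[: position - 1] + residue + sequence[position:]
```

A search engine that understands the variant field could add each variant sequence to the search space next to the reference sequence, while an older tool reading the same file sees only an ordinary FASTA entry.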
Affiliation(s)
- Pierre-Alain Binz: CHUV Centre Hospitalier Universitaire Vaudois, CH-1011 Lausanne 14, Switzerland
- Jim Shofstahl: Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Juan Antonio Vizcaíno: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Harald Barsnes: Proteomics Unit, Department of Biomedicine, University of Bergen, N-5009 Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
- Robert J Chalkley: University of California at San Francisco, San Francisco, California 94143, United States
- Gerben Menschaert: Biobix, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
- Emanuele Alpi: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Karl Clauser: Broad Institute, Cambridge, Massachusetts 02142, United States
- Jimmy K Eng: University of Washington, Seattle, Washington 98195, United States
- Lydie Lane: SIB Swiss Institute of Bioinformatics, CH-1211 Geneva 4, Switzerland; Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CH-1211 Geneva 4, Switzerland
- Sean L Seymour: Seymour Data Science, LLC, San Francisco, California 95000, United States
- Luis Francisco Hernández Sánchez: K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5021 Bergen, Norway; Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5021 Bergen, Norway
- Gerhard Mayer: Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, D-44801 Bochum, Germany
- Martin Eisenacher: Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, D-44801 Bochum, Germany
- Yasset Perez-Riverol: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Eugene A Kapp: Walter & Eliza Hall Institute of Medical Research and the University of Melbourne, Melbourne, VIC 3052, Australia
- Luis Mendoza: Institute for Systems Biology, Seattle, Washington 98109, United States
- Peter R Baker: University of California at San Francisco, San Francisco, California 94143, United States
- Andrew Collins: Department of Functional and Comparative Genomics, Institute of Integrated Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Tim Van Den Bossche: VIB-UGent Center for Medical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Eric W Deutsch: Institute for Systems Biology, Seattle, Washington 98109, United States
119
Schaffer LV, Millikin RJ, Miller RM, Anderson LC, Fellers RT, Ge Y, Kelleher NL, LeDuc RD, Liu X, Payne SH, Sun L, Thomas PM, Tucholski T, Wang Z, Wu S, Wu Z, Yu D, Shortreed MR, Smith LM. Identification and Quantification of Proteoforms by Mass Spectrometry. Proteomics 2019; 19:e1800361. [PMID: 31050378 PMCID: PMC6602557 DOI: 10.1002/pmic.201800361] [Citation(s) in RCA: 128] [Impact Index Per Article: 25.6] [Received: 01/28/2019] [Revised: 04/07/2019] [Indexed: 12/29/2022]
Abstract
A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post-translational modifications. In top-down proteomic analyses, proteoforms are identified and quantified through mass spectrometric analysis of intact proteins. Recent technological developments have enabled comprehensive proteoform analyses in complex samples, and an increasing number of laboratories are adopting top-down proteomic workflows. In this review, some recent advances are outlined and current challenges and future directions for the field are discussed.
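Because a proteoform is defined by both its sequence and its localized PTMs, its neutral intact mass is the sum of the residue masses, one water, and each modification's mass shift. The sketch below computes that monoisotopic mass from standard residue masses and a small illustrative set of PTM shifts; the function name and the position-to-PTM dictionary are assumptions for illustration. It also shows why intact mass alone cannot localize a modification: two positional isomers have identical mass, so fragmentation data are needed to tell them apart.

```python
from typing import Dict

# Standard monoisotopic residue masses (Da)
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic H2O for the free N- and C-termini
# Small illustrative subset of monoisotopic PTM mass shifts (Da)
PTM_SHIFT: Dict[str, float] = {"Phospho": 79.966331, "Acetyl": 42.010565, "Oxidation": 15.994915}

def proteoform_mass(sequence: str, ptms: Dict[int, str]) -> float:
    """Neutral monoisotopic mass; ptms maps 1-based residue positions to PTM names."""
    for position in ptms:
        if not 1 <= position <= len(sequence):
            raise ValueError(f"PTM position {position} is outside the sequence")
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER + sum(PTM_SHIFT[name] for name in ptms.values())
```

For example, `proteoform_mass("SAS", {1: "Phospho"})` and `proteoform_mass("SAS", {3: "Phospho"})` return the same value even though they describe two different proteoforms.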
Affiliation(s)
- Leah V. Schaffer: Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Robert J. Millikin: Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Rachel M. Miller: Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Lissa C. Anderson: Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Tallahassee, Florida 32310, United States
- Ryan T. Fellers: Proteomics Center of Excellence, Northwestern University, Evanston, Illinois 60208, United States
- Ying Ge: Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; Department of Cell and Regenerative Biology and Human Proteomics Program, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Neil L. Kelleher: Proteomics Center of Excellence, Northwestern University, Evanston, Illinois 60208, United States; Department of Chemistry and Molecular Biosciences and the Division of Hematology-Oncology, Northwestern University, Evanston, Illinois 60208, United States
- Richard D. LeDuc: Proteomics Center of Excellence, Northwestern University, Evanston, Illinois 60208, United States
- Xiaowen Liu: Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, United States; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Samuel H. Payne: Department of Biology, Brigham Young University, Provo, Utah 84602, United States
- Liangliang Sun: Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Paul M. Thomas: Proteomics Center of Excellence, Northwestern University, Evanston, Illinois 60208, United States
- Trisha Tucholski: Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Zhe Wang: Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73019, United States
- Si Wu: Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73019, United States
- Zhijie Wu: Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Dahang Yu: Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73019, United States
- Michael R. Shortreed: Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Lloyd M. Smith: Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
120
Maes E, Oeyen E, Boonen K, Schildermans K, Mertens I, Pauwels P, Valkenborg D, Baggerman G. The challenges of peptidomics in complementing proteomics in a clinical context. Mass Spectrom Rev 2019; 38:253-264. [PMID: 30372792 DOI: 10.1002/mas.21581] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/14/2016] [Accepted: 10/01/2018] [Indexed: 06/08/2023]
Abstract
Naturally occurring peptides, including growth factors, hormones, and neurotransmitters, represent an important class of biomolecules and have crucial roles in human physiology. The study of these peptides in clinical samples is therefore as relevant as ever. Compared to more routine proteomics applications in clinical research, peptidomics research questions are more challenging and have special requirements with regard to sample handling, experimental design, and bioinformatics. In this review, we describe the issues that confront peptidomics in a clinical context. Once these hurdles are (at least partially) overcome, peptidomics will be ready for successful translation into medical practice.
Affiliation(s)
- Evelyne Maes: Flemish Institute for Technological Research (VITO), Mol, Belgium; Centre for Proteomics, University of Antwerp, Antwerp, Belgium; Food and Bio-Based Products, AgResearch Ltd., Lincoln, New Zealand
- Eline Oeyen: Flemish Institute for Technological Research (VITO), Mol, Belgium; Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Kurt Boonen: Flemish Institute for Technological Research (VITO), Mol, Belgium; Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Karin Schildermans: Flemish Institute for Technological Research (VITO), Mol, Belgium; Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Inge Mertens: Flemish Institute for Technological Research (VITO), Mol, Belgium; Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Patrick Pauwels: Molecular Pathology Unit, Department of Pathology, Antwerp University Hospital, Edegem, Belgium
- Dirk Valkenborg: Flemish Institute for Technological Research (VITO), Mol, Belgium; Centre for Proteomics, University of Antwerp, Antwerp, Belgium; Center for Statistics, Hasselt University, Diepenbeek, Belgium
- Geert Baggerman: Flemish Institute for Technological Research (VITO), Mol, Belgium; Centre for Proteomics, University of Antwerp, Antwerp, Belgium
121
Schiebenhoefer H, Van Den Bossche T, Fuchs S, Renard BY, Muth T, Martens L. Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis. Expert Rev Proteomics 2019; 16:375-390. [PMID: 31002542 DOI: 10.1080/14789450.2019.1609944] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Indexed: 02/07/2023]
Abstract
INTRODUCTION The study of microbial communities based on the combined analysis of genomic and proteomic data, called metaproteogenomics, has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications for human health and the environment. Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications. Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain, which are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
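Error rate estimation in database searching is most often done with the target-decoy approach: spectra are searched against real (target) and reversed or shuffled (decoy) sequences, and the decoy hits above a score threshold estimate the false positives among the target hits. The Python below is a minimal sketch of that common estimator (FDR approximated as decoys/targets, then converted to q-values); the function name is hypothetical, it ignores score ties, and it does not address the extra care that highly redundant metaproteomic databases may require.

```python
from typing import List, Tuple

def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[float]:
    """psms holds (score, is_decoy) with higher scores better; returns q-values in input order."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr: List[float] = []
    targets = decoys = 0
    for i in order:  # walk down the ranked list, one threshold per PSM
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr.append(decoys / max(targets, 1))  # estimated FDR if everything so far is accepted
    # q-value: the lowest estimated FDR at which this PSM would still be accepted
    for k in range(len(fdr) - 2, -1, -1):
        fdr[k] = min(fdr[k], fdr[k + 1])
    qvalues = [0.0] * len(psms)
    for rank, i in enumerate(order):
        qvalues[i] = fdr[rank]
    return qvalues
```

Keeping the target PSMs with a q-value of at most 0.01 corresponds to the usual 1% FDR cut-off.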
Affiliation(s)
- Henning Schiebenhoefer: Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Tim Van Den Bossche: VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Stephan Fuchs: FG13 Division of Nosocomial Pathogens and Antibiotic Resistances, Robert Koch Institute, Wernigerode, Germany
- Bernhard Y Renard: Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Thilo Muth: Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Lennart Martens: VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
122
Ma WT, Liu ZY, Chen XZ, Lin ZL, Zheng ZB, Miao WG, Xie SQ. A protein identification algorithm for tandem mass spectrometry by incorporating the abundance of mRNA into a binomial probability scoring model. J Proteomics 2019; 197:53-59. [PMID: 30790687 DOI: 10.1016/j.jprot.2019.02.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/06/2018] [Revised: 02/15/2019] [Accepted: 02/17/2019] [Indexed: 12/17/2022]
Abstract
Peptide-spectrum match (PSM) scoring between experimental and theoretical spectra is a key step in the identification of proteins in mass spectrometry (MS)-based proteomics analyses. Efficient protein identification using MS/MS data remains a challenge. The strategy of using RNA-seq data increases the number of proteins identified by reconstructing the custom search database and integrating mRNA abundance into the post-PSM false discovery rate. However, this process lacks an algorithm that incorporates mRNA abundance into the key PSM scoring model. Therefore, we developed a novel PSM scoring model that incorporates mRNA abundance for improved peptide and protein identification. In the new algorithm, mRNA abundance was transformed into the prior probability of protein identification and integrated into PSM re-scoring using the binomial probability distribution model. Compared with other algorithms on five MS/MS datasets (human, rat, zebrafish, yeast, and Arabidopsis thaliana), the lowest improvement ratios for peptides and protein groups ranged from 3.39% to 9.79% and from 0.48% to 8.16%, respectively, across the datasets. The new strategy offers an effective solution for MS-based identification of peptides and proteins. SIGNIFICANCE: The new algorithm identifies proteins by quantifying mRNA abundance (FPKM) and incorporating it into a scoring model for peptide-spectrum matches. It is important to improve peptide and protein identification from MS/MS datasets in proteomics research.
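A binomial PSM score asks how likely it is that at least k of n theoretical fragment ions match by chance, given a per-ion random-match probability p, and reports -log10 of that tail probability. The sketch below implements that tail score and then adds a hypothetical log-prior derived from FPKM; the prior's normalization, the weight, and all function names are assumptions for illustration, not the formula published by the authors.

```python
import math

def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of matching >= k of n fragments at random."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def binomial_psm_score(n_theoretical: int, n_matched: int, p_random: float) -> float:
    """-log10 of the tail probability; larger means less likely to be a chance match."""
    return -math.log10(binomial_tail(n_theoretical, n_matched, p_random))

def rescore_with_mrna(n_theoretical: int, n_matched: int, p_random: float,
                      fpkm: float, max_fpkm: float, weight: float = 1.0) -> float:
    """Hypothetical re-scoring: add weight * log10(prior), with prior = log(1+FPKM) scaled into (0, 1]."""
    if max_fpkm > 0:
        prior = math.log1p(fpkm) / math.log1p(max_fpkm)
    else:
        prior = 1.0
    prior = min(1.0, max(prior, 1e-6))  # unexpressed transcripts get a small but nonzero prior
    return binomial_psm_score(n_theoretical, n_matched, p_random) + weight * math.log10(prior)
```

With this choice, a PSM from the most highly expressed gene keeps its plain binomial score, while PSMs from weakly expressed genes are penalized in proportion to how little transcript evidence supports them.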
Affiliation(s)
- Wen-Tai Ma: Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Zhao-Yu Liu: Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Xiao-Zhou Chen: School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650031, China
- Zhen-Liang Lin: Department of General Surgery, The Affiliated Cangnan Hospital of Wenzhou Medical University, Wenzhou 325800, China
- Zhong-Bing Zheng: Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Wei-Guo Miao: Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Shang-Qian Xie: Institute of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
123
An Y, Zhou L, Huang Z, Nice EC, Zhang H, Huang C. Molecular insights into cancer drug resistance from a proteomics perspective. Expert Rev Proteomics 2019; 16:413-429. [PMID: 30925852 DOI: 10.1080/14789450.2019.1601561] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/16/2022]
Abstract
INTRODUCTION Resistance to chemotherapy and development of specific and effective molecular targeted therapies are major obstacles facing current cancer treatment. Comparative proteomic approaches have been employed for the discovery of putative biomarkers associated with cancer drug resistance and have yielded a number of candidate proteins, showing great promise for both novel drug target identification and personalized medicine for the treatment of drug-resistant cancer. Areas covered: Herein, we review the recent advances and challenges in proteomics studies on cancer drug resistance with an emphasis on biomarker discovery, as well as understanding the interconnectivity of proteins in disease-related signaling pathways. In addition, we highlight the critical role that post-translational modifications (PTMs) play in the mechanisms of cancer drug resistance. Expert opinion: Revealing changes in proteome profiles and the role of PTMs in drug-resistant cancer is key to deciphering the mechanisms of treatment resistance. With the development of sensitive and specific mass spectrometry (MS)-based proteomics and related technologies, it is now possible to investigate in depth potential biomarkers and the molecular mechanisms of cancer drug resistance, assisting the development of individualized therapeutic strategies for cancer patients.
Affiliation(s)
- Yao An: West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, PR China; Department of Oncology, The Second Affiliated Hospital of Hainan Medical University, Haikou, PR China
- Li Zhou: West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, PR China
- Zhao Huang: West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, PR China
- Edouard C Nice: Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
- Haiyuan Zhang: Department of Oncology, The Second Affiliated Hospital of Hainan Medical University, Haikou, PR China
- Canhua Huang: West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, PR China; Department of Oncology, The Second Affiliated Hospital of Hainan Medical University, Haikou, PR China
124
Devabhaktuni A, Lin S, Zhang L, Swaminathan K, Gonzalez CG, Olsson N, Pearlman SM, Rawson K, Elias JE. TagGraph reveals vast protein modification landscapes from large tandem mass spectrometry datasets. Nat Biotechnol 2019; 37:469-479. [PMID: 30936560 PMCID: PMC6447449 DOI: 10.1038/s41587-019-0067-5] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 06/13/2016] [Accepted: 02/12/2019] [Indexed: 02/06/2023]
Abstract
Although mass spectrometry is well suited to identifying thousands of potential protein post-translational modifications (PTMs), it has historically been biased towards just a few. To measure the entire set of PTMs across diverse proteomes, software must overcome the dual challenges of covering enormous search spaces and distinguishing correct from incorrect spectrum interpretations. Here, we describe TagGraph, a computational tool that overcomes both challenges with an unrestricted string-based search method that is as much as 350-fold faster than existing approaches, and a probabilistic validation model that we optimized for PTM assignments. We applied TagGraph to a published human proteomic dataset of 25 million mass spectra and tripled the number of confident spectrum identifications compared with the original analysis. We identified thousands of modification types on almost 1 million sites in the proteome. We show alternative contexts for highly abundant yet understudied PTMs such as proline hydroxylation, and its unexpected association with cancer mutations. By enabling broad characterization of PTMs, TagGraph provides insight into how their functions and regulation intersect.
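An unrestricted string-based search starts from short de novo sequence tags and looks each one up across the whole proteome at once; mass differences around each hit can then be examined for unexpected modifications. The Python below sketches only the lookup step with a naive suffix array, one standard string index; the function names are hypothetical, the construction is quadratic and suitable only for toy inputs, and we do not claim this is the index TagGraph itself uses.

```python
from typing import List, Tuple

Index = Tuple[str, List[int], List[int], List[int]]

def build_index(proteins: List[str]) -> Index:
    """Concatenate proteins with '$' separators and build a naive suffix array."""
    text = "$".join(proteins) + "$"  # separators stop a tag from spanning two proteins
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # O(n^2 log n): toy inputs only
    owner: List[int] = []   # protein index for every position of text
    offset: List[int] = []  # position within that protein
    for pid, protein in enumerate(proteins):
        owner.extend([pid] * (len(protein) + 1))
        offset.extend(range(len(protein) + 1))
    return text, sa, owner, offset

def _bound(text: str, sa: List[int], tag: str, upper: bool) -> int:
    """Binary search over sorted suffixes: first rank whose prefix is >= tag (or > tag if upper)."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]: sa[mid] + len(tag)]
        if prefix < tag or (upper and prefix == tag):
            lo = mid + 1
        else:
            hi = mid
    return lo

def find_tag(index: Index, tag: str) -> List[Tuple[int, int]]:
    """Return sorted (protein index, 0-based offset) pairs where tag occurs."""
    text, sa, owner, offset = index
    lo = _bound(text, sa, tag, upper=False)
    hi = _bound(text, sa, tag, upper=True)
    return sorted((owner[sa[r]], offset[sa[r]]) for r in range(lo, hi))
```

Every occurrence of a tag occupies one contiguous block of the suffix array, so each lookup costs two binary searches regardless of how many proteins are indexed; this is what makes searching without a predefined modification list tractable.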
Affiliation(s)
- Arun Devabhaktuni
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Sarah Lin
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Lichao Zhang
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Kavya Swaminathan
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Carlos G Gonzalez
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Niclas Olsson
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Samuel M Pearlman
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Keith Rawson
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Joshua E Elias
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
125
Applications and challenges of forensic proteomics. Forensic Sci Int 2019; 297:350-363. [DOI: 10.1016/j.forsciint.2019.01.022] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 11/16/2018] [Revised: 01/09/2019] [Accepted: 01/13/2019] [Indexed: 12/23/2022]
126
Fast Proteome Identification and Quantification from Data-Dependent Acquisition-Tandem Mass Spectrometry (DDA MS/MS) Using Free Software Tools. Methods Protoc 2019; 2. [PMID: 31008411 PMCID: PMC6469856 DOI: 10.3390/mps2010008] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/17/2022] [Open access]
Abstract
The identification of nearly all proteins in a biological system using data-dependent acquisition (DDA) tandem mass spectrometry has become routine for organisms with relatively small genomes such as bacteria and yeast. Still, the quantification of the identified proteins may be a complex process and often requires multiple different software packages. In this protocol, I describe a flexible strategy for the identification and label-free quantification of proteins from bottom-up proteomics experiments. This method can be used to quantify all the detectable proteins in any DDA dataset collected with high-resolution precursor scans and may be used to quantify proteome remodeling in response to drug treatment or a gene knockout. Notably, the method is statistically rigorous, uses the latest and fastest freely available software, and the entire protocol can be completed in a few hours with a small number of data files from the analysis of yeast.
127
Hu A, Lu YY, Bilmes J, Noble WS. Joint Precursor Elution Profile Inference via Regression for Peptide Detection in Data-Independent Acquisition Mass Spectra. J Proteome Res 2019; 18:86-94. [PMID: 30362768 PMCID: PMC6465123 DOI: 10.1021/acs.jproteome.8b00365] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/16/2022]
Abstract
In data-independent acquisition (DIA) mass spectrometry, precursor scans are interleaved with wide-window fragmentation scans, resulting in complex fragmentation spectra containing multiple coeluting peptide species. In this setting, detecting the isotope distribution profiles of intact peptides in the precursor scans can be a critical initial step in accurate peptide detection and quantification. This peak detection step is particularly challenging when the isotope peaks associated with two different peptide species overlap (or interfere) with one another. We propose a regression model, called Siren, to detect isotopic peaks in precursor DIA data that can explicitly account for interference. We validate Siren's peak-calling performance on a variety of data sets by counting how many of the peaks Siren identifies are associated with confidently detected peptides. In particular, we demonstrate that substituting the Siren regression model for the existing peak-calling step in DIA-Umpire leads to improved overall rates of peptide detection.
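Illustrative sketch, not Siren's actual model: the general idea of explaining overlapping isotope peaks by regression can be shown with non-negative least squares, where each template is the expected isotope envelope of one candidate peptide and the fitted coefficients are their abundances. The m/z binning, the toy envelope values, the coordinate-descent solver, and the function name `nnls_coordinate_descent` are all assumptions chosen for illustration; Siren's formulation and solver are described in the paper.

```python
from typing import List, Sequence


def nnls_coordinate_descent(
    templates: Sequence[Sequence[float]],
    observed: Sequence[float],
    iterations: int = 500,
) -> List[float]:
    """Fit observed ~= sum_j coeffs[j] * templates[j] subject to coeffs[j] >= 0.

    templates[j] is the expected isotope envelope of candidate peptide j,
    sampled on the same (hypothetical) m/z bins as `observed`.
    """
    n_bins = len(observed)
    coeffs = [0.0] * len(templates)
    residual = list(observed)  # observed minus the current fitted intensities
    norms = [sum(v * v for v in t) for t in templates]
    for _ in range(iterations):
        for j, template in enumerate(templates):
            if norms[j] == 0.0:
                continue  # an all-zero envelope cannot explain any signal
            # Exact minimizer along coordinate j, clipped so abundances stay >= 0.
            step = sum(template[b] * residual[b] for b in range(n_bins)) / norms[j]
            new_value = max(0.0, coeffs[j] + step)
            delta = new_value - coeffs[j]
            if delta != 0.0:
                for b in range(n_bins):
                    residual[b] -= delta * template[b]
                coeffs[j] = new_value
    return coeffs


if __name__ == "__main__":
    # Toy envelopes for two peptides whose isotope peaks overlap in bin 2.
    peptide_a = [0.6, 0.3, 0.1, 0.0, 0.0]
    peptide_b = [0.0, 0.0, 0.6, 0.3, 0.1]
    spectrum = [6.0, 3.0, 4.0, 1.5, 0.5]  # built as 10 * peptide_a + 5 * peptide_b
    print(nnls_coordinate_descent([peptide_a, peptide_b], spectrum))
```

On the toy spectrum the fit apportions the shared intensity in bin 2 between both peptides (coefficients close to 10 and 5), which a peak caller that assigns each peak to a single species cannot do. The non-negativity constraint keeps an absent peptide at zero instead of letting it take a negative abundance to cancel noise.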
128
Lachén-Montes M, González-Morales A, Fernández-Irigoyen J, Santamaría E. Deployment of Label-Free Quantitative Olfactory Proteomics to Detect Cerebrospinal Fluid Biomarker Candidates in Synucleinopathies. Methods Mol Biol 2019; 2044:273-289. [PMID: 31432419 DOI: 10.1007/978-1-4939-9706-0_17] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/03/2023]
Abstract
Nowadays, diagnosis of neurodegenerative disorders is mainly based on neuroimaging and clinical symptoms, although postmortem neuropathological confirmation remains the gold standard. Therefore, the cerebrospinal fluid (CSF) proteome is considered a valuable molecular repository for diagnosing and targeting the neurodegenerative process. It is well known that olfactory dysfunction is among the earliest features of synucleinopathies such as Parkinson's disease (PD). Consequently, we consider the application of tissue proteomics to primary olfactory structures an ideal approach to explore early pathophysiological changes and to detect olfactory proteins that might be tested in CSF as potential biomarkers. Data mining of mass spectrometry-generated datasets has revealed that 30% of the olfactory bulb (OB) proteome is also localized in CSF. In this chapter, we describe a method that uses label-free quantitative proteomics and computational analysis to characterize human OB proteomes and potential CSF biomarkers associated with neurodegenerative syndromes. For that, we applied peptide fractionation methods, followed by tandem mass spectrometry (nanoLC-MS/MS), in silico analysis, and semi-quantitative orthogonal techniques to OB tissue from PD subjects. After obtaining the differential OB proteome across Lewy-type alpha-synucleinopathy (LTS) stages and further validating the method, we applied this workflow to probe changes in NEGR1 (neuronal growth regulator 1) and GNPDA2 (glucosamine-6-phosphate deaminase 2) protein levels in CSF from parkinsonian subjects relative to controls, observing an inverse correlation between both proteins and α-synuclein, the principal component of Lewy pathology.
Affiliation(s)
- Mercedes Lachén-Montes
- Proteomics Unit, Clinical Neuroproteomics Laboratory, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Proteored-ISCIII, Pamplona, Spain
- Andrea González-Morales
- Proteomics Unit, Clinical Neuroproteomics Laboratory, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Proteored-ISCIII, Pamplona, Spain
- Joaquín Fernández-Irigoyen
- Proteomics Unit, Clinical Neuroproteomics Laboratory, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Proteored-ISCIII, Pamplona, Spain
- Enrique Santamaría
- Proteomics Unit, Clinical Neuroproteomics Laboratory, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Proteored-ISCIII, Pamplona, Spain
129
Deutsch EW, Perez-Riverol Y, Chalkley RJ, Wilhelm M, Tate S, Sachsenberg T, Walzer M, Käll L, Delanghe B, Böcker S, Schymanski EL, Wilmes P, Dorfer V, Kuster B, Volders PJ, Jehmlich N, Vissers JP, Wolan DW, Wang AY, Mendoza L, Shofstahl J, Dowsey AW, Griss J, Salek RM, Neumann S, Binz PA, Lam H, Vizcaíno JA, Bandeira N, Röst H. Expanding the Use of Spectral Libraries in Proteomics. J Proteome Res 2018; 17:4051-4060. [PMID: 30270626 PMCID: PMC6443480 DOI: 10.1021/acs.jproteome.8b00485] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/15/2022]
Abstract
The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington, 98109, United States
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Robert J. Chalkley
- University of California San Francisco, San Francisco, 94158, California, United States
- Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
- Timo Sachsenberg
- Department of Computer Science, Center for Bioinformatics, University of Tübingen, Sand 14, Tübingen, 72076, Germany
- Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm 114 28, Sweden
- Bernard Delanghe
- Thermo Fisher Scientific Bremen, Hanna-Kunath Str. 11, 28199 Bremen, Germany
- Sebastian Böcker
- Chair for Bioinformatics, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Emma L. Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Paul Wilmes
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Hagenberg, 4232, Austria
- Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
- Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich, Freising, 85354, Germany
- Nico Jehmlich
- Helmholtz-Centre for Environmental Research - UFZ, Leipzig, Germany
- Dennis W. Wolan
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
- Ana Y. Wang
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington, 98109, United States
- Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, CA 95134
- Andrew W. Dowsey
- Department of Population Health Sciences and Bristol Veterinary School, Faculty of Health Sciences, University of Bristol, Bristol BS9 1BN, UK
- Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, Vienna 1090, Austria
- Reza M. Salek
- The International Agency for Research on Cancer (IARC), 150 Cours Albert Thomas, 69372 Lyon CEDEX 08, France
- Steffen Neumann
- Leibniz Institute of Plant Biochemistry, Department of Stress and Developmental Biology, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Pierre-Alain Binz
- Clinical Chemistry Service, Centre Hospitalier Universitaire Vaudois, 1011 Lausanne, Switzerland
- Henry Lam
- Department of Chemical and Biological Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Nuno Bandeira
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 92093-0404, USA
- Hannes Röst
- The Donnelly Centre, University of Toronto, 160 College St., Toronto, ON, M5S 3E1, Canada
130
Kovalchik KA, Colborne S, Spencer SE, Sorensen PH, Chen DDY, Morin GB, Hughes CS. RawTools: Rapid and Dynamic Interrogation of Orbitrap Data Files for Mass Spectrometer System Management. J Proteome Res 2018; 18:700-708. [PMID: 30462513 DOI: 10.1021/acs.jproteome.8b00721] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Optimizing the quality of proteomics data collected from a mass spectrometer (MS) requires careful selection of acquisition parameters and proper assessment of instrument performance. Software tools capable of extracting a broad set of information from raw files, including meta, scan, quantification, and identification data, are needed to provide guidance for MS system management. In this work, direct extraction and utilization of these data are demonstrated using RawTools, a standalone tool for extracting meta and scan data directly from raw MS files generated on Thermo Orbitrap instruments. RawTools generates summarized and detailed plain-text outputs after parsing individual raw files, including scan rates and durations, duty cycle characteristics, precursor and reporter ion quantification, and chromatography performance. RawTools also contains a diagnostic module that includes an optional "preview" database search to support informed decisions about optimizing MS performance based on a variety of metrics. RawTools is written in C# and uses the Thermo RawFileReader library, and thus can process raw MS files quickly and efficiently on all major operating systems (Windows, macOS, Linux). To demonstrate the utility of RawTools, meta and scan data were extracted from both individual raw MS files and large collections of them to identify problematic characteristics of instrument performance. Taken together, the rich feature set of RawTools and its capability for interrogating MS and experiment performance make this software a valuable tool for proteomics researchers.
Affiliation(s)
- Kevin A Kovalchik
- Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
- Shane Colborne
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
- Sandra Elizabeth Spencer
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
- Poul H Sorensen
- Department of Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia V5Z 1L3, Canada
- David D Y Chen
- Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
- Gregg B Morin
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
- Christopher S Hughes
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
- Department of Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia V5Z 1L3, Canada
131
Wilson EA, Anderson KS. Lost in the crowd: identifying targetable MHC class I neoepitopes for cancer immunotherapy. Expert Rev Proteomics 2018; 15:1065-1077. [PMID: 30408427 DOI: 10.1080/14789450.2018.1545578] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 02/07/2023]
Abstract
Introduction: The recent development of checkpoint blockade immunotherapy for cancer has led to impressive clinical results across multiple tumor types. There is mounting evidence that immune recognition of tumor-derived, MHC class I (MHC-I)-restricted epitopes bearing cancer-specific mutations and alterations is a crucial mechanism in successfully triggering immune-mediated tumor rejection. Therapeutic targeting of these cancer-specific epitopes (neoepitopes) is emerging as a promising opportunity for the generation of personalized cancer vaccines and adoptive T cell therapies. However, one major obstacle limiting the broader application of neoepitope-based therapies is the difficulty of selecting highly immunogenic neoepitopes from among the wide array of presented, non-immunogenic HLA ligands derived from self-proteins. Areas covered: In this review, we present an overview of the MHC-I processing and presentation pathway and highlight key areas that contribute to the complexity of the associated MHC-I peptidome. We cover recent technological advances that simplify and optimize the identification of targetable neoepitopes for cancer immunotherapeutic applications. Expert commentary: Recent advances in computational modeling, bioinformatics, and mass spectrometry are unlocking the underlying mechanisms governing antigen processing and presentation of tumor-derived neoepitopes.
Affiliation(s)
- Eric A Wilson
- Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Karen S Anderson
- Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Department of Medical Oncology, Mayo Clinic Arizona, Scottsdale, AZ, USA
132
Dams M, Dores-Sousa JL, Lamers RJ, Treumann A, Eeltink S. High-Resolution Nano-Liquid Chromatography with Tandem Mass Spectrometric Detection for the Bottom-Up Analysis of Complex Proteomic Samples. Chromatographia 2018. [DOI: 10.1007/s10337-018-3647-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 10/27/2022]
133
Oonk S, Schuurmans T, Pabst M, de Smet LCPM, de Puit M. Proteomics as a new tool to study fingermark ageing in forensics. Sci Rep 2018; 8:16425. [PMID: 30401937 PMCID: PMC6219553 DOI: 10.1038/s41598-018-34791-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 05/21/2018] [Accepted: 10/26/2018] [Indexed: 01/10/2023] [Open access]
Abstract
Fingermarks are trace evidence of great forensic importance, and their omnipresence makes them pivotal in crime investigation. Police and law enforcement authorities have exploited fingermarks primarily for personal identification, but crucial knowledge of when fingermarks were deposited is often lacking, thereby hindering crime reconstruction. Biomolecular constituents of fingermark residue, such as amino acids, lipids and proteins, may provide excellent means for fingermark age determination; however, robust methodologies and detailed knowledge of how these molecules change over time are currently not available. Here, we address fingermark age assessment by (i) drafting a first protein map of fingermark residue, (ii) differential studies of fresh and aged fingermarks, and (iii) estimating, to mimic real-world scenarios, the effects of donor contact with bodily fluids on the identification of potential age biomarkers. Using a high-resolution mass spectrometry-based proteomics approach, we drafted a characteristic fingermark proteome, of which five proteins were identified as promising candidates for fingermark age estimation. This study additionally demonstrates successful identification of both endogenous and contaminant proteins from donors who had been in contact with various bodily fluids. In summary, we introduce state-of-the-art proteomics as a sensitive tool to monitor fingermark ageing at the protein level, with sufficient selectivity to differentiate potential age markers from body fluid contaminants.
Affiliation(s)
- Stijn Oonk
- Netherlands Forensic Institute, Digital Technology and Biometrics, Laan van Ypenburg 6, 2497 GB, Den Haag, Netherlands
- Delft University of Technology, Faculty of Applied Sciences, Department of Chemical Engineering, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Tom Schuurmans
- Netherlands Forensic Institute, Digital Technology and Biometrics, Laan van Ypenburg 6, 2497 GB, Den Haag, Netherlands
- Martin Pabst
- Delft University of Technology, Faculty of Applied Sciences, Department of Biotechnology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Louis C P M de Smet
- Delft University of Technology, Faculty of Applied Sciences, Department of Chemical Engineering, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Wageningen University & Research, Laboratory of Organic Chemistry, Stippeneng 4, 6708 WE, Wageningen, The Netherlands
- Marcel de Puit
- Netherlands Forensic Institute, Digital Technology and Biometrics, Laan van Ypenburg 6, 2497 GB, Den Haag, Netherlands
- Delft University of Technology, Faculty of Applied Sciences, Department of Chemical Engineering, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
134
Lin A, Howbert JJ, Noble WS. Combining High-Resolution and Exact Calibration To Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data. J Proteome Res 2018; 17:3644-3656. [PMID: 30221945 DOI: 10.1021/acs.jproteome.8b00206] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/19/2023]
Abstract
To achieve accurate assignment of peptide sequences to observed fragmentation spectra, a shotgun proteomics database search tool must make good use of the very high-resolution information produced by state-of-the-art mass spectrometers. However, making use of this information while also ensuring that the search engine's scores are well calibrated, that is, that the score assigned to one spectrum can be meaningfully compared to the score assigned to a different spectrum, has proven to be challenging. Here we describe a database search score function, the "residue evidence" (res-ev) score, that achieves both of these goals simultaneously. We also demonstrate how to combine calibrated res-ev scores with calibrated XCorr scores to produce a "combined p value" score function. We provide a benchmark consisting of four mass spectrometry data sets, which we use to compare the combined p value to the score functions used by several existing search engines. Our results suggest that the combined p value achieves state-of-the-art performance, generally outperforming MS Amanda and Morpheus and performing comparably to MS-GF+. The res-ev and combined p-value score functions are freely available as part of the Tide search engine in the Crux mass spectrometry toolkit (http://crux.ms).
Affiliation(s)
- Andy Lin
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- J Jeffry Howbert
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
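The abstract above does not spell out how the calibrated res-ev and XCorr p values are merged into the combined p value. As a generic illustration of combining calibrated p-values, the sketch below uses Fisher's method, which is a swapped-in textbook technique and not necessarily the procedure used in the paper; the function name `fisher_combine` and the example p-values are hypothetical.

```python
import math
from typing import Sequence


def fisher_combine(p_values: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null; for even degrees of freedom its
    survival function has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2, always >= 0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical p-values for one PSM from two calibrated score functions.
    print(fisher_combine([0.01, 0.02]))
```

For two p-values the formula reduces to p1 * p2 * (1 - ln(p1 * p2)), roughly 0.0019 for 0.01 and 0.02. Fisher's method assumes independent inputs; two scores computed from the same spectrum are likely positively correlated, which makes Fisher's method anti-conservative, so this is a sketch of the idea rather than a drop-in replacement.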
135
Song R, Sarnoski EA, Acar M. The Systems Biology of Single-Cell Aging. iScience 2018; 7:154-169. [PMID: 30267677 PMCID: PMC6153419 DOI: 10.1016/j.isci.2018.08.023] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/07/2018] [Revised: 07/30/2018] [Accepted: 08/29/2018] [Indexed: 12/12/2022] [Open access]
Abstract
Aging is a leading cause of human morbidity and mortality, but efforts to slow or reverse its effects are hampered by an incomplete understanding of its multi-faceted origins. Systems biology, the use of quantitative and computational methods to understand complex biological systems, offers a toolkit well suited to elucidating the root cause of aging. We describe the known components of the aging network and outline innovative techniques that open new avenues of investigation to the aging research community. We propose integration of the systems biology and aging fields, identifying areas of complementarity based on existing and impending technological capabilities.
Affiliation(s)
- Ruijie Song
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, 300 George Street, Suite 501, New Haven, CT 06511, USA; Systems Biology Institute, Yale University, 850 West Campus Drive, West Haven, CT 06516, USA
- Ethan A Sarnoski
- Systems Biology Institute, Yale University, 850 West Campus Drive, West Haven, CT 06516, USA; Department of Molecular Cellular and Developmental Biology, Yale University, 219 Prospect Street, New Haven, CT 06511, USA
- Murat Acar
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, 300 George Street, Suite 501, New Haven, CT 06511, USA; Systems Biology Institute, Yale University, 850 West Campus Drive, West Haven, CT 06516, USA; Department of Molecular Cellular and Developmental Biology, Yale University, 219 Prospect Street, New Haven, CT 06511, USA; Department of Physics, Yale University, 217 Prospect Street, New Haven, CT 06511, USA
136
Mendoza L, Deutsch EW, Sun Z, Campbell DS, Shteynberg DD, Moritz RL. Flexible and Fast Mapping of Peptides to a Proteome with ProteoMapper. J Proteome Res 2018; 17:4337-4344. [PMID: 30230343 DOI: 10.1021/acs.jproteome.8b00544] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
Bottom-up proteomics relies on the proteolytic or chemical cleavage of proteins into peptides, the identification of those peptides via mass spectrometry, and the mapping of the identified peptides back to the reference proteome to infer which proteins are present. Reliable mapping of peptides to proteins still poses substantial challenges when considering similar proteins, protein families, splice isoforms, sequence variation, and possible residue mass modifications, combined with an imperfect and incomplete understanding of the proteome. The ProteoMapper tool enables comprehensive and rapid mapping of peptides to a reference proteome. The indexer component creates a segmented index for an input proteome from a FASTA or PEFF file. The ProMaST component provides ultrafast mapping of one or more input peptides against the index. ProteoMapper allows searches that take into account known sequence variation encoded in PEFF files. It also enables fuzzy searches to find highly similar peptides with residue order changes or other isobaric or near-isobaric substitutions within a specified mass tolerance. We demonstrate an example of a one-hit-wonder identification in PeptideAtlas that may be better explained by a combination of catalogued and uncatalogued sequence variation in another highly observed protein. ProteoMapper is free and open source, and is available for local use after download, for embedding in other applications, as an online web tool at http://www.peptideatlas.org/map, and as a web service.
Affiliation(s)
- Luis Mendoza
- Institute for Systems Biology, 401 Terry Ave North, Seattle, Washington 98109, United States
- Eric W Deutsch
- Institute for Systems Biology, 401 Terry Ave North, Seattle, Washington 98109, United States
- Zhi Sun
- Institute for Systems Biology, 401 Terry Ave North, Seattle, Washington 98109, United States
- David S Campbell
- Institute for Systems Biology, 401 Terry Ave North, Seattle, Washington 98109, United States
- David D Shteynberg
- Institute for Systems Biology, 401 Terry Ave North, Seattle, Washington 98109, United States
- Robert L Moritz
- Institute for Systems Biology, 401 Terry Ave North, Seattle, Washington 98109, United States
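The abstract describes an indexer that builds an index over a proteome and a matcher that maps peptides against it. The sketch below shows that two-stage idea with a simple seed (k-mer) index that treats isoleucine and leucine as equivalent, since they are isobaric. It is not ProteoMapper's segmented index or the ProMaST algorithm; the class name `PeptideIndex`, the seed length, and the protein sequences are hypothetical, and variant-aware (PEFF) or mass-tolerant fuzzy matching is not attempted.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def normalize(sequence: str) -> str:
    # Leucine and isoleucine have identical residue masses, so MS/MS cannot
    # tell them apart; map both to "L" before indexing or matching.
    return sequence.upper().replace("I", "L")


class PeptideIndex:
    """Toy seed index mapping peptides back to the proteins that contain them."""

    def __init__(self, proteins: Dict[str, str], k: int = 5) -> None:
        self.k = k  # hypothetical seed length; shorter peptides are rejected
        self.proteins = {acc: normalize(seq) for acc, seq in proteins.items()}
        self.seeds: DefaultDict[str, List[Tuple[str, int]]] = defaultdict(list)
        for acc, seq in self.proteins.items():
            for start in range(len(seq) - k + 1):
                self.seeds[seq[start:start + k]].append((acc, start))

    def locate(self, peptide: str) -> List[Tuple[str, int]]:
        """Return sorted (accession, 1-based start) pairs for every exact match."""
        pep = normalize(peptide)
        if len(pep) < self.k:
            raise ValueError(f"peptide shorter than seed length {self.k}")
        hits = []
        # Look up candidates by the first k residues, then verify the full peptide.
        for acc, start in self.seeds.get(pep[: self.k], []):
            if self.proteins[acc].startswith(pep, start):
                hits.append((acc, start + 1))
        return sorted(hits)


if __name__ == "__main__":
    # Made-up sequences; the shared peptide illustrates ambiguous protein inference.
    index = PeptideIndex({"P1": "MKTAYIAKQRQISFVKSHFSR", "P2": "MSTAYLAKQRGGG"})
    print(index.locate("TAYIAKQR"))
```

Here `TAYIAKQR` maps to both toy proteins because I and L are treated as equivalent, the kind of shared-peptide ambiguity the abstract describes. Building the index once and reusing it for many lookups mirrors the indexer/matcher split, but the data structure itself is only an assumption.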
137
Discrimination and quantification of homologous keratins from goat and sheep with dual protease digestion and PRM assays. J Proteomics 2018; 186:38-46. [DOI: 10.1016/j.jprot.2018.07.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/23/2018] [Revised: 07/03/2018] [Accepted: 07/13/2018] [Indexed: 01/25/2023]
138
Yi X, Wang B, An Z, Gong F, Li J, Fu Y. Quality control of single amino acid variations detected by tandem mass spectrometry. J Proteomics 2018; 187:144-151. [PMID: 30012419 DOI: 10.1016/j.jprot.2018.07.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/07/2018] [Revised: 06/26/2018] [Accepted: 07/02/2018] [Indexed: 02/04/2023]
Abstract
Study of single amino acid variations (SAVs) in proteins, which result from single nucleotide polymorphisms, is of great importance for understanding the relationships between genotype and phenotype. In mass spectrometry-based shotgun proteomics, identification of peptides with SAVs often suffers from high error rates at the detected variant sites. These site errors have multiple causes and can be confirmed by manual inspection or genomic sequencing. Here, we present a software tool, named SAVControl, for site-level quality control of variant peptide identifications. It mainly comprises strict false discovery rate control of variant peptide identifications and variant site verification by unrestrictive mass shift relocalization. SAVControl was validated on three colorectal adenocarcinoma cell line datasets with genomic sequencing evidence and tested on a colorectal cancer dataset from The Cancer Genome Atlas. The results show that SAVControl can effectively remove false detections of SAVs. SIGNIFICANCE: Protein sequence variations caused by single nucleotide polymorphisms (SNPs) are termed single amino acid variations (SAVs). Investigating SAVs may help in understanding the relationships between genotype and phenotype. Mass spectrometry (MS)-based proteomics provides a large-scale way to detect SAVs. However, the current analysis strategy for detecting SAVs may lead to a high rate of false positives. SAVControl, presented here, is a computational workflow and software tool for site-level quality control of SAVs detected by MS. It assesses the confidence of detected variant sites by relocating the mass shift responsible for an SAV to search for alternative interpretations. In addition, it uses a strict false discovery rate control method for variant peptide identifications. The advantages of SAVControl were demonstrated on three colorectal adenocarcinoma cell line datasets and a colorectal cancer dataset. We believe that SAVControl will be a powerful tool for computational proteomics and proteogenomics.
Affiliation(s)
- Xinpei Yi
- NCMIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Bo Wang
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhiwu An
- NCMIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Fuzhou Gong
- NCMIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jing Li
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yan Fu
- NCMIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
139
Abstract
We consider the problem of controlling the false discovery rate (FDR) among discoveries from searching an incomplete database. This problem differs from the classical multiple testing setting because there are two different types of false discoveries: those arising from objects that have no match in the database and those that are incorrectly matched. We show that commonly used FDR-controlling procedures are inadequate for this setup, a special case of which is tandem mass spectrum identification. We then derive a novel FDR-controlling approach that extensive simulations suggest is unbiased. We also compare its performance with problem-specific as well as general FDR-controlling procedures, using both simulated and real mass spectrometry data.
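For orientation, the conventional target-decoy competition (TDC) estimate widely used in tandem mass spectrum identification can be sketched as follows. It is shown only as a baseline of the kind of commonly used procedure the abstract discusses; it is not the novel approach the authors derive. The input format (one best match per spectrum, already competed between target and decoy sequences, given as `(score, is_decoy)` pairs) and the function names are assumptions. The FDR at each score threshold is estimated as (decoys + 1) / targets, and each q-value is the smallest estimated FDR at that threshold or any more permissive one.

```python
def tdc_qvalues(psms: list[tuple[float, bool]]) -> list[tuple[float, bool, float]]:
    """Attach a q-value to each (score, is_decoy) match, ranked by descending score."""
    ranked = sorted(psms, key=lambda match: match[0], reverse=True)
    fdr_estimates = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # "+1" finite-sample correction, capped at 1.0; max() avoids dividing by zero.
        fdr_estimates.append(min(1.0, (decoys + 1) / max(targets, 1)))
    # q-value: smallest FDR estimate at this threshold or any lower (more permissive) one.
    qvalues = [0.0] * len(ranked)
    running_min = 1.0
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdr_estimates[i])
        qvalues[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


def accepted_targets(psms: list[tuple[float, bool]], alpha: float = 0.01) -> list[float]:
    """Scores of target matches whose q-value does not exceed alpha."""
    return [score for score, is_decoy, q in tdc_qvalues(psms) if not is_decoy and q <= alpha]
```

Tied scores are not treated specially in this sketch. With realistic numbers of matches the "+1" term is small; on a toy list of a handful of matches it dominates the estimate.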
Affiliation(s)
- Uri Keich
- School of Mathematics and Statistics F07, University of Sydney
- William Stafford Noble
- Departments of Genome Sciences and of Computer Science and Engineering, University of Washington
140
Levitsky LI, Ivanov MV, Lobas AA, Bubis JA, Tarasova IA, Solovyeva EM, Pridatchenko ML, Gorshkov MV. IdentiPy: An Extensible Search Engine for Protein Identification in Shotgun Proteomics. J Proteome Res 2018; 17:2249-2255. [PMID: 29682971 DOI: 10.1021/acs.jproteome.7b00640] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 12/21/2022]
Abstract
We present an open-source, extensible search engine for shotgun proteomics. Implemented in the Python programming language, IdentiPy shows competitive processing speed and sensitivity compared with state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, which allows a single server installation to be accessed from multiple workstations. Using a simplified version of the X!Tandem scoring algorithm and its novel "autotune" feature, IdentiPy outperforms popular alternatives on high-resolution data sets. Autotune adjusts the search parameters to the particular data set, improving search efficiency and simplifying the user experience. With autotune enabled, IdentiPy shows higher sensitivity than the evaluated search engines. IdentiPy Server has built-in postprocessing and protein inference procedures and provides graphic visualization of the statistical properties of the data set and the search results. It is open-source, can be freely extended to use third-party scoring functions or processing algorithms, and allows customization of the search workflow for specialized applications.
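IdentiPy's scoring is described only as a simplified version of the X!Tandem algorithm. As a rough, hypothetical illustration of that family of scores, the sketch below computes an X!Tandem-style hyperscore, commonly described as the summed intensity of matched b- and y-ion peaks multiplied by N_b! × N_y!, the factorials of the numbers of matched b and y ions. It assumes singly charged fragments, unmodified residues, a fixed absolute m/z tolerance, and no spectrum preprocessing; it is not IdentiPy's code, and the function names are illustrative.

```python
import bisect
from math import factorial

# Monoisotopic residue masses (Da) for the 20 standard amino acids; cysteine is unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_mz(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) ion m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


def matched_intensity(mzs: list[float], intensities: list[float], target: float, tol: float) -> float:
    """Intensity of the most intense peak within +/- tol of target, or 0.0 if none."""
    lo = bisect.bisect_left(mzs, target - tol)
    hi = bisect.bisect_right(mzs, target + tol)
    return max(intensities[lo:hi], default=0.0)


def hyperscore(peptide: str, peaks: list[tuple[float, float]], tol: float = 0.02) -> float:
    """X!Tandem-style hyperscore: (sum of matched intensities) * N_b! * N_y!."""
    peaks = sorted(peaks)  # sort by m/z so bisect can locate peaks
    mzs = [mz for mz, _ in peaks]
    intensities = [inten for _, inten in peaks]
    b_ions, y_ions = fragment_mz(peptide)
    b_hits = [matched_intensity(mzs, intensities, mz, tol) for mz in b_ions]
    y_hits = [matched_intensity(mzs, intensities, mz, tol) for mz in y_ions]
    n_b = sum(1 for hit in b_hits if hit > 0)
    n_y = sum(1 for hit in y_hits if hit > 0)
    return (sum(b_hits) + sum(y_hits)) * factorial(n_b) * factorial(n_y)
```

The factorial terms reward candidates that explain many consecutive fragments of each ion series, so a peptide matching six b and six y ions scores far above one matching the same total intensity with fewer ions.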
Affiliation(s)
- Lev I Levitsky
- Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region 141700, Russian Federation; V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Mark V Ivanov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Anna A Lobas
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Julia A Bubis
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Irina A Tarasova
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Elizaveta M Solovyeva
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Marina L Pridatchenko
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Mikhail V Gorshkov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
141
Creech AL, Ting YS, Goulding SP, Sauld JF, Barthelme D, Rooney MS, Addona TA, Abelin JG. The Role of Mass Spectrometry and Proteogenomics in the Advancement of HLA Epitope Prediction. Proteomics 2018; 18:e1700259. [PMID: 29314742 PMCID: PMC6033110 DOI: 10.1002/pmic.201700259] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 09/26/2017] [Revised: 12/12/2017] [Indexed: 12/30/2022]
Abstract
A challenge in developing personalized cancer immunotherapies is the prediction of putative cancer-specific antigens. Currently, predictive algorithms are used to infer binding of peptides to human leukocyte antigen (HLA) heterodimers to aid in the selection of putative epitope targets. One drawback of current epitope prediction algorithms is that they are trained on datasets of biochemical HLA-peptide binding data that may not completely capture the rules associated with endogenous processing and presentation. The field of mass spectrometry (MS) has made great improvements in instrumentation speed and sensitivity, chromatographic resolution, and proteogenomic database search strategies, facilitating the identification of HLA ligands from a variety of cell types and tumor tissues. These advances have made MS profiling of HLA-binding peptides a tractable approach, orthogonal to lower-throughput biochemical assays, for generating comprehensive datasets to train epitope prediction algorithms. In this review, we highlight the progress made in HLA-ligand profiling enabled by MS and its impact on current and future epitope prediction strategies.
142
Freudenmann LK, Marcu A, Stevanović S. Mapping the tumour human leukocyte antigen (HLA) ligandome by mass spectrometry. Immunology 2018; 154:331-345. [PMID: 29658117 DOI: 10.1111/imm.12936] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 03/02/2018] [Revised: 03/29/2018] [Accepted: 04/02/2018] [Indexed: 12/13/2022]
Abstract
The entirety of human leukocyte antigen (HLA)-presented peptides is referred to as the HLA ligandome of a cell or tissue, which in tumours is often termed the immunopeptidome. Mapping the tumour immunopeptidome by mass spectrometry (MS) gives a comprehensive view of the pathophysiologically relevant antigenic signature of human malignancies. Unlike epitope prediction algorithms, MS is an unbiased approach that stringently filters the candidates to be tested. In the setting of peptide-specific immunotherapies, MS-based strategies significantly diminish the risk of lacking clinical benefit, as they yield highly enriched amounts of truly presented peptides. Early immunopeptidomic efforts were severely limited by technical sensitivity and manual spectra interpretation. Technological progress, including the development of Orbitrap mass analysers and enhanced chromatographic performance, led to vast improvements in mass accuracy, sensitivity, resolution, and speed. Concomitantly, bioinformatic tools were developed to process MS data, integrate sequencing results, and deconvolute multi-allelic datasets. Together, these developments enabled immense advances in tumour immunopeptidomics. Studying the HLA-presented peptide repertoire holds high potential both for answering basic scientific questions and for translational application. Mapping the tumour HLA ligandome has started to contribute significantly to target identification for the design of peptide-specific cancer immunotherapies in clinical trials and compassionate need treatments. In contrast to prediction algorithms, MS-guided target identification platforms can adequately address rare HLA allotypes and HLA class II. Herein, we review the identification of tumour HLA ligands, focusing on sources, methods, bioinformatic data analysis, and translational application, and we provide an outlook on future developments.
Affiliation(s)
- Lena Katharina Freudenmann
- Interfaculty Institute for Cell Biology, Department of Immunology, University of Tübingen, Tübingen, Germany; DKFZ Partner Site Tübingen, German Cancer Consortium (DKTK), Tübingen, Germany
- Ana Marcu
- Interfaculty Institute for Cell Biology, Department of Immunology, University of Tübingen, Tübingen, Germany
- Stefan Stevanović
- Interfaculty Institute for Cell Biology, Department of Immunology, University of Tübingen, Tübingen, Germany; DKFZ Partner Site Tübingen, German Cancer Consortium (DKTK), Tübingen, Germany
143
Révész Á, Rokob TA, Jeanne Dit Fouque D, Turiák L, Memboeuf A, Vékey K, Drahos L. Selection of Collision Energies in Proteomics Mass Spectrometry Experiments for Best Peptide Identification: Study of Mascot Score Energy Dependence Reveals Double Optimum. J Proteome Res 2018; 17:1898-1906. [PMID: 29607649 DOI: 10.1021/acs.jproteome.7b00912] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 01/06/2023]
Abstract
Collision energy is a key parameter determining the information content of beam-type collision-induced dissociation tandem mass spectrometry (MS/MS) spectra, and its optimal choice strongly affects successful peptide and protein identification in MS-based proteomics. For an MS/MS spectrum, the quality of the peptide match from a sequence database search, often characterized by a single score, is a complex function of spectrum characteristics, and its collision energy dependence has remained largely unexplored. We carried out electrospray ionization-quadrupole-time-of-flight (ESI-Q-TOF) MS/MS measurements on 2807 peptides from tryptic digests of HeLa and E. coli at 21 different collision energies. Agglomerative clustering of the resulting Mascot score versus energy curves revealed that only a few of them display a single, well-defined maximum; rather, the curves feature either a broad plateau or two clear peaks. Nonlinear least-squares fitting of one or two Gaussian functions allowed the characteristic energies to be determined. We found that the double peaks and the plateaus in Mascot score can be associated with the different energy dependence of b- and y-type fragment ion intensities. We determined that the energies for optimum Mascot scores follow separate linear trends for the unimodal and bimodal cases, with rather large residual variance even after differences in proton mobility are taken into account. This leaves room for experiment optimization and points to the possible influence of further factors beyond m/z.
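The study fits one or two Gaussians to each score-versus-energy curve by nonlinear least squares. As a simpler, swapped-in technique for the single-Gaussian case only, a Gaussian can be estimated by fitting a parabola to the logarithm of the scores, because the logarithm of s(E) = A·exp(−(E − μ)²/(2σ²)) is quadratic in E: ln s = c₂E² + c₁E + c₀ with μ = −c₁/(2c₂), σ = √(−1/(2c₂)) and A = exp(c₀ − c₁²/(4c₂)). This is a sketch under that assumption, not the authors' procedure: it requires strictly positive scores, weights residuals on a log scale, and cannot separate the bimodal curves the study reports. The function name and the synthetic curve are hypothetical, not study data.

```python
import numpy as np


def fit_gaussian_log_parabola(energy, score):
    """Estimate (amplitude, centre, width) of a single Gaussian from strictly positive scores."""
    energy = np.asarray(energy, dtype=float)
    score = np.asarray(score, dtype=float)
    if np.any(score <= 0):
        raise ValueError("log-parabola fit needs strictly positive scores")
    # ln s = c2*E^2 + c1*E + c0; np.polyfit returns coefficients highest power first.
    c2, c1, c0 = np.polyfit(energy, np.log(score), 2)
    if c2 >= 0:
        raise ValueError("no maximum: fitted parabola opens upward")
    centre = -c1 / (2 * c2)
    width = np.sqrt(-1 / (2 * c2))
    amplitude = np.exp(c0 - c1 ** 2 / (4 * c2))
    return float(amplitude), float(centre), float(width)


# Synthetic, noise-free curve sampled at 21 energies (hypothetical values).
energies = np.linspace(10, 50, 21)
scores = 60 * np.exp(-((energies - 28) ** 2) / (2 * 7 ** 2))
print(fit_gaussian_log_parabola(energies, scores))  # approximately (60.0, 28.0, 7.0)
```

On noisy data the log transform inflates the influence of low-scoring energies, which is one reason a direct nonlinear least-squares fit, as used in the study, is generally preferred.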
Affiliation(s)
- Dany Jeanne Dit Fouque
- UMR CNRS 6521, CEMCA, Université de Bretagne Occidentale, 6 Av. Le Gorgeu, 29238 Brest Cedex 3, France
- Antony Memboeuf
- UMR CNRS 6521, CEMCA, Université de Bretagne Occidentale, 6 Av. Le Gorgeu, 29238 Brest Cedex 3, France
144
Manes NP, Nita-Lazar A. Application of targeted mass spectrometry in bottom-up proteomics for systems biology research. J Proteomics 2018; 189:75-90. [PMID: 29452276 DOI: 10.1016/j.jprot.2018.02.008] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 11/13/2017] [Revised: 01/25/2018] [Accepted: 02/07/2018] [Indexed: 02/08/2023]
Abstract
The enormous diversity of proteoforms produces tremendous complexity within cellular proteomes, facilitates intricate networks of molecular interactions, and constitutes a formidable analytical challenge for biomedical researchers. Currently, quantitative whole-proteome profiling often relies on non-targeted liquid chromatography-mass spectrometry (LC-MS), which samples proteoforms broadly but can suffer from lower accuracy, sensitivity, and reproducibility than targeted LC-MS. Recent advances in bottom-up proteomics using targeted LC-MS have enabled previously unachievable identification and quantification of target proteins and posttranslational modifications within complex samples. Consequently, targeted LC-MS is rapidly advancing biomedical research, especially systems biology research in diverse areas that include proteogenomics, interactomics, kinomics, and biological pathway modeling. With the recent development of targeted LC-MS assays for nearly the entire human proteome, targeted LC-MS is positioned to enable quantitative proteomic profiling of unprecedented quality and accessibility to support fundamental and clinical research. Here we review recent applications of bottom-up proteomics using targeted LC-MS for systems biology research. SIGNIFICANCE: Advances in targeted proteomics are rapidly driving systems biology research. Recent applications include systems-level investigations focused on posttranslational modifications (such as phosphoproteomics), protein conformation, protein-protein interaction, kinomics, proteogenomics, and metabolic and signaling pathways. Notably, absolute quantification of metabolic and signaling pathway proteins has enabled accurate pathway modeling and engineering. Integration of targeted proteomics with other technologies, such as RNA-seq, has facilitated diverse research, such as the identification of hundreds of "missing" human proteins (genes and transcripts that appear to encode proteins but for which direct experimental evidence was lacking).
Affiliation(s)
- Nathan P Manes
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Aleksandra Nita-Lazar
- Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
146
Kou Q, Wu S, Tolic N, Paša-Tolic L, Liu Y, Liu X. A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra. Bioinformatics 2018; 33:1309-1316. [PMID: 28453668 DOI: 10.1093/bioinformatics/btw806] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/13/2016] [Accepted: 12/15/2016] [Indexed: 11/14/2022]
Abstract
Motivation: Although proteomics has developed rapidly in the past decade, researchers are still in the early stages of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential for mapping proteoforms to their biological functions as well as for discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms, and we design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms. Availability and implementation: http://proteomics.informatics.iupui.edu/software/topmg/. Contact: xwliu@iupui.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA
- Nikola Tolic
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Ljiljana Paša-Tolic
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
147
Fort KL, Cramer CN, Voinov VG, Vasil'ev YV, Lopez NI, Beckman JS, Heck AJR. Exploring ECD on a Benchtop Q Exactive Orbitrap Mass Spectrometer. J Proteome Res 2017; 17:926-933. [PMID: 29249155 PMCID: PMC5799867 DOI: 10.1021/acs.jproteome.7b00622] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Indexed: 12/12/2022]
Abstract
As the application of mass spectrometry intensifies in scope and diversity, the need for advanced instrumentation addressing a wide variety of analytical needs also increases. To this end, many modern, top-end mass spectrometers are designed or modified to include a wider range of fragmentation technologies, for example, ECD, ETD, EThcD, and UVPD. Still, the majority of instrument platforms are limited to more conventional methods, such as CID and HCD. While these latter methods have performed well, the less conventional fragmentation methods have been shown to yield more information in many applications, including middle-down proteomics, top-down proteomics, glycoproteomics, and disulfide bond mapping. We describe the modification of the popular Q Exactive Orbitrap mass spectrometer to extend its fragmentation capabilities to include ECD. We show that this modification allows ≥85% of matched ion intensity to originate from ECD fragment ion types and provides high sequence coverage (≥60%) of intact proteins and high fragment identification rates, with ∼70% of ion signals matched. Finally, the ECD implementation promotes selective disulfide bond dissociation, facilitating the identification of disulfide-linked peptide conjugates. Collectively, this modification extends the capabilities of the Q Exactive Orbitrap mass spectrometer to a range of new applications.
Affiliation(s)
- Kyle L Fort
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Utrecht 3584 CH, The Netherlands; Netherlands Proteomics Center, Utrecht 3584 CH, The Netherlands
- Christian N Cramer
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Utrecht 3584 CH, The Netherlands; Protein Engineering, Global Research Novo Nordisk A/S, Novo Nordisk Park, 2760 Måløv, Denmark; Proteomics Program, The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Valery G Voinov
- e-MSion, Inc., 2121 NE Jack London Drive, Corvallis, Oregon 97330, United States; Linus Pauling Institute, Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon 97331, United States
- Yury V Vasil'ev
- e-MSion, Inc., 2121 NE Jack London Drive, Corvallis, Oregon 97330, United States; Linus Pauling Institute, Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon 97331, United States
- Nathan I Lopez
- e-MSion, Inc., 2121 NE Jack London Drive, Corvallis, Oregon 97330, United States
- Joseph S Beckman
- e-MSion, Inc., 2121 NE Jack London Drive, Corvallis, Oregon 97330, United States; Linus Pauling Institute, Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon 97331, United States
- Albert J R Heck
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Utrecht 3584 CH, The Netherlands; Netherlands Proteomics Center, Utrecht 3584 CH, The Netherlands
148
Burger T. Gentle Introduction to the Statistical Foundations of False Discovery Rate in Quantitative Proteomics. J Proteome Res 2017; 17:12-22. [DOI: 10.1021/acs.jproteome.7b00170] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 11/29/2022]
Affiliation(s)
- Thomas Burger
- BIG-BGE (Université Grenoble-Alpes, CNRS, CEA, INSERM), Grenoble 38000, France
149
Starr AE, Deeke SA, Li L, Zhang X, Daoud R, Ryan J, Ning Z, Cheng K, Nguyen LVH, Abou-Samra E, Lavallée-Adam M, Figeys D. Proteomic and Metaproteomic Approaches to Understand Host–Microbe Interactions. Anal Chem 2017; 90:86-109. [DOI: 10.1021/acs.analchem.7b04340] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 12/15/2022]
Affiliation(s)
- Amanda E. Starr
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Shelley A. Deeke
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Leyuan Li
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Xu Zhang
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Rachid Daoud
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- James Ryan
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Zhibin Ning
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Kai Cheng
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Linh V. H. Nguyen
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Elias Abou-Samra
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Mathieu Lavallée-Adam
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada
- Daniel Figeys
- Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada; Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada; Molecular Architecture of Life Program, Canadian Institute for Advanced Research, Toronto, Ontario, M5G 1M1, Canada
150
Dorl S, Winkler S, Mechtler K, Dorfer V. PhoStar: Identifying Tandem Mass Spectra of Phosphorylated Peptides before Database Search. J Proteome Res 2017; 17:290-295. [PMID: 29057658 DOI: 10.1021/acs.jproteome.7b00563] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/17/2023]
Abstract
Standard proteomics workflows use tandem mass spectrometry followed by sequence database search to analyze complex biological samples. The identification of proteins carrying post-translational modifications, for example phosphorylation, is typically addressed by allowing variable modifications in the searched sequences. Accounting for these variations exponentially increases the combinatorial search space, which leads to longer processing times and more false positive identifications. The tool presented here, PhoStar, uses a supervised machine learning approach to identify spectra that originate from phosphorylated peptides before database search. The phosphorylation prediction model was trained and validated, with an accuracy of 97.6%, on a large set of high-confidence spectra collected from publicly available experimental data. Its power was further validated by predicting phosphorylation in the complete NIST human and mouse higher-energy collisional dissociation (HCD) spectral libraries, achieving accuracies of 98.2% and 97.9%, respectively. We demonstrate the application of PhoStar by using it to filter spectra before database search. In database searches of HeLa samples, the peptide search space was reduced by 27-66% while at least 97% of total peptide identifications (at 1% FDR) were still found, compared with a standard workflow.
Affiliation(s)
- Sebastian Dorl
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Stephan Winkler
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Karl Mechtler
- Research Institute of Molecular Pathology (IMP), Protein Chemistry, Campus-Vienna-Biocenter 1, 1030 Vienna, Austria; Institute of Molecular Biotechnology (IMBA), Protein Chemistry, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria