1
|
Peng Y, Jain S, Radivojac P. An algorithm for decoy-free false discovery rate estimation in XL-MS/MS proteomics. Bioinformatics 2024; 40:i428-i436. [PMID: 38940171 DOI: 10.1093/bioinformatics/btae233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION Cross-linking tandem mass spectrometry (XL-MS/MS) is an established analytical platform used to determine distance constraints between residues within a protein or from physically interacting proteins, thus improving our understanding of protein structure and function. To aid biological discovery with XL-MS/MS, it is essential that pairs of chemically linked peptides be accurately identified, a process that requires: (i) database search, that creates a ranked list of candidate peptide pairs for each experimental spectrum and (ii) false discovery rate (FDR) estimation, that determines the probability of a false match in a group of top-ranked peptide pairs with scores above a given threshold. Currently, the only available FDR estimation mechanism in XL-MS/MS is the target-decoy approach (TDA). However, despite its simplicity, TDA has both theoretical and practical limitations that impact the estimation accuracy and increase run time over potential decoy-free approaches (DFAs). RESULTS We introduce a novel decoy-free framework for FDR estimation in XL-MS/MS. Our approach relies on multi-sample mixtures of skew normal distributions, where the latent components correspond to the scores of correct peptide pairs (both peptides identified correctly), partially incorrect peptide pairs (one peptide identified correctly, the other incorrectly), and incorrect peptide pairs (both peptides identified incorrectly). To learn these components, we exploit the score distributions of first- and second-ranked peptide-spectrum matches for each experimental spectrum and subsequently estimate FDR using a novel expectation-maximization algorithm with constraints. We evaluate the method on ten datasets and provide evidence that the proposed DFA is theoretically sound and a viable alternative to TDA owing to its good performance in terms of accuracy, variance of estimation, and run time. AVAILABILITY AND IMPLEMENTATION https://github.com/shawn-peng/xlms.
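The estimation step described in this abstract can be illustrated with a minimal sketch. It assumes the three skew normal components have already been fitted; the constrained EM fitting itself is not shown. The function names, the component labels, and every parameter value below are hypothetical and are not taken from the authors' repository.

```python
import math


def skew_normal_pdf(x, loc, scale, shape):
    """Density of a skew normal with location, scale, and shape parameters."""
    z = (x - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    big_phi = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))  # normal cdf at shape*z
    return 2.0 / scale * phi * big_phi


def estimate_fdr(scores, components, threshold):
    """Expected fraction of not-fully-correct matches among scores >= threshold.

    `components` maps a label to (weight, loc, scale, shape). The label
    "correct" marks the component for fully correct peptide pairs; all
    other components count as false matches.
    """
    accepted = [s for s in scores if s >= threshold]
    if not accepted:
        return 0.0
    expected_false = 0.0
    for s in accepted:
        dens = {
            label: w * skew_normal_pdf(s, loc, scale, shape)
            for label, (w, loc, scale, shape) in components.items()
        }
        posterior_correct = dens["correct"] / sum(dens.values())
        expected_false += 1.0 - posterior_correct
    return expected_false / len(accepted)


# Hypothetical, hand-picked parameters (not fitted to any dataset).
components = {
    "correct": (0.3, 30.0, 8.0, 2.0),
    "partially_incorrect": (0.3, 18.0, 6.0, 1.0),
    "incorrect": (0.4, 10.0, 5.0, 1.0),
}
scores = [5, 8, 12, 15, 20, 25, 30, 35, 40, 45]
print(estimate_fdr(scores, components, threshold=30))
```

Averaging one minus the posterior probability of the correct component gives the expected proportion of false matches above the threshold, which is how a mixture-based approach avoids needing decoy hits.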
Affiliation(s)
- Yisu Peng
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
| | - Shantanu Jain
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- The Institute for Experiential AI, Northeastern University, Boston, MA 02115, United States
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
| |
|
2
|
Dhakal B, Li CMY, Ramezanpour M, Houtak G, Li R, Bouras G, Collela A, Chegeni N, Chataway TK, Drew P, Sallustio BC, Vreugde S, Smith E, Maddern G, Licari G, Fenix K. Proteomic characterisation of perhexiline treatment on THP-1 M1 macrophage differentiation. Front Immunol 2023; 14:1054588. [PMID: 36993962 PMCID: PMC10040681 DOI: 10.3389/fimmu.2023.1054588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 02/21/2023] [Indexed: 03/16/2023] Open
Abstract
Background: Dysregulated inflammation is important in the pathogenesis of many diseases including cancer, allergy, and autoimmunity. Macrophage activation and polarisation are commonly involved in the initiation, maintenance and resolution of inflammation. Perhexiline (PHX), an antianginal drug, has been suggested to modulate macrophage function, but the molecular effects of PHX on macrophages are unknown. In this study we investigated the effect of PHX treatment on macrophage activation and polarization and reveal the underlying proteomic changes induced. Methods: We used an established protocol to differentiate human THP-1 monocytes into M1 or M2 macrophages involving three distinct, sequential stages (priming, rest, and differentiation). We examined the effect of PHX treatment at each stage on the polarization into either M1 or M2 macrophages using flow cytometry, quantitative polymerase chain reaction (qPCR) and enzyme linked immunosorbent assay (ELISA). Quantitative changes in the proteome were investigated using data independent acquisition mass spectrometry (DIA MS). Results: PHX treatment promoted M1 macrophage polarization, including increased STAT1 and CCL2 expression and IL-1β secretion. This effect occurred when PHX was added at the differentiation stage of the M1 cultures. Proteomic profiling of PHX treated M1 cultures identified changes in metabolic (fatty acid metabolism, cholesterol homeostasis and oxidative phosphorylation) and immune signalling (Receptor Tyrosine Kinase, Rho GTPase and interferon) pathways. Conclusion: This is the first study to report on the action of PHX on THP-1 macrophage polarization and the associated changes in the proteome of these cells.
Affiliation(s)
- Bimala Dhakal
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
| | - Celine Man Ying Li
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
| | - Mahnaz Ramezanpour
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
- Department of Surgery-Otolaryngology Head and Neck Surgery, Central Adelaide Local Health Network, Adelaide, SA, Australia
| | - Ghais Houtak
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
- Department of Surgery-Otolaryngology Head and Neck Surgery, Central Adelaide Local Health Network, Adelaide, SA, Australia
| | - Runhao Li
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
- Medical Oncology, The Queen Elizabeth Hospital, Adelaide, SA, Australia
| | - George Bouras
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
- Department of Surgery-Otolaryngology Head and Neck Surgery, Central Adelaide Local Health Network, Adelaide, SA, Australia
| | - Alex Collela
- Flinders Omics Facility, Department of Human Physiology, Flinders University, Adelaide, SA, Australia
| | - Nusha Chegeni
- Flinders Omics Facility, Department of Human Physiology, Flinders University, Adelaide, SA, Australia
| | - Tim Kennion Chataway
- Flinders Omics Facility, Department of Human Physiology, Flinders University, Adelaide, SA, Australia
| | - Paul Drew
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
| | - Benedetta C. Sallustio
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
- Discipline of Pharmacology, School of Biomedicine, The University of Adelaide, Adelaide, SA, Australia
| | - Sarah Vreugde
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
- Department of Surgery-Otolaryngology Head and Neck Surgery, Central Adelaide Local Health Network, Adelaide, SA, Australia
| | - Eric Smith
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
- Medical Oncology, The Queen Elizabeth Hospital, Adelaide, SA, Australia
| | - Guy Maddern
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
| | - Giovanni Licari
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
- Discipline of Pharmacology, School of Biomedicine, The University of Adelaide, Adelaide, SA, Australia
| | - Kevin Fenix
- Discipline of Surgery, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- The Basil Hetzel Institute for Translational Health Research, The Queen Elizabeth Hospital, Adelaide, SA, Australia
- Department of Surgery-Otolaryngology Head and Neck Surgery, Central Adelaide Local Health Network, Adelaide, SA, Australia
- *Correspondence: Kevin Fenix,
| |
|
3
|
Boekweg H, Payne SH. Challenges and opportunities for single cell computational proteomics. Mol Cell Proteomics 2023; 22:100518. [PMID: 36828128 PMCID: PMC10060113 DOI: 10.1016/j.mcpro.2023.100518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 02/15/2023] [Accepted: 02/17/2023] [Indexed: 02/25/2023] Open
Abstract
Single-cell proteomics is growing rapidly and has made several technological advancements. As most research has been focused on improving instrumentation and sample preparation methods, very little attention has been given to algorithms responsible for identifying and quantifying proteins. Given the inherent difference between bulk data and single-cell data, it's necessary to realize that current algorithms being employed on single-cell data were designed for bulk data, and have underlying assumptions that may not hold true for single-cell data. In order to develop and optimize algorithms for single-cell data, we need to characterize the differences between single-cell data and bulk data, and assess how current algorithms perform on single-cell data. Here, we present a review of algorithms responsible for identifying and quantifying peptides and proteins. We will give a review of how each type of algorithm works, assumptions it relies on, how it performs on single-cell data, and possible optimizations and solutions that could be used to address the differences in single-cell data.
Affiliation(s)
- Hannah Boekweg
- Biology Department, Brigham Young University, Provo, Utah, USA
| | - Samuel H Payne
- Biology Department, Brigham Young University, Provo, Utah, USA.
| |
|
4
|
Tsiamis V, Schwämmle V. VIQoR: a web service for visually supervised protein inference and protein quantification. Bioinformatics 2022; 38:2757-2764. [PMID: 35561162 DOI: 10.1093/bioinformatics/btac182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Revised: 03/07/2022] [Accepted: 03/22/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION In quantitative bottom-up mass spectrometry (MS)-based proteomics, the reliable estimation of protein concentration changes from peptide quantifications between different biological samples is essential. This estimation is not a single task but comprises the two processes of protein inference and protein abundance summarization. Furthermore, due to the high complexity of proteomics data and associated uncertainty about the performance of these processes, there is a demand for comprehensive visualization methods able to integrate protein with peptide quantitative data including their post-translational modifications. Hence, there is a lack of a suitable tool that provides post-identification quantitative analysis of proteins with simultaneous interactive visualization. RESULTS In this article, we present VIQoR, a user-friendly web service that accepts peptide quantitative data of both labeled and label-free experiments and accomplishes the crucial components protein inference and summarization and interactive visualization modules, including the novel VIQoR plot. We implemented two different parsimonious algorithms to solve the protein inference problem, while protein summarization is facilitated by a well-established factor analysis algorithm called fast-FARMS followed by a weighted average summarization function that minimizes the effect of missing values. In addition, summarization is optimized by the so-called Global Correlation Indicator (GCI). We test the tool on three publicly available ground truth datasets and demonstrate the ability of the protein inference algorithms to handle shared peptides. We furthermore show that GCI increases the accuracy of the quantitative analysis in datasets with replicated design. AVAILABILITY AND IMPLEMENTATION VIQoR is accessible at: http://computproteomics.bmb.sdu.dk/Apps/VIQoR/. The source code is available at: https://bitbucket.org/veitveit/viqor/. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
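Parsimonious protein inference, as mentioned in this abstract, is commonly framed as finding a small set of proteins that explains all observed peptides. The abstract does not say which two algorithms VIQoR implements, so the following is only a generic greedy set-cover sketch with a hypothetical function name and toy data, not the tool's actual method.

```python
def greedy_parsimony(protein_to_peptides):
    """Pick a small set of proteins that together explain every observed peptide.

    `protein_to_peptides` maps a protein ID to the set of peptides matched to it.
    Ties are broken alphabetically by protein ID so the result is deterministic.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Take the protein that explains the most still-unexplained peptides.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Toy example: peptide "c" is shared by all three proteins.
mapping = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"c", "d"},
}
print(greedy_parsimony(mapping))
```

Here P2 is dropped because every peptide it explains is already explained by P1, which is the basic way a parsimony rule handles shared peptides.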
Affiliation(s)
- Vasileios Tsiamis
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
| | - Veit Schwämmle
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
| |
|
5
|
Siqueira JF, Rôças IN. A critical analysis of research methods and experimental models to study the root canal microbiome. Int Endod J 2021; 55 Suppl 1:46-71. [PMID: 34714548 DOI: 10.1111/iej.13656] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/27/2021] [Revised: 10/22/2021] [Accepted: 10/27/2021] [Indexed: 12/15/2022]
Abstract
Endodontic microbiology deals with the study of the microbial aetiology and pathogenesis of pulpal and periradicular inflammatory diseases. Research in endodontic microbiology started almost 130 years ago and since then has mostly focussed on establishing and confirming the infectious aetiology of apical periodontitis, identifying the microbial species associated with the different types of endodontic infections and determining the efficacy of treatment procedures in eradicating or controlling infection. Diverse analytical methods have been used over the years, each one with their own advantages and limitations. In this review, the main features and applications of the most used technologies are discussed, and advice is provided to improve study designs in order to properly address the scientific questions and avoid setbacks that can compromise the results. Finally, areas of future research are described.
Affiliation(s)
- José F Siqueira
- Department of Endodontics and Molecular Microbiology Laboratory, Faculty of Dentistry, Grande Rio University, Rio de Janeiro, Brazil; Department of Dental Research, Faculty of Dentistry, Iguaçu University (UNIG), Nova Iguaçu, Brazil
| | - Isabela N Rôças
- Department of Endodontics and Molecular Microbiology Laboratory, Faculty of Dentistry, Grande Rio University, Rio de Janeiro, Brazil; Department of Dental Research, Faculty of Dentistry, Iguaçu University (UNIG), Nova Iguaçu, Brazil
| |
|
6
|
Klann K, Tascher G, Münch C. Virus systems biology: Proteomics profiling of dynamic protein networks during infection. Adv Virus Res 2021; 109:1-29. [PMID: 33934824 DOI: 10.1016/bs.aivir.2020.12.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
The host cell proteome undergoes a variety of dynamic changes during viral infection, elicited by the virus itself or host cell defense mechanisms. Studying these changes on a global scale by integrating functional and physical interactions within protein networks during infection is an important tool to understand pathology. Indeed, proteomics studies dissecting protein signaling cascades and interaction networks upon infection showed how global information can significantly improve understanding of disease mechanisms of diverse viral infections. Here, we summarize and give examples of different experimental designs, proteomics approaches and bioinformatics analyses that allow profiling proteome changes and host-pathogen interactions to gain a molecular systems view of viral infection.
Affiliation(s)
- Kevin Klann
- Institute of Biochemistry II, Faculty of Medicine, Goethe University, Frankfurt am Main, Germany
| | - Georg Tascher
- Institute of Biochemistry II, Faculty of Medicine, Goethe University, Frankfurt am Main, Germany
| | - Christian Münch
- Institute of Biochemistry II, Faculty of Medicine, Goethe University, Frankfurt am Main, Germany; Frankfurt Cancer Institute, Frankfurt am Main, Germany; Cardio-Pulmonary Institute, Frankfurt am Main, Germany.
| |
|
7
|
Peng Y, Jain S, Li YF, Greguš M, Ivanov AR, Vitek O, Radivojac P. New mixture models for decoy-free false discovery rate estimation in mass spectrometry proteomics. Bioinformatics 2020; 36:i745-i753. [PMID: 33381824 DOI: 10.1093/bioinformatics/btaa807] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation-efficiency and low signal-to-noise-ratio spectra. RESULTS We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms. AVAILABILITY AND IMPLEMENTATION https://github.com/shawn-peng/FDR-estimation. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
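For contrast with the decoy-free framework, the target-decoy idea mentioned in this abstract can be sketched in its simplest form: the number of decoy matches above a score threshold serves as an estimate of the number of incorrect target matches above it. The function name and the scores below are hypothetical, and practical TDA variants differ in details such as how the target and decoy databases are searched and which correction factors are applied.

```python
def tda_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (#decoys >= t) / (#targets >= t)."""
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        # Nothing is accepted, so there is nothing to be wrong about.
        return 0.0
    # Cap at 1.0 because the ratio is only an estimate of a proportion.
    return min(1.0, n_decoy / n_target)


# Made-up scores for illustration only.
targets = [10, 20, 30, 40]
decoys = [5, 12, 25]
print(tda_fdr(targets, decoys, threshold=20))
```

The estimate depends entirely on the decoys behaving like incorrect target matches, which is the assumption a decoy-free mixture model replaces with a fitted distribution.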
Affiliation(s)
- Yisu Peng
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | - Shantanu Jain
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | | | - Michal Greguš
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA; Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA 02115, USA
| | - Alexander R Ivanov
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA; Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA 02115, USA
| | - Olga Vitek
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA; Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA 02115, USA
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA; Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA; Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA 02115, USA
| |
|
8
|
Esser-Skala W, Segl M, Wohlschlager T, Reisinger V, Holzmann J, Huber CG. Exploring sample preparation and data evaluation strategies for enhanced identification of host cell proteins in drug products of therapeutic antibodies and Fc-fusion proteins. Anal Bioanal Chem 2020; 412:6583-6593. [PMID: 32691086 PMCID: PMC7442769 DOI: 10.1007/s00216-020-02796-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/15/2020] [Revised: 06/22/2020] [Accepted: 06/30/2020] [Indexed: 01/17/2023]
Abstract
Manufacturing of biopharmaceuticals involves recombinant protein expression in host cells followed by extensive purification of the target protein. Yet, host cell proteins (HCPs) may persist in the final drug product, potentially reducing its quality with respect to safety and efficacy. Consequently, residual HCPs are closely monitored during downstream processing by techniques such as enzyme-linked immunosorbent assay (ELISA) or high-performance liquid chromatography combined with tandem mass spectrometry (HPLC-MS/MS). The latter is especially attractive as it provides information with respect to protein identities. Although the applied HPLC-MS/MS methodologies are frequently optimized with respect to HCP identification, acquired data is typically analyzed using standard settings. Here, we describe an improved strategy for evaluating HPLC-MS/MS data of HCP-derived peptides, involving probabilistic protein inference and peptide detection in the absence of fragment ion spectra. This data analysis workflow was applied to data obtained for drug products of various biotherapeutics upon protein A affinity depletion. The presented data evaluation strategy enabled in-depth comparative analysis of the HCP repertoires identified in drug products of the monoclonal antibodies rituximab and bevacizumab, as well as the fusion protein etanercept. In contrast to commonly applied ELISA strategies, the workflow presented here is process-independent and may be implemented into existing HPLC-MS/MS setups for drug product characterization and process development.
Affiliation(s)
- Wolfgang Esser-Skala
- Bioanalytical Research Labs, Department of Biosciences, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria; Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
| | - Marius Segl
- Bioanalytical Research Labs, Department of Biosciences, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria; Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
| | - Therese Wohlschlager
- Bioanalytical Research Labs, Department of Biosciences, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria; Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
| | - Veronika Reisinger
- Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria; Technical Development Biosimilars, Global Drug Development, Novartis, Sandoz GmbH, Biochemiestraße 10, 6250, Kundl, Austria
| | - Johann Holzmann
- Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria; Technical Development Biosimilars, Global Drug Development, Novartis, Sandoz GmbH, Biochemiestraße 10, 6250, Kundl, Austria
| | - Christian G Huber
- Bioanalytical Research Labs, Department of Biosciences, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria; Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
| |
|
9
|
Harper MM, Rudd D, Meyer KJ, Kanthasamy AG, Anantharam V, Pieper AA, Vázquez-Rosa E, Shin MK, Chaubey K, Koh Y, Evans LP, Bassuk AG, Anderson MG, Dutca L, Kudva IT, John M. Identification of chronic brain protein changes and protein targets of serum auto-antibodies after blast-mediated traumatic brain injury. Heliyon 2020; 6:e03374. [PMID: 32099918 PMCID: PMC7029173 DOI: 10.1016/j.heliyon.2020.e03374] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/20/2019] [Revised: 07/19/2019] [Accepted: 02/03/2020] [Indexed: 12/13/2022] Open
Abstract
In addition to needing acute emergency management, blast-mediated traumatic brain injury (TBI) is also a chronic disorder with delayed-onset symptoms that manifest and progress over time. While the immediate consequences of acute blast injuries are readily apparent, chronic sequelae are harder to recognize. Indeed, the identification of individuals with mild-TBI or TBI-induced symptoms is greatly impaired in large part due to the lack of objective and robust biomarkers. The purpose of this study was to address this need by identifying candidates for serum-based biomarkers of blast TBI, and also to identify unique or differentially regulated protein expression in the thalamus in C57BL/6J mice exposed to blast using high throughput qualitative screens of protein expression. To identify thalamic proteins differentially or uniquely associated with blast exposure, we utilized an antibody-based affinity-capture strategy (referred to as "proteomics-based analysis of depletomes"; PAD) to deplete thalamic lysates from blast-treated mice of endogenous thalamic proteins also found in control mice. Analysis of this "depletome" detected 75 unique proteins, many with associations to the myelin sheath. To identify blast-associated proteins eliciting production of circulating autoantibodies, serum antibodies of blast-treated mice were immobilized, and their immunogens subsequently identified by proteomic analysis of proteins specifically captured following incubation with thalamic lysates (a variant of a strategy referred to as "proteomics-based expression library screening"; PELS). This analysis identified 46 blast-associated immunogenic proteins, including 6 shared in common with the PAD analysis (ALDOA, PHKB, HBA-A1, DPYSL2, SYN1, and CKB). These proteins and their autoantibodies are appropriate for further consideration as biomarkers of blast-mediated TBI.
Affiliation(s)
- Matthew M. Harper
- The Iowa City Department of Veterans Affairs Medical Center, Center for the Prevention and Treatment of Visual Loss, Iowa City, IA, USA
- The University of Iowa Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, USA
| | - Danielle Rudd
- The Iowa City Department of Veterans Affairs Medical Center, Center for the Prevention and Treatment of Visual Loss, Iowa City, IA, USA
| | - Kacie J. Meyer
- The University of Iowa Department of Molecular Physiology and Biophysics, University of Iowa, Iowa City, IA, USA
| | | | | | - Andrew A. Pieper
- Harrington Discovery Institute, University Hospitals of Cleveland, Department of Psychiatry Case Western Reserve University, Geriatric Research Education and Clinical Centers, Louis Stokes VA Medical Center, Cleveland, OH 44106, USA
| | - Edwin Vázquez-Rosa
- Harrington Discovery Institute, University Hospitals of Cleveland, Department of Psychiatry Case Western Reserve University, Geriatric Research Education and Clinical Centers, Louis Stokes VA Medical Center, Cleveland, OH 44106, USA
| | - Min-Kyoo Shin
- Harrington Discovery Institute, University Hospitals of Cleveland, Department of Psychiatry Case Western Reserve University, Geriatric Research Education and Clinical Centers, Louis Stokes VA Medical Center, Cleveland, OH 44106, USA
| | - Kalyani Chaubey
- Harrington Discovery Institute, University Hospitals of Cleveland, Department of Psychiatry Case Western Reserve University, Geriatric Research Education and Clinical Centers, Louis Stokes VA Medical Center, Cleveland, OH 44106, USA
| | - Yeojung Koh
- Harrington Discovery Institute, University Hospitals of Cleveland, Department of Psychiatry Case Western Reserve University, Geriatric Research Education and Clinical Centers, Louis Stokes VA Medical Center, Cleveland, OH 44106, USA
| | - Lucy P. Evans
- The University of Iowa Department of Pediatrics, University of Iowa, Iowa City, IA, USA
- The University of Iowa Department of Neurology, University of Iowa, Iowa City, IA, USA
- The University of Iowa Department of Medical Scientist Training Program, University of Iowa, Iowa City, IA, USA
| | - Alexander G. Bassuk
- The University of Iowa Department of Pediatrics, University of Iowa, Iowa City, IA, USA
- The University of Iowa Department of Neurology, University of Iowa, Iowa City, IA, USA
| | - Michael G. Anderson
- The Iowa City Department of Veterans Affairs Medical Center, Center for the Prevention and Treatment of Visual Loss, Iowa City, IA, USA
- The University of Iowa Department of Ophthalmology and Visual Sciences, University of Iowa, Iowa City, IA, USA
- The University of Iowa Department of Molecular Physiology and Biophysics, University of Iowa, Iowa City, IA, USA
| | - Laura Dutca
- The Iowa City Department of Veterans Affairs Medical Center, Center for the Prevention and Treatment of Visual Loss, Iowa City, IA, USA
| | - Indira T. Kudva
- Food Safety and Enteric Pathogens Research Unit, National Animal Disease Center, Agricultural Research Service, U.S. Department of Agriculture, Ames, IA, USA
| | | |
|
10
|
Saltzman AB, Leng M, Bhatt B, Singh P, Chan DW, Dobrolecki L, Chandrasekaran H, Choi JM, Jain A, Jung SY, Lewis MT, Ellis MJ, Malovannaya A. gpGrouper: A Peptide Grouping Algorithm for Gene-Centric Inference and Quantitation of Bottom-Up Proteomics Data. Mol Cell Proteomics 2018; 17:2270-2283. [PMID: 30093420 DOI: 10.1074/mcp.tir118.000850] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 05/14/2018] [Revised: 07/09/2018] [Indexed: 12/13/2022] Open
Abstract
In quantitative mass spectrometry, the method by which peptides are grouped into proteins can have dramatic effects on downstream analyses. Here we describe gpGrouper, an inference and quantitation algorithm that offers an alternative method for assignment of protein groups by gene locus and improves pseudo-absolute iBAQ quantitation by weighted distribution of shared peptide areas. We experimentally show that distributing shared peptide quantities based on unique peptide peak ratios improves quantitation accuracy compared with conventional winner-take-all scenarios. Furthermore, gpGrouper seamlessly handles two-species samples such as patient-derived xenografts (PDXs) without ignoring the host species or species-shared peptides. This is a critical capability for proper evaluation of proteomics data from PDX samples, where stromal infiltration varies across individual tumors. Finally, gpGrouper calculates peptide peak area (MS1) based expression estimates from multiplexed isobaric data, producing iBAQ results that are directly comparable across label-free, isotopic, and isobaric proteomics approaches.
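The weighted distribution of shared peptide areas described in this abstract can be illustrated with a small sketch: a shared peptide's area is split across genes in proportion to each gene's unique-peptide signal, instead of being assigned entirely to one gene. The function name, the equal-split fallback for genes without unique signal, and the numbers are assumptions for illustration and are not taken from gpGrouper itself.

```python
def distribute_shared_area(unique_area, shared_area, genes):
    """Split one shared peptide's area across genes by their unique-peptide areas.

    `unique_area` maps gene -> summed area of peptides unique to that gene.
    `genes` lists the genes the shared peptide maps to.
    """
    total = sum(unique_area.get(g, 0.0) for g in genes)
    if total == 0:
        # Assumed fallback: no unique evidence for any gene, so split evenly.
        share = shared_area / len(genes)
        return {g: share for g in genes}
    return {g: shared_area * unique_area.get(g, 0.0) / total for g in genes}


# Made-up areas: gene A has three times the unique signal of gene B.
unique = {"A": 300.0, "B": 100.0}
print(distribute_shared_area(unique, shared_area=40.0, genes=["A", "B"]))
```

Under a winner-take-all rule gene A would receive the whole shared area of 40; the proportional rule gives it three quarters, matching the 3:1 ratio of unique signal.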
Affiliation(s)
- Alexander B Saltzman
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology
- Mei Leng
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology
- Bhoomi Bhatt
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology
- Purba Singh
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030
- Doug W Chan
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030
- Lacey Dobrolecki
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030
- Patient-Derived Xenograft and Advanced In Vivo Models Core
- Sung Y Jung
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology
- Mass Spectrometry Proteomics Core
- Michael T Lewis
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030
- Dan L Duncan Comprehensive Cancer Center
- Patient-Derived Xenograft and Advanced In Vivo Models Core
- Department of Molecular and Cellular Biology
- Matthew J Ellis
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030
- Dan L Duncan Comprehensive Cancer Center
- Department of Molecular and Cellular Biology
- Anna Malovannaya
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology
- Department of Molecular and Cellular Biology
- Mass Spectrometry Proteomics Core
- Dan L Duncan Comprehensive Cancer Center
|
11
|
Discrimination and quantification of homologous keratins from goat and sheep with dual protease digestion and PRM assays. J Proteomics 2018; 186:38-46. [DOI: 10.1016/j.jprot.2018.07.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/23/2018] [Revised: 07/03/2018] [Accepted: 07/13/2018] [Indexed: 01/25/2023]
|
12
|
Hendy J, Warinner C, Bouwman A, Collins MJ, Fiddyment S, Fischer R, Hagan R, Hofman CA, Holst M, Chaves E, Klaus L, Larson G, Mackie M, McGrath K, Mundorff AZ, Radini A, Rao H, Trachsel C, Velsko IM, Speller CF. Proteomic evidence of dietary sources in ancient dental calculus. Proc Biol Sci 2018; 285:20180977. [PMID: 30051838 PMCID: PMC6083251 DOI: 10.1098/rspb.2018.0977] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 05/10/2018] [Accepted: 06/25/2018] [Indexed: 12/18/2022] Open
Abstract
Archaeological dental calculus has emerged as a rich source of ancient biomolecules, including proteins. Previous analyses of proteins extracted from ancient dental calculus revealed the presence of the dietary milk protein β-lactoglobulin, providing direct evidence of dairy consumption in the archaeological record. However, the potential for calculus to preserve other food-related proteins has not yet been systematically explored. Here we analyse shotgun metaproteomic data from 100 archaeological dental calculus samples ranging from the Iron Age to the post-medieval period (eighth century BC to nineteenth century AD) in England, as well as 14 dental calculus samples from contemporary dental patients and recently deceased individuals, to characterize the range and extent of dietary proteins preserved in dental calculus. In addition to milk proteins, we detect proteomic evidence of foodstuffs such as cereals and plant products, as well as the digestive enzyme salivary amylase. We discuss the importance of optimized protein extraction methods, data analysis approaches and authentication strategies in the identification of dietary proteins from archaeological dental calculus. This study demonstrates that proteomic approaches can robustly identify foodstuffs in the archaeological record that are typically under-represented due to their poor macroscopic preservation.
Affiliation(s)
- Jessica Hendy
- Department of Archaeology, Max Planck Institute for the Science of Human History, Jena, Germany
- BioArCh, Department of Archaeology, University of York, York, UK
- Christina Warinner
- Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
- Laboratories of Molecular Anthropology and Microbiome Research, Department of Anthropology, University of Oklahoma, Norman, USA
- Institute for Evolutionary Medicine, ETH-Zürich, University of Zürich, Zürich, Switzerland
- Department of Periodontology, College of Dentistry, University of Oklahoma Health Sciences Center, Oklahoma, OK, USA
- Abigail Bouwman
- Institute for Evolutionary Medicine, ETH-Zürich, University of Zürich, Zürich, Switzerland
- Matthew J Collins
- BioArCh, Department of Archaeology, University of York, York, UK
- EvoGenomics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Sarah Fiddyment
- BioArCh, Department of Archaeology, University of York, York, UK
- Roman Fischer
- Discovery Proteomics Facility, Target Discovery Institute, University of Oxford, Oxford, UK
- Richard Hagan
- Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
- Laboratories of Molecular Anthropology and Microbiome Research, Department of Anthropology, University of Oklahoma, Norman, USA
- Courtney A Hofman
- Laboratories of Molecular Anthropology and Microbiome Research, Department of Anthropology, University of Oklahoma, Norman, USA
- Malin Holst
- BioArCh, Department of Archaeology, University of York, York, UK
- York Osteoarchaeology Ltd, Bishop Wilton, York, UK
- Eros Chaves
- Department of Periodontology, College of Dentistry, University of Oklahoma Health Sciences Center, Oklahoma, OK, USA
- Pinellas Dental Specialties, Largo, FL 33776, USA
- Lauren Klaus
- Laboratories of Molecular Anthropology and Microbiome Research, Department of Anthropology, University of Oklahoma, Norman, USA
- Department of Periodontology, College of Dentistry, University of Oklahoma Health Sciences Center, Oklahoma, OK, USA
- Greger Larson
- The Palaeogenomics and Bio-Archaeology Research Network, Research Laboratory for Archaeology and the History of Art, University of Oxford, Oxford, UK
- Meaghan Mackie
- EvoGenomics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Krista McGrath
- BioArCh, Department of Archaeology, University of York, York, UK
- Amy Z Mundorff
- Department of Anthropology, College of Arts and Sciences, University of Tennessee, Knoxville, TN, USA
- Anita Radini
- BioArCh, Department of Archaeology, University of York, York, UK
- Huiyun Rao
- Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, People's Republic of China
- Christian Trachsel
- Functional Genomics Center, ETH-Zürich, University of Zürich, Zürich, Switzerland
- Irina M Velsko
- The Palaeogenomics and Bio-Archaeology Research Network, Research Laboratory for Archaeology and the History of Art, University of Oxford, Oxford, UK
- Camilla F Speller
- BioArCh, Department of Archaeology, University of York, York, UK
- Department of Anthropology, University of British Columbia, Vancouver, BC, Canada
|
13
|
Ramalho RF, Li S, Radivojac P, Hahn MW. Proteomic Evidence for In-Frame and Out-of-Frame Alternatively Spliced Isoforms in Human and Mouse. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1284-1289. [PMID: 26394435 DOI: 10.1109/tcbb.2015.2480068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
In order to find evidence for translation of alternatively spliced transcripts, especially those that result in a change in reading frame, we collected exon-skipping cases previously found by RNA-Seq and applied a computational approach to screen millions of mass spectra. These spectra came from seven human and six mouse tissues, five of which are the same between the two organisms: liver, kidney, lung, heart, and brain. Overall, we detected 4 percent of all exon-skipping events found in RNA-seq data, regardless of their effect on reading frame. The fraction of alternative isoforms detected did not differ between out-of-frame and in-frame events. Moreover, the fraction of identified alternative exon-exon junctions and constitutive junctions were similar. Together, our results suggest that both in-frame and out-of-frame translation may be actively used to regulate protein activity or localization.
|
14
|
Yakubu RR, Weiss LM, Silmon de Monerri NC. Post-translational modifications as key regulators of apicomplexan biology: insights from proteome-wide studies. Mol Microbiol 2017; 107:1-23. [PMID: 29052917 DOI: 10.1111/mmi.13867] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 07/11/2017] [Revised: 10/12/2017] [Accepted: 10/16/2017] [Indexed: 12/12/2022]
Abstract
Parasites of the Apicomplexa phylum, such as Plasmodium spp. and Toxoplasma gondii, undergo complex life cycles involving multiple stages with distinct biology and morphologies. Post-translational modifications (PTMs), such as phosphorylation, acetylation and glycosylation, regulate numerous cellular processes, playing a role in every aspect of cell biology. PTMs can occur on proteins at any time in their lifespan and through alterations of target protein activity, localization, protein-protein interactions, among other functions, dramatically increase proteome diversity and complexity. In addition, PTMs can be induced or removed on changes in cellular environment and state. Thus, PTMs are likely to be key regulators of developmental transitions, biology and pathogenesis of apicomplexan parasites. In this review we examine the roles of PTMs in both parasite-specific and conserved eukaryotic processes, and the potential crosstalk between PTMs, that together regulate the intricate lives of these protozoa.
Affiliation(s)
- Rama R Yakubu
- Department of Pathology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, 10128, USA
- Department of Medicine, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, 10128, USA
- Louis M Weiss
- Department of Pathology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, 10128, USA
- Department of Medicine, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, 10128, USA
- Natalie C Silmon de Monerri
- Department of Pathology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, 10128, USA
- Department of Medicine, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, 10128, USA
|
15
|
Kim M, Eetemadi A, Tagkopoulos I. DeepPep: Deep proteome inference from peptide profiles. PLoS Comput Biol 2017; 13:e1005661. [PMID: 28873403 PMCID: PMC5600403 DOI: 10.1371/journal.pcbi.1005661] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 01/25/2017] [Revised: 09/15/2017] [Accepted: 06/27/2017] [Indexed: 11/24/2022] Open
Abstract
Protein inference, the identification of the protein set that is the origin of a given peptide profile, is a fundamental challenge in proteomics. We present DeepPep, a deep-convolutional neural network framework that predicts the protein set from a proteomics mixture, given the sequence universe of possible proteins and a target peptide profile. In its core, DeepPep quantifies the change in probabilistic score of peptide-spectrum matches in the presence or absence of a specific protein, hence selecting as candidate proteins with the largest impact to the peptide profile. Application of the method across datasets argues for its competitive predictive ability (AUC of 0.80±0.18, AUPR of 0.84±0.28) in inferring proteins without need of peptide detectability on which the most competitive methods rely. We find that the convolutional neural network architecture outperforms the traditional artificial neural network architectures without convolution layers in protein inference. We expect that similar deep learning architectures that allow learning nonlinear patterns can be further extended to problems in metagenome profiling and cell type inference. The source code of DeepPep and the benchmark datasets used in this study are available at https://deeppep.github.io/DeepPep/.
The accurate identification of proteins in a proteomics sample, called the protein inference problem, is a fundamental challenge in biomedical sciences. Current approaches are based on applications of traditional neural networks, linear optimization and Bayesian techniques. We here present DeepPep, a deep-convolutional neural network framework that predicts the protein set from a standard proteomics mixture, given all protein sequences and a peptide profile. Comparison to leading methods shows that DeepPep has most robust performance with various instruments and datasets. Our results provide evidence that using sequence-level location information of a peptide in the context of proteome sequence can result in more accurate and robust protein inference. We conclude that Deep Learning on protein sequence leads to superior platforms for protein inference that can be further refined with additional features and extended for far reaching applications.
Affiliation(s)
- Minseung Kim
- Department of Computer Science, University of California, Davis, Davis, California, United States of America
- Genome Center, University of California, Davis, Davis, California, United States of America
- Ameen Eetemadi
- Department of Computer Science, University of California, Davis, Davis, California, United States of America
- Genome Center, University of California, Davis, Davis, California, United States of America
- Ilias Tagkopoulos
- Department of Computer Science, University of California, Davis, Davis, California, United States of America
- Genome Center, University of California, Davis, Davis, California, United States of America
|
16
|
Zhang SR, Shan YC, Jiang H, Liu JH, Zhou Y, Zhang LH, Zhang YK. The Null-Test for peptide identification algorithm in Shotgun proteomics. J Proteomics 2017; 163:118-125. [DOI: 10.1016/j.jprot.2017.05.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/28/2016] [Revised: 05/09/2017] [Accepted: 05/11/2017] [Indexed: 12/24/2022]
|
17
|
New developments in probing and targeting protein acylation in malaria, leishmaniasis and African sleeping sickness. Parasitology 2017; 145:157-174. [PMID: 28270257 DOI: 10.1017/s0031182017000282] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 12/16/2022]
Abstract
Infections by protozoan parasites, such as Plasmodium falciparum or Leishmania donovani, have a significant health, social and economic impact and threaten billions of people living in tropical and sub-tropical regions of developing countries worldwide. The increasing range of parasite strains resistant to frontline therapeutics makes the identification of novel drug targets and the development of corresponding inhibitors vital. Post-translational modifications (PTMs) are important modulators of biology and inhibition of protein lipidation has emerged as a promising therapeutic strategy for treatment of parasitic diseases. In this review we summarize the latest insights into protein lipidation in protozoan parasites. We discuss how recent chemical proteomic approaches have delivered the first global overviews of protein lipidation in these organisms, contributing to our understanding of the role of this PTM in critical metabolic and cellular functions. Additionally, we highlight the development of new small molecule inhibitors to target parasite acyl transferases.
|
18
|
Audain E, Uszkoreit J, Sachsenberg T, Pfeuffer J, Liang X, Hermjakob H, Sanchez A, Eisenacher M, Reinert K, Tabb DL, Kohlbacher O, Perez-Riverol Y. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics. J Proteomics 2017; 150:170-182. [DOI: 10.1016/j.jprot.2016.08.002] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 05/20/2016] [Revised: 07/30/2016] [Accepted: 08/02/2016] [Indexed: 12/24/2022]
|
19
|
Langella O, Valot B, Balliau T, Blein-Nicolas M, Bonhomme L, Zivy M. X!TandemPipeline: A Tool to Manage Sequence Redundancy for Protein Inference and Phosphosite Identification. J Proteome Res 2016; 16:494-503. [DOI: 10.1021/acs.jproteome.6b00632] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Indexed: 01/03/2023]
Affiliation(s)
- Olivier Langella
- PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Benoît Valot
- UMR 6249 Chrono-Environnement, CNRS, Université de Bourgogne Franche-Comté, 25030 Besançon, France
- Thierry Balliau
- PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Mélisande Blein-Nicolas
- PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Ludovic Bonhomme
- INRA/UBP, UMR 1095, Genetics, Diversity and Ecophysiology of Cereals, F63100 Clermont-Ferrand, France
- Michel Zivy
- PAPPSO, GQE - Le Moulon, INRA, Univ. Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
|
20
|
Cerqueira FR, Ricardo AM, de Paiva Oliveira A, Graber A, Baumgartner C. MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm. BMC Bioinformatics 2016; 17:472. [PMID: 28105913 PMCID: PMC5249030 DOI: 10.1186/s12859-016-1341-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools since manual evaluation is not practicable. The target-decoy database strategy is largely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. RESULTS In a previous study, we proposed the method MUMAL that applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra in the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached around 15% of improvement in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. CONCLUSIONS The inclusion of a cost matrix and a probability threshold selector algorithm to the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics.
Affiliation(s)
- Adilson Mendes Ricardo
- Department of Informatics, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Department of Computing and Construction, Centro Federal de Educação Tecnológica de Minas Gerais, Rua 19 de Novembro, 121, Timóteo, 35180-008, Brazil
- Alcione de Paiva Oliveira
- Department of Informatics, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Department of Computer Science, University of Sheffield, Western Bank, S10 2TN, Sheffield, UK
- Armin Graber
- Research and Product Development of Genoptix, a Novartis company, 2110 Rutherford Rd, Carlsbad, 92008, USA
- Christian Baumgartner
- Institute of Health Care Engineering with European Notified Body of Medical Devices, Graz University of Technology, Stremayrgasse 16/II, Graz, A-8010, Austria
|
21
|
Kim MS, Zhong J, Pandey A. Common errors in mass spectrometry-based analysis of post-translational modifications. Proteomics 2016; 16:700-14. [PMID: 26667783 DOI: 10.1002/pmic.201500355] [Citation(s) in RCA: 88] [Impact Index Per Article: 11.0] [Received: 09/01/2015] [Revised: 11/05/2015] [Accepted: 12/08/2015] [Indexed: 12/29/2022]
Abstract
Mass spectrometry (MS) is a powerful tool to analyze complex mixtures of proteins in a high-throughput fashion. Proteome analysis has already become a routine task in biomedical research with the emergence of proteomics core facilities in most research institutions. Post-translational modifications (PTMs) represent a mechanism by which complex biological processes are orchestrated dynamically at the systems level. MS is rapidly becoming popular to discover new modifications and novel sites of known PTMs, revolutionizing the current understanding of diverse signaling pathways and biological processes. However, MS-based analysis of PTMs has its own caveats and pitfalls that can lead to erroneous conclusions. Here, we review the most common errors in MS-based PTM analyses with the goal of adopting strategies that maximize correct interpretation in the context of biological questions that are being addressed. Finally, we provide suggestions that should help mass spectrometrists, bioinformaticians and biologists to perform and interpret MS-based PTM analyses more accurately.
Affiliation(s)
- Min-Sik Kim
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jun Zhong
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Departments of Biological Chemistry, Pathology and Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
|
22
|
The M, Tasnim A, Käll L. How to talk about protein-level false discovery rates in shotgun proteomics. Proteomics 2016; 16:2461-9. [PMID: 27503675 PMCID: PMC5096025 DOI: 10.1002/pmic.201500431] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 11/12/2015] [Revised: 05/12/2016] [Accepted: 07/20/2016] [Indexed: 12/04/2022]
Abstract
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses.
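As a rough illustration of how decoys can be used to estimate an FDR at a score threshold, the sketch below applies the common decoy-to-target ratio estimator. It is not the simulation framework or the inference method from this paper, and it does not distinguish the two null hypotheses the abstract discusses; the function name `decoy_fdr` and the input layout are assumptions.

```python
def decoy_fdr(scores, is_decoy, threshold):
    """Estimate FDR among target hits scoring at or above `threshold`.

    scores:   list of protein (or PSM) scores, higher is better.
    is_decoy: parallel list of booleans, True for decoy entries.
    Uses the simple estimator (#decoys >= t) / (#targets >= t),
    which assumes decoys model incorrect target hits one-to-one.
    """
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    if targets == 0:
        # No accepted targets, so there are no discoveries to be false.
        return 0.0
    return min(1.0, decoys / targets)


scores = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0]
is_decoy = [False, False, False, True, False, True]
print(decoy_fdr(scores, is_decoy, threshold=5.0))
```

With the toy input above, four targets and one decoy pass the threshold, giving an estimate of 0.25.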
Affiliation(s)
- Matthew The
- Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Solna, Sweden
- Ayesha Tasnim
- Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Solna, Sweden
- Lukas Käll
- Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Solna, Sweden
|
23
|
Protein inference: A protein quantification perspective. Comput Biol Chem 2016; 63:21-29. [DOI: 10.1016/j.compbiolchem.2016.02.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/15/2016] [Accepted: 02/01/2016] [Indexed: 01/04/2023]
|
24
|
Norris EL, Headlam MJ, Dave KA, Smith DD, Bukreyev A, Singh T, Jayakody BA, Chappell KJ, Collins PL, Gorman JJ. Proteoform-Specific Insights into Cellular Proteome Regulation. Mol Cell Proteomics 2016; 15:3297-3320. [PMID: 27451424 PMCID: PMC5054351 DOI: 10.1074/mcp.o116.058438] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/20/2016] [Indexed: 01/29/2023] Open
Abstract
Knowledge regarding compositions of proteomes at the proteoform level enhances insights into cellular phenotypes. A strategy is described herein for discovery of proteoform-specific information about cellular proteomes. This strategy involved analysis of data obtained by bottom-up mass spectrometry of multiple protein OGE separations on a fraction by fraction basis. The strategy was exemplified using five matched sets of lysates of uninfected and human respiratory syncytial virus-infected A549 cells. Template matching demonstrated that 67.3% of 10475 protein profiles identified focused to narrow pI windows indicative of efficacious focusing. Furthermore, correlation between experimental and theoretical pI gradients indicated reproducible focusing. Based on these observations a proteoform profiling strategy was developed to identify proteoforms, detect proteoform diversity and discover potential proteoform regulation. One component of this strategy involved examination of the focusing profiles for protein groups. A novel concordance analysis facilitated differentiation between proteoforms, including proteoforms generated by alternate splicing and proteolysis. Evaluation of focusing profiles and concordance analysis were applicable to cells from a single and/or multiple biological states. Statistical analyses identified proteoform variation between biological states. Regulation relevant to cellular responses to human respiratory syncytial virus was revealed. Western blotting and Protomap analyses validated the proteoform regulation. Discovery of STAT1, WARS, MX1, and HSPB1 proteoform regulation by human respiratory syncytial virus highlighted the impact of the profiling strategy. Novel truncated proteoforms of MX1 were identified in infected cells and phosphorylation driven regulation of HSPB1 proteoforms was correlated with infection. The proteoform profiling strategy is generally applicable to investigating interactions between viruses and host cells and the analysis of other biological systems.
Affiliation(s)
- David D Smith
- Statistics Unit, QIMR Berghofer Medical Research Institute, Herston, Queensland, Australia
- Alexander Bukreyev
- Respiratory Virus Section, Laboratory of Infectious Diseases, National Institute for Allergy and Infectious Diseases, NIH, Bethesda, Maryland
- Keith J Chappell
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Queensland, Australia
- Peter L Collins
- Respiratory Virus Section, Laboratory of Infectious Diseases, National Institute for Allergy and Infectious Diseases, NIH, Bethesda, Maryland
- Jeffrey J Gorman
- Protein Discovery Centre and School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Queensland, Australia
|
25
|
Shortreed MR, Frey BL, Scalf M, Knoener RA, Cesnik AJ, Smith LM. Elucidating Proteoform Families from Proteoform Intact-Mass and Lysine-Count Measurements. J Proteome Res 2016; 15:1213-21. [PMID: 26941048 PMCID: PMC4917391 DOI: 10.1021/acs.jproteome.5b01090] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 01/21/2023]
Abstract
Proteomics is presently dominated by the “bottom-up” strategy, in which proteins are enzymatically digested into peptides for mass spectrometric identification. Although this approach is highly effective at identifying large numbers of proteins present in complex samples, the digestion into peptides renders it impossible to identify the proteoforms from which they were derived. We present here a powerful new strategy for the identification of proteoforms and the elucidation of proteoform families (groups of related proteoforms) from the experimental determination of the accurate proteoform mass and number of lysine residues contained. Accurate proteoform masses are determined by standard LC–MS analysis of undigested protein mixtures in an Orbitrap mass spectrometer, and the lysine count is determined using the NeuCode isotopic tagging method. We demonstrate the approach in analysis of the yeast proteome, revealing 8637 unique proteoforms and 1178 proteoform families. The elucidation of proteoforms and proteoform families afforded here provides an unprecedented new perspective upon proteome complexity and dynamics.
Collapse
Affiliation(s)
- Michael R Shortreed
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Brian L Frey
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Mark Scalf
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Rachel A Knoener
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Anthony J Cesnik
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lloyd M Smith
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States; Genome Center of Wisconsin, University of Wisconsin, 425G Henry Mall, Room 3420, Madison, Wisconsin 53706, United States
26
Blein-Nicolas M, Zivy M. Thousand and one ways to quantify and compare protein abundances in label-free bottom-up proteomics. Biochimica et Biophysica Acta - Proteins and Proteomics 2016; 1864:883-95. [PMID: 26947242 DOI: 10.1016/j.bbapap.2016.02.019] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 11/05/2015] [Revised: 01/21/2016] [Accepted: 02/24/2016] [Indexed: 11/18/2022]
Abstract
How to process and analyze MS data to quantify and statistically compare protein abundances in bottom-up proteomics has been an open debate for nearly fifteen years. Two main approaches are generally used: the first is based on spectral data generated during the process of identification (e.g. peptide counting, spectral counting), while the second makes use of extracted ion currents to quantify chromatographic peaks and infer protein abundances based on peptide quantification. These two approaches actually refer to multiple methods which have been developed during the last decade, but were submitted to deep evaluations only recently. In this paper, we compiled these different methods as exhaustively as possible. We also summarized the way they address the different problems raised by bottom-up protein quantification such as normalization, the presence of shared peptides, unequal peptide measurability and missing data. This article is part of a Special Issue entitled: Plant Proteomics--a bridge between fundamental processes and crop production, edited by Dr. Hans-Peter Mock.
Affiliation(s)
- Mélisande Blein-Nicolas
- GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, F-91190 Gif-sur-Yvette, France
- Michel Zivy
- GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, F-91190 Gif-sur-Yvette, France
27
Zhao C, Liu D, Teng B, He Z. BagReg: Protein inference through machine learning. Comput Biol Chem 2015; 57:12-20. [DOI: 10.1016/j.compbiolchem.2015.02.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/31/2014] [Accepted: 02/03/2015] [Indexed: 10/24/2022]
28
Creighton CJ, Huang S. Reverse phase protein arrays in signaling pathways: a data integration perspective. Drug Design, Development and Therapy 2015; 9:3519-27. [PMID: 26185419 PMCID: PMC4500628 DOI: 10.2147/dddt.s38375] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Indexed: 12/18/2022]
Abstract
The reverse phase protein array (RPPA) data platform provides expression data for a prespecified set of proteins, across a set of tissue or cell line samples. Being able to measure either total proteins or posttranslationally modified proteins, even ones present at lower abundances, RPPA represents an excellent way to capture the state of key signaling transduction pathways in normal or diseased cells. RPPA data can be combined with those of other molecular profiling platforms, in order to obtain a more complete molecular picture of the cell. This review offers perspective on the use of RPPA as a component of integrative molecular analysis, using recent case examples from The Cancer Genome Atlas consortium, showing how RPPA may provide additional insight into cancer besides what other data platforms may provide. There also exists a clear need for effective visualization approaches to RPPA-based proteomic results; this was highlighted by the recent challenge, put forth by the HPN-DREAM consortium, to develop visualization methods for a highly complex RPPA dataset involving many cancer cell lines, stimuli, and inhibitors applied over a time course. In this review, we put forth a number of general guidelines for effective visualization of complex molecular datasets, namely, showing the data, ordering data elements deliberately, enabling generalization, focusing on relevant specifics, and putting things into context. We give examples of how these principles can be utilized in visualizing the intrinsic subtypes of breast cancer and in meaningfully displaying the entire HPN-DREAM RPPA dataset within a single page.
Affiliation(s)
- Chad J Creighton
- Department of Medicine, Baylor College of Medicine, Houston, TX, USA; Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Shixia Huang
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA; Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
29
Pejchinovski M, Klein J, Ramírez-Torres A, Bitsika V, Mermelekas G, Vlahou A, Mullen W, Mischak H, Jankowski V. Comparison of higher energy collisional dissociation and collision-induced dissociation MS/MS sequencing methods for identification of naturally occurring peptides in human urine. Proteomics Clin Appl 2015; 9:531-42. [DOI: 10.1002/prca.201400163] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/21/2014] [Revised: 02/27/2015] [Accepted: 03/23/2015] [Indexed: 01/11/2023]
Affiliation(s)
- Martin Pejchinovski
- Charite-Universitätsmedizin Berlin; Berlin Germany
- Mosaiques Diagnostics GmbH; Hanover Germany
- Vasiliki Bitsika
- Biotechnology Division; Biomedical Research Foundation; Academy of Athens; Athens Greece
- George Mermelekas
- Biotechnology Division; Biomedical Research Foundation; Academy of Athens; Athens Greece
- Antonia Vlahou
- Biotechnology Division; Biomedical Research Foundation; Academy of Athens; Athens Greece
- William Mullen
- Institute of Cardiovascular and Medical Sciences; University of Glasgow; Glasgow UK
- Harald Mischak
- Mosaiques Diagnostics GmbH; Hanover Germany
- Institute of Cardiovascular and Medical Sciences; University of Glasgow; Glasgow UK
- Vera Jankowski
- Universitätsklinikum RWTH Aachen; Institute of Molecular Cardiovascular Research; Aachen Germany
30
Shalit T, Elinger D, Savidor A, Gabashvili A, Levin Y. MS1-based label-free proteomics using a quadrupole orbitrap mass spectrometer. J Proteome Res 2015; 14:1979-86. [PMID: 25780947 DOI: 10.1021/pr501045t] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Indexed: 12/11/2022]
Abstract
Presented is a data set for benchmarking MS1-based label-free quantitative proteomics using a quadrupole orbitrap mass spectrometer. Escherichia coli digest was spiked into a HeLa digest in four different concentrations, simulating protein expression differences in a background of an unchanged complex proteome. The data set provides a unique opportunity to evaluate the proteomic platform (instrumentation and software) in its ability to perform MS1-intensity-based label-free quantification. We show that the presented combination of informatics and instrumentation produces high precision and quantification accuracy. The data were also used to compare different quantitative protein inference methods such as iBAQ and Hi-N. The data can also be used as a resource for development and optimization of proteomics informatics tools, thus the raw data have been deposited to ProteomeXchange with identifier PXD001385.
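For readers unfamiliar with the two protein-level summaries named in this abstract, the sketch below shows one common way they are computed from peptide intensities. It is an illustration under stated assumptions, not the authors' pipeline: iBAQ is taken as the summed peptide intensity divided by the number of theoretically observable peptides, Hi-N as the mean of the N most intense peptides, and the function names and inputs are hypothetical.

```python
def ibaq(peptide_intensities, n_observable_peptides):
    """Summed peptide intensity divided by the number of theoretically
    observable peptides (assumed definition; the count is supplied by the caller)."""
    if n_observable_peptides <= 0:
        raise ValueError("n_observable_peptides must be positive")
    return sum(peptide_intensities) / n_observable_peptides


def hi_n(peptide_intensities, n=3):
    """Mean of the n most intense peptides (assumed definition).
    If fewer than n peptides are present, the available ones are averaged."""
    top = sorted(peptide_intensities, reverse=True)[:n]
    if not top:
        raise ValueError("at least one peptide intensity is required")
    return sum(top) / len(top)


if __name__ == "__main__":
    intensities = [1.0, 8.0, 4.0, 6.0]  # made-up peptide intensities
    print(ibaq(intensities, n_observable_peptides=4))
    print(hi_n(intensities, n=3))
```

The two summaries react differently to poorly ionizing peptides: iBAQ averages over all expected peptides, while Hi-N looks only at the strongest signals.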
Affiliation(s)
- Tali Shalit
- †de Botton Institute for Protein Profiling and ‡Ilana and Pascal Mantoux Institute for Bioinformatics, The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot 76100, Israel
- Dalia Elinger
- †de Botton Institute for Protein Profiling and ‡Ilana and Pascal Mantoux Institute for Bioinformatics, The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot 76100, Israel
- Alon Savidor
- †de Botton Institute for Protein Profiling and ‡Ilana and Pascal Mantoux Institute for Bioinformatics, The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot 76100, Israel
- Alexandra Gabashvili
- †de Botton Institute for Protein Profiling and ‡Ilana and Pascal Mantoux Institute for Bioinformatics, The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot 76100, Israel
- Yishai Levin
- †de Botton Institute for Protein Profiling and ‡Ilana and Pascal Mantoux Institute for Bioinformatics, The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot 76100, Israel
31
Alves G, Yu YK. Mass spectrometry-based protein identification with accurate statistical significance assignment. Bioinformatics 2014; 31:699-706. [PMID: 25362092 DOI: 10.1093/bioinformatics/btu717] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. RESULTS We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-value, eliminating the need of using empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. AVAILABILITY AND IMPLEMENTATION The source code, implemented in C++ on a linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit.
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
32
Frantzi M, Bhat A, Latosinska A. Clinical proteomic biomarkers: relevant issues on study design & technical considerations in biomarker development. Clin Transl Med 2014; 3:7. [PMID: 24679154 PMCID: PMC3994249 DOI: 10.1186/2001-1326-3-7] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 11/20/2013] [Accepted: 03/06/2014] [Indexed: 12/11/2022] Open
Abstract
Biomarker research is continuously expanding in the field of clinical proteomics. A combination of different proteomic-based methodologies can be applied depending on the specific clinical context of use. Moreover, current advancements in proteomic analytical platforms are leading to an expansion of biomarker candidates that can be identified. Specifically, mass spectrometric techniques could provide highly valuable tools for biomarker research. Ideally, these advances could provide biomarkers that are clinically applicable for disease diagnosis and/or prognosis. Unfortunately, in general the biomarker candidates fail to be implemented in clinical decision making. To improve on this current situation, a well-defined study design has to be established driven by a clear clinical need, while several checkpoints between the different phases of discovery, verification and validation have to be passed in order to increase the probability of establishing valid biomarkers. In this review, we summarize the technical proteomic platforms that are available along the different stages in the biomarker discovery pipeline, exemplified by clinical applications in the field of bladder cancer biomarker research.
Affiliation(s)
- Maria Frantzi
- Mosaiques Diagnostics GmbH, Mellendorfer Strasse 7-9, D-30625 Hannover, Germany
- Biotechnology Division, Biomedical Research Foundation Academy of Athens, Soranou Ephessiou 4, 115 27 Athens, Greece
- Akshay Bhat
- Mosaiques Diagnostics GmbH, Mellendorfer Strasse 7-9, D-30625 Hannover, Germany
- Charité-Universitätsmedizin Berlin, Berlin, Germany
- Agnieszka Latosinska
- Biotechnology Division, Biomedical Research Foundation Academy of Athens, Soranou Ephessiou 4, 115 27 Athens, Greece
- Charité-Universitätsmedizin Berlin, Berlin, Germany
33
Serang O. The probabilistic convolution tree: efficient exact Bayesian inference for faster LC-MS/MS protein inference. PLoS One 2014; 9:e91507. [PMID: 24626234 PMCID: PMC3953406 DOI: 10.1371/journal.pone.0091507] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 11/29/2013] [Accepted: 02/12/2014] [Indexed: 11/18/2022] Open
Abstract
Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called “causal independence”). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to and the space to where is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions.
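The core idea of merging adder-node inputs pairwise in a tree can be illustrated with a minimal sketch. The code below covers only the bottom-up pass, which computes the exact distribution of a sum of independent non-negative integer variables by convolving their probability mass functions in balanced pairs. It is a simplification under stated assumptions, not the published algorithm: it omits the backward pass that yields posteriors for individual variables, uses a naive quadratic convolution, and all function names are hypothetical.

```python
def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q (PMFs as lists)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def sum_distribution(pmfs):
    """Merge the PMFs pairwise, layer by layer, until one PMF for the total remains."""
    layer = [list(p) for p in pmfs]
    if not layer:
        return [1.0]  # an empty sum equals 0 with probability 1
    while len(layer) > 1:
        merged = [convolve(layer[i], layer[i + 1])
                  for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:  # an unpaired node is carried up to the next layer
            merged.append(layer[-1])
        layer = merged
    return layer[0]


if __name__ == "__main__":
    # Four independent fair binary variables: the total is Binomial(4, 0.5).
    print(sum_distribution([[0.5, 0.5]] * 4))
```

Pairing inputs in a balanced tree keeps the depth logarithmic in the number of variables, which is what makes faster convolution routines pay off on large inputs.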
34
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn solving a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever growing amounts, machine learning is fast becoming a very popular tool in the field. We here therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
- Department of Medical Protein Research, VIB, Ghent, Belgium; Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium; Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium
35
Provenzano JC, Siqueira JF, Rôças IN, Domingues RR, Paes Leme AF, Silva MRS. Metaproteome analysis of endodontic infections in association with different clinical conditions. PLoS One 2013; 8:e76108. [PMID: 24143178 PMCID: PMC3797121 DOI: 10.1371/journal.pone.0076108] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 05/06/2013] [Accepted: 08/21/2013] [Indexed: 12/12/2022] Open
Abstract
Analysis of the metaproteome of microbial communities is important to provide an insight of community physiology and pathogenicity. This study evaluated the metaproteome of endodontic infections associated with acute apical abscesses and asymptomatic apical periodontitis lesions. Proteins persisting or expressed after root canal treatment were also evaluated. Finally, human proteins associated with these infections were identified. Samples were taken from root canals of teeth with asymptomatic apical periodontitis before and after chemomechanical treatment using either NaOCl or chlorhexidine as the irrigant. Samples from abscesses were taken by aspiration of the purulent exudate. Clinical samples were processed for analysis of the exoproteome by using two complementary mass spectrometry platforms: nanoflow liquid chromatography coupled with linear ion trap quadrupole Velos Orbitrap and liquid chromatography-quadrupole time-of-flight. A total of 308 proteins of microbial origin were identified. The number of proteins in abscesses was higher than in asymptomatic cases. In canals irrigated with chlorhexidine, the number of identified proteins decreased substantially, while in the NaOCl group the number of proteins increased. The large majority of microbial proteins found in endodontic samples were related to metabolic and housekeeping processes, including protein synthesis, energy metabolism and DNA processes. Moreover, several other proteins related to pathogenicity and resistance/survival were found, including proteins involved with adhesion, biofilm formation and antibiotic resistance, stress proteins, exotoxins, invasins, proteases and endopeptidases (mostly in abscesses), and an archaeal protein linked to methane production. The majority of human proteins detected were related to cellular processes and metabolism, as well as immune defense. 
Interrogation of the metaproteome of endodontic microbial communities provides information on the physiology and pathogenicity of the community at the time of sampling. There is a growing need for expanded and more curated protein databases that permit more accurate identifications of proteins in metaproteomic studies.
Affiliation(s)
- José Claudio Provenzano
- Department of Biochemistry, Chemistry Institute, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil; Department of Endodontics, Faculty of Dentistry, Estácio de Sá University, Rio de Janeiro, RJ, Brazil
36
Serang O, Cansizoglu AE, Käll L, Steen H, Steen JA. Nonparametric Bayesian evaluation of differential protein quantification. J Proteome Res 2013; 12:4556-65. [PMID: 24024742 DOI: 10.1021/pr400678m] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 01/19/2023]
Abstract
Arbitrary cutoffs are ubiquitous in quantitative computational proteomics: maximum acceptable MS/MS PSM or peptide q value, minimum ion intensity to calculate a fold change, the minimum number of peptides that must be available to trust the estimated protein fold change (or the minimum number of PSMs that must be available to trust the estimated peptide fold change), and the "significant" fold change cutoff. Here we introduce a novel experimental setup and nonparametric Bayesian algorithm for determining the statistical quality of a proposed differential set of proteins or peptides. By comparing putatively nonchanging case-control evidence to an empirical null distribution derived from a control-control experiment, we successfully avoid some of these common parameters. We then apply our method to evaluating different fold-change rules and find that for our data a 1.2-fold change is the most permissive of the plausible fold-change rules.
Affiliation(s)
- Oliver Serang
- Thermo Fisher Scientific Bremen, Hanna-Kunath-Straße 11, Bremen 28199, Germany
37
Ji C, Arnold RJ, Sokoloski KJ, Hardy RW, Tang H, Radivojac P. Extending the coverage of spectral libraries: a neighbor-based approach to predicting intensities of peptide fragmentation spectra. Proteomics 2013; 13:756-65. [PMID: 23303707 DOI: 10.1002/pmic.201100670] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/27/2011] [Revised: 10/19/2012] [Accepted: 11/11/2012] [Indexed: 01/10/2023]
Abstract
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well-studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor-based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20-60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
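The weighted K-nearest neighbor step described in this abstract can be sketched as a similarity-weighted average of neighbor spectra. This is a minimal illustration under stated assumptions, not the authors' implementation: neighbor selection and the similarity measure are taken as given, each spectrum is a fixed-length list of fragment ion intensities, and the function name is hypothetical.

```python
def predict_intensities(neighbors, k):
    """Predict fragment ion intensities for a target peptide.

    neighbors: list of (similarity, intensities) pairs, where intensities is a
               list aligned to the same fragment ions for every neighbor.
    k:         number of most similar neighbors to combine.
    """
    top = sorted(neighbors, key=lambda item: item[0], reverse=True)[:k]
    if not top:
        raise ValueError("at least one neighbor is required")
    total_weight = sum(sim for sim, _ in top)
    if total_weight <= 0:
        raise ValueError("similarities must sum to a positive value")
    n_ions = len(top[0][1])
    # Similarity-weighted mean, computed independently for each fragment ion.
    return [sum(sim * spec[i] for sim, spec in top) / total_weight
            for i in range(n_ions)]


if __name__ == "__main__":
    library = [
        (3.0, [1.0, 0.0]),    # most similar peptide
        (1.0, [0.0, 1.0]),
        (0.5, [10.0, 10.0]),  # least similar; dropped when k=2
    ]
    print(predict_intensities(library, k=2))
```

Weighting by similarity lets close sequence neighbors dominate the prediction while more distant ones contribute only a small correction.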
Affiliation(s)
- Chao Ji
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
38
Huang T, Gong H, Yang C, He Z. ProteinLasso: A Lasso regression approach to protein inference problem in shotgun proteomics. Comput Biol Chem 2013; 43:46-54. [PMID: 23385215 DOI: 10.1016/j.compbiolchem.2012.12.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/26/2012] [Revised: 12/30/2012] [Accepted: 12/30/2012] [Indexed: 11/28/2022]
Abstract
Protein inference is an important issue in proteomics research. Its main objective is to select a proper subset of candidate proteins that best explain the observed peptides. Although many methods have been proposed for solving this problem, several issues such as peptide degeneracy and one-hit wonders still remain unsolved. Therefore, the accurate identification of proteins that are truly present in the sample continues to be a challenging task. Based on the concept of peptide detectability, we formulate the protein inference problem as a constrained Lasso regression problem, which can be solved very efficiently through a coordinate descent procedure. The new inference algorithm is named as ProteinLasso, which explores an ensemble learning strategy to address the sparsity parameter selection problem in Lasso model. We test the performance of ProteinLasso on three datasets. As shown in the experimental results, ProteinLasso outperforms those state-of-the-art protein inference algorithms in terms of both identification accuracy and running efficiency. In addition, we show that ProteinLasso is stable under different parameter specifications. The source code of our algorithm is available at: http://sourceforge.net/projects/proteinlasso.
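To make the phrase "constrained Lasso solved by coordinate descent" concrete, here is a minimal sketch of one such solver. It is not the ProteinLasso code: the abstract does not state the constraint, so non-negativity of the coefficients is an assumption, as are the objective 0.5 * ||y - Xb||^2 + lam * sum(b), the fixed iteration count, and the function name. In the protein inference setting one might read the columns of X as proteins and the rows as peptides, but that mapping is also an assumption here.

```python
def nonneg_lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for: minimize 0.5*||y - Xb||^2 + lam*sum(b), b >= 0.

    X is a list of rows, y a list of targets, lam the sparsity penalty.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, kept in sync with b
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(c * r for c, r in zip(col, resid)) + norm * b[j]
            new_bj = max(0.0, (rho - lam) / norm)  # shrink, then clip at zero
            delta = new_bj - b[j]
            if delta:
                resid = [r - c * delta for r, c in zip(resid, col)]
                b[j] = new_bj
    return b


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]  # toy design: each column explains one row
    y = [1.0, 0.1]
    print(nonneg_lasso_cd(X, y, lam=0.2))
```

In the toy run the weakly supported second coefficient is driven exactly to zero, which is the sparsity behavior that makes Lasso attractive for picking a small explanatory subset.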
Affiliation(s)
- Ting Huang
- School of Software, Dalian University of Technology, China