1
Uszkoreit J, Marcus K, Eisenacher M. A Review of Protein Inference. Methods Mol Biol 2025; 2859:53-64. [PMID: 39436596 DOI: 10.1007/978-1-0716-4152-1_4]
Abstract
Protein inference is an often neglected yet crucial step in most proteomic experiments. In the bottom-up proteomic approach, the actual molecules of interest, the proteins, are digested into peptides before measurement on a mass spectrometer. This approach introduces a loss of information: the proteins must be inferred from the identified peptides. While this might seem trivial, it raises several problems, one of the biggest being peptides that are shared among proteins. Depending on the database used for identification, such an amino acid sequence can belong to more than one protein. If shared peptides are identified in a sample, it cannot be determined which of these proteins were actually present; only an estimate of the most probable proteins or protein groups can be given, based on a predefined inference strategy.
Here we describe the effect of the database chosen for peptide identification on the number of shared peptides. Afterward, the most commonly used protein inference methods are sketched, and the need for a stringent false discovery rate (FDR) at both the peptide and the protein level is discussed. Finally, we explain how the tool "PIA or protein inference algorithms" can be used together with the workflow environment KNIME and OpenMS to perform protein inference in a common proteomic experiment.
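To illustrate the shared-peptide problem described above, here is a minimal Python sketch of one widely used inference strategy, parsimony (Occam's razor). Proteins supported by identical peptide sets are merged into indistinguishable groups, and a greedy set cover then keeps the fewest groups that explain all identified peptides. This is a generic teaching example, not the algorithm implemented in PIA, and the protein and peptide identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def group_proteins(protein_to_peptides: Dict[str, Set[str]]) -> Dict[Tuple[str, ...], FrozenSet[str]]:
    """Merge proteins that are supported by exactly the same peptide set."""
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)
    # Key each indistinguishable protein group (sorted tuple) to its peptides.
    return {tuple(sorted(prots)): peps for peps, prots in groups.items()}


def parsimonious_groups(protein_to_peptides: Dict[str, Set[str]]) -> List[Tuple[str, ...]]:
    """Greedy set cover: repeatedly pick the group explaining the most unexplained peptides."""
    groups = group_proteins(protein_to_peptides)
    unexplained = set().union(*groups.values())
    selected = []
    while unexplained:
        # max() returns the first maximum, so iterating in sorted order
        # breaks ties deterministically by group name.
        best = max(sorted(groups), key=lambda g: len(groups[g] & unexplained))
        selected.append(best)
        unexplained -= groups[best]
    return selected


# Hypothetical evidence: pepB is shared by P1 and P2; P3 and P4 have identical evidence.
evidence = {
    "P1": {"pepA", "pepB"},
    "P2": {"pepB"},
    "P3": {"pepC"},
    "P4": {"pepC"},
}
print(parsimonious_groups(evidence))  # → [('P1',), ('P3', 'P4')]
```

Here P2 is dropped because its only peptide is already explained by P1, while P3 and P4 cannot be told apart and are reported together as one group. Greedy set cover is only a heuristic for the minimal explanation, and other inference strategies can report different groups for the same evidence.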
Affiliation(s)
- Julian Uszkoreit
  - Medical Bioinformatics, Medical Faculty, Ruhr University Bochum, Bochum, Germany
  - Medizinisches Proteom-Center, Medical Faculty, Ruhr University Bochum, Bochum, Germany
- Katrin Marcus
  - Medical Proteome Analysis, Center for Proteindiagnostics (PRODI), Ruhr University Bochum, Bochum, Germany
  - Medizinisches Proteom-Center, Medical Faculty, Ruhr University Bochum, Bochum, Germany
- Martin Eisenacher
  - Medical Proteome Analysis, Center for Proteindiagnostics (PRODI), Ruhr University Bochum, Bochum, Germany
  - Medizinisches Proteom-Center, Medical Faculty, Ruhr University Bochum, Bochum, Germany
2
Hoffmann N, Mayer G, Has C, Kopczynski D, Al Machot F, Schwudke D, Ahrends R, Marcus K, Eisenacher M, Turewicz M. A Current Encyclopedia of Bioinformatics Tools, Data Formats and Resources for Mass Spectrometry Lipidomics. Metabolites 2022; 12:584. [PMID: 35888710 PMCID: PMC9319858 DOI: 10.3390/metabo12070584] [Received: 05/25/2022] [Revised: 06/17/2022] [Accepted: 06/19/2022]
Abstract
Mass spectrometry is a widely used technology in biomedical research for identifying and quantifying biomolecules such as lipids, metabolites and proteins. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because data interoperability is so important, we assessed the support for standardized data formats in mass spectrometry (MS)-based lipidomics workflows. Our comparison included tools that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography-mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. We found that the Human Proteome Organization-Proteomics Standards Initiative (HUPO-PSI) standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software tools. We further discuss how mzTab-M can serve as a bridge between data acquisition and lipid bioinformatics tools for interpretation, capturing their output and transmitting richly annotated data for downstream processing. However, we also identified several challenges with currently available tools and standards. Potential areas for improvement include the adoption of a common nomenclature and standardized reporting, which would enable high-throughput lipidomics and improve its data handling. Finally, we suggest specific areas where tools and repositories need to improve to become FAIRer.
Affiliation(s)
- Nils Hoffmann
  - Forschungszentrum Jülich GmbH, Institute for Bio- and Geosciences (IBG-5), 52425 Jülich, Germany
- Gerhard Mayer
  - Institute of Medical Systems Biology, Ulm University, 89081 Ulm, Germany
- Canan Has
  - Biological Mass Spectrometry, Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - University Hospital Carl Gustav Carus, 01307 Dresden, Germany
  - CENTOGENE GmbH, 18055 Rostock, Germany
- Dominik Kopczynski
  - Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Fadi Al Machot
  - Faculty of Science and Technology, Norwegian University of Life Sciences (NMBU), 1433 Ås, Norway
- Dominik Schwudke
  - Bioanalytical Chemistry, Forschungszentrum Borstel, Leibniz Lung Center, 23845 Borstel, Germany
  - Airway Research Center North, German Center for Lung Research (DZL), 23845 Borstel, Germany
  - German Center for Infection Research (DZIF), TTU Tuberculosis, 23845 Borstel, Germany
- Robert Ahrends
  - Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Katrin Marcus
  - Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
- Martin Eisenacher
  - Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
  - Faculty of Medicine, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Michael Turewicz
  - Institute for Clinical Biochemistry and Pathobiochemistry, German Diabetes Center (DDZ), Leibniz Center for Diabetes Research at Heinrich-Heine-University Düsseldorf, 40225 Düsseldorf, Germany
  - German Center for Diabetes Research (DZD), Partner Düsseldorf, 85764 Neuherberg, Germany
3
Moosa JM, Guan S, Moran MF, Ma B. Repeat-Preserving Decoy Database for False Discovery Rate Estimation in Peptide Identification. J Proteome Res 2020; 19:1029-1036. [DOI: 10.1021/acs.jproteome.9b00555]
Affiliation(s)
- Johra Muhammad Moosa
  - David R. Cheriton School of Computer Science, University of Waterloo, Waterloo N2L 3G1, Canada
- Shenheng Guan
  - David R. Cheriton School of Computer Science, University of Waterloo, Waterloo N2L 3G1, Canada
  - Program in Cell Biology and SPARC BioCentre, Hospital for Sick Children, 686 Bay St, Toronto, Ontario M5G 0A4, Canada
- Michael F. Moran
  - Program in Cell Biology and SPARC BioCentre, Hospital for Sick Children, 686 Bay St, Toronto, Ontario M5G 0A4, Canada
  - Department of Molecular Genetics, University of Toronto, 686 Bay St, Toronto, Ontario M5G 0A4, Canada
- Bin Ma
  - David R. Cheriton School of Computer Science, University of Waterloo, Waterloo N2L 3G1, Canada
4
Zhou WJ, Yang H, Zeng WF, Zhang K, Chi H, He SM. pValid: Validation Beyond the Target-Decoy Approach for Peptide Identification in Shotgun Proteomics. J Proteome Res 2019; 18:2747-2758. [PMID: 31244209 DOI: 10.1021/acs.jproteome.8b00993]
Abstract
As the de facto validation method in mass spectrometry-based proteomics, the target-decoy approach determines a score threshold corresponding to an estimated false discovery rate and then filters out identifications beyond that threshold. However, the incorrect identifications that still pass the threshold remain unknown, so further validation methods are needed. In this study, we characterized a validation framework and investigated a number of common and novel validation methods. We first defined the accuracy of a validation method by its false-positive rate (FPR) and false-negative rate (FNR) and then proved that a validation method with lower FPR and FNR leads to identifications with higher sensitivity and precision. We then proposed a validation method named pValid that combines an open database search and a theoretical spectrum prediction strategy using machine learning. pValid was compared with four common validation methods as well as a synthetic peptide validation method. Tests on three benchmark data sets indicated that pValid had an average FPR of 0.03% and an average FNR of 1.79%, both lower than those of the four common validation methods. Tests on a synthetic peptide data set also indicated that the FPR and FNR of pValid were better than those of the synthetic peptide validation method. Tests on a large-scale human proteome data set indicated that pValid flagged the highest number of incorrect identifications among all five methods. Considering also its cost-effectiveness, pValid has the potential to be a practical validation tool for peptide identification.
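The target-decoy filtering that pValid builds on can be sketched as follows. This minimal Python example assumes the simple estimator FDR = (#decoys) / (#targets) among peptide-spectrum matches (PSMs) at or above a score threshold, turns it into monotone q-values, and keeps target PSMs at a chosen FDR (1% by default). It ignores tied scores, is not pValid's own method, and the scores are hypothetical.

```python
from typing import List, Tuple

# A PSM is represented here as (score, is_decoy); higher scores are better.
PSM = Tuple[float, bool]


def q_values(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) triples sorted by descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the threshold were placed at this PSM's score.
        fdrs.append(decoys / max(targets, 1))
    # q-value: lowest FDR among all thresholds that still accept this PSM,
    # i.e. a running minimum taken from the lowest score upwards.
    qs = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qs[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


def accept(psms: List[PSM], fdr_threshold: float = 0.01) -> List[float]:
    """Scores of target PSMs whose q-value is at or below the threshold."""
    return [s for s, is_decoy, q in q_values(psms) if not is_decoy and q <= fdr_threshold]


# Hypothetical search results as (score, is_decoy).
psms = [(50.0, False), (45.0, False), (40.0, True), (38.0, False), (30.0, True), (20.0, False)]
print(accept(psms))        # → [50.0, 45.0]
print(accept(psms, 0.40))  # → [50.0, 45.0, 38.0]
```

With these scores only the two top-scoring targets pass at 1% FDR, because the first decoy already appears at rank three. Any incorrect target that happens to score above that decoy would still be accepted, which is the gap the abstract says further validation methods must address.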
Affiliation(s)
- Wen-Jing Zhou
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Hao Yang
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Wen-Feng Zeng
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Kun Zhang
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Hao Chi
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Si-Min He
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
5
BioInfra.Prot: A comprehensive proteomics workflow including data standardization, protein inference, expression analysis and data publication. J Biotechnol 2017; 261:116-125. [DOI: 10.1016/j.jbiotec.2017.06.005] [Received: 02/17/2017] [Revised: 06/04/2017] [Accepted: 06/08/2017]
6
7
A Simplified Workflow for Protein Quantitation of Rat Brain Tissues Using Label-Free Proteomics and Spectral Counting. Methods Mol Biol 2016. [PMID: 27604744 DOI: 10.1007/978-1-4939-3816-2_36]
Abstract
Mass spectrometry-based proteomics is an increasingly valuable tool for determining relative or quantitative protein abundance in brain tissues. A plethora of technical and analytical methods are available, but straightforward and practical approaches are often needed to facilitate reproducibility. This is particularly important as an increasing number of studies focus on models of traumatic brain injury or brain trauma, for which brain tissue proteomes have not yet been fully described. This chapter provides suggested techniques for robust identification and quantitation of brain proteins using molecular weight fractionation prior to mass spectrometry-based proteomics. Detailed sample preparation and generalized protocols for chromatography, mass spectrometry, spectral counting, and normalization are described. Rat cerebral cortex isolated from a model of blast overpressure was used as an exemplary source of brain tissue. However, the end user can adapt these techniques to lysates generated from other types of cells or tissues.
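The abstract does not say which spectral-count normalization the chapter uses. As one common option, assumed here purely for illustration, counts can be converted to the normalized spectral abundance factor (NSAF): each protein's spectral count is divided by its length, and the result is divided by the sum of these ratios over all proteins in the sample. The protein names, counts and lengths below are hypothetical.

```python
from typing import Dict


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor (NSAF) for each protein in one sample.

    Assumes at least one protein has a nonzero spectral count.
    """
    # Spectral abundance factor: count divided by protein length, so longer
    # proteins (which yield more peptides) are not over-weighted.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Dividing by the sample total makes the values sum to 1 within a run.
    return {p: value / total for p, value in saf.items()}


# Hypothetical sample: B has twice the spectra of A but is four times longer.
counts = {"A": 10, "B": 20}
lengths = {"A": 100, "B": 400}
print(nsaf(counts, lengths))
```

In this example A ends up with roughly twice the normalized abundance of B despite having fewer spectra, which is the length correction that raw spectral counts lack.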
8
Uszkoreit J, Maerkens A, Perez-Riverol Y, Meyer HE, Marcus K, Stephan C, Kohlbacher O, Eisenacher M. PIA: An Intuitive Protein Inference Engine with a Web-Based User Interface. J Proteome Res 2015; 14:2988-2997. [DOI: 10.1021/acs.jproteome.5b00121]
Affiliation(s)
- Julian Uszkoreit
  - Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Alexandra Maerkens
  - Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Helmut E. Meyer
  - Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Katrin Marcus
  - Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Christian Stephan
  - Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Oliver Kohlbacher
  - Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany
- Martin Eisenacher
  - Medizinisches Proteom-Center, Ruhr-Universität Bochum, 44801 Bochum, Germany