1
Hsiao Y, Zhang H, Li GX, Deng Y, Yu F, Valipour Kahrood H, Steele JR, Schittenhelm RB, Nesvizhskii AI. Analysis and Visualization of Quantitative Proteomics Data Using FragPipe-Analyst. J Proteome Res 2024. [PMID: 39254081 DOI: 10.1021/acs.jproteome.4c00294] [Indexed: 09/11/2024]
Abstract
The FragPipe computational proteomics platform is gaining widespread popularity among the proteomics research community because of its fast processing speed and user-friendly graphical interface. Although FragPipe produces well-formatted output tables that are ready for analysis, there is still a need for an easy-to-use and user-friendly downstream statistical analysis and visualization tool. FragPipe-Analyst addresses this need by providing an R shiny web server to assist FragPipe users in conducting downstream analyses of the resulting quantitative proteomics data. It supports major quantification workflows, including label-free quantification, tandem mass tags, and data-independent acquisition. FragPipe-Analyst offers a range of useful functionalities, such as various missing value imputation options, data quality control, unsupervised clustering, differential expression (DE) analysis using Limma, and gene ontology and pathway enrichment analysis using Enrichr. To support advanced analysis and customized visualizations, we also developed FragPipeAnalystR, an R package encompassing all FragPipe-Analyst functionalities that is extended to support site-specific analysis of post-translational modifications (PTMs). FragPipe-Analyst and FragPipeAnalystR are both open-source and freely available.
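The abstract mentions "various missing value imputation options" without naming them. For illustration only, the sketch below shows one widely used left-censored approach, often called Perseus-style imputation: missing log2 intensities in a sample are drawn from a normal distribution shifted 1.8 standard deviations below the observed mean and narrowed to 0.3 of the observed spread. Whether FragPipe-Analyst offers this exact method or these defaults is an assumption here, and `impute_downshift` is a hypothetical name.

```python
import random
from statistics import mean, stdev


def impute_downshift(column, width=0.3, shift=1.8, seed=0):
    """Fill missing (None) log2 intensities in one sample (hypothetical helper).

    Each missing value is drawn from a normal distribution centred `shift`
    standard deviations below the mean of the observed values, with a standard
    deviation of `width` times the observed spread. This models values that are
    missing because they fell below the detection limit.
    """
    observed = [v for v in column if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values to estimate the spread")
    mu, sd = mean(observed), stdev(observed)
    rng = random.Random(seed)  # fixed seed keeps the imputation reproducible
    return [
        v if v is not None else rng.gauss(mu - shift * sd, width * sd)
        for v in column
    ]


# One sample's log2 intensities with two missing values (toy numbers)
sample = [24.1, None, 22.8, 25.3, None, 23.6]
print(impute_downshift(sample))
```

Observed values pass through unchanged; only the `None` entries are replaced, and they land well below the observed mean, which is the intended left-censoring assumption.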
Affiliation(s)
- Yi Hsiao
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Haijian Zhang
- Monash Proteomics & Metabolomics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Ginny Xiaohe Li
- Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
- Yamei Deng
- Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
- Fengchao Yu
- Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
- Hossein Valipour Kahrood
- Monash Proteomics & Metabolomics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Monash Genomics & Bioinformatics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Joel R Steele
- Monash Proteomics & Metabolomics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Ralf B Schittenhelm
- Monash Proteomics & Metabolomics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Alexey I Nesvizhskii
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
2
Dai C, Pfeuffer J, Wang H, Zheng P, Käll L, Sachsenberg T, Demichev V, Bai M, Kohlbacher O, Perez-Riverol Y. quantms: a cloud-based pipeline for quantitative proteomics enables the reanalysis of public proteomics data. Nat Methods 2024:10.1038/s41592-024-02343-1. [PMID: 38965444 DOI: 10.1038/s41592-024-02343-1] [Received: 05/12/2023] [Accepted: 06/03/2024] [Indexed: 07/06/2024]
Abstract
The volume of public proteomics data is rapidly increasing, creating a computational challenge for large-scale reanalysis. Here, we introduce quantms (https://quantms.org/), an open-source cloud-based pipeline for massively parallel proteomics data analysis. We used quantms to reanalyze 83 public ProteomeXchange datasets, comprising 29,354 instrument files from 13,132 human samples, to quantify 16,599 proteins based on 1.03 million unique peptides. quantms is based on standard file formats, improving the reproducibility, submission and dissemination of the data to ProteomeXchange.
Affiliation(s)
- Chengxin Dai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Julianus Pfeuffer
- Algorithmic Bioinformatics, Freie Universität Berlin, Berlin, Germany
- Hong Wang
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Ping Zheng
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
- Timo Sachsenberg
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Oliver Kohlbacher
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Institute for Translational Bioinformatics, University Hospital Tübingen, Tübingen, Germany
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
3
Ergin EK, Myung JJ, Lange PF. Statistical Testing for Protein Equivalence Identifies Core Functional Modules Conserved across 360 Cancer Cell Lines and Presents a General Approach to Investigating Biological Systems. J Proteome Res 2024; 23:2169-2185. [PMID: 38804581 PMCID: PMC11166143 DOI: 10.1021/acs.jproteome.4c00131] [Received: 02/23/2024] [Revised: 05/04/2024] [Accepted: 05/17/2024] [Indexed: 05/29/2024]
Abstract
Quantitative proteomics has enhanced our capability to study protein dynamics and their involvement in disease using various techniques, including statistical testing, to discern the significant differences between conditions. While most focus is on what is different between conditions, exploring similarities can provide valuable insights. However, exploring similarities directly from the analyte level, such as proteins, genes, or metabolites, is not a standard practice and is not widely adopted. In this study, we propose a statistical framework called QuEStVar (Quantitative Exploration of Stability and Variability through statistical hypothesis testing), enabling the exploration of quantitative stability and variability of features with a combined statistical framework. QuEStVar utilizes differential and equivalence testing to expand statistical classifications of analytes when comparing conditions. We applied our method to an extensive data set of cancer cell lines and revealed a quantitatively stable core proteome across diverse tissues and cancer subtypes. The functional analysis of this set of proteins highlighted the molecular mechanism of cancer cells to maintain constant conditions of the tumorigenic environment via biological processes, including transcription, translation, and nucleocytoplasmic transport.
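QuEStVar is described as pairing a differential test with an equivalence test so that each analyte gets a richer classification. The sketch below illustrates that general idea only and is not the authors' implementation: a two-sided test for a difference in means alongside a two one-sided tests (TOST) procedure against an assumed equivalence margin `delta` on the log2 scale. For brevity it uses a large-sample normal approximation instead of the t-distribution; the function name, margin, labels, and the rule that a significant difference takes precedence are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def classify_feature(x, y, delta=1.0, alpha=0.05):
    """Label one feature 'different', 'equivalent' or 'unexplained' (hypothetical).

    x, y: log2 abundances of the feature in two conditions.
    delta: assumed equivalence margin on the log2 scale.
    Uses a normal approximation rather than the t-distribution.
    """
    diff = mean(x) - mean(y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))  # Welch-style standard error
    z = NormalDist()  # standard normal

    # Differential test, H0: diff == 0 (two-sided)
    p_diff = 2 * (1 - z.cdf(abs(diff) / se))

    # TOST: must reject both H0a: diff <= -delta and H0b: diff >= +delta
    p_lower = 1 - z.cdf((diff + delta) / se)
    p_upper = z.cdf((diff - delta) / se)
    p_equiv = max(p_lower, p_upper)

    if p_diff < alpha:
        return "different"
    if p_equiv < alpha:
        return "equivalent"
    return "unexplained"  # neither test is conclusive


stable_a = [10.0, 10.1, 9.9, 10.05, 9.95]
stable_b = [10.02, 9.98, 10.0, 10.03, 9.97]
print(classify_feature(stable_a, stable_b))  # equivalent
```

The third label matters: a feature that fails to show a difference has not thereby been shown to be stable, and only the equivalence test can support that claim.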
Affiliation(s)
- Enes K. Ergin
- Department of Pathology, University of British Columbia, Vancouver, British Columbia V6T 1Z7, Canada
- Michael Cuccione Childhood Cancer Research Program, BC Children’s Hospital Research Institute, Vancouver, British Columbia V5Z 2H4, Canada
- Junia J.K. Myung
- Department of Pathology, University of British Columbia, Vancouver, British Columbia V6T 1Z7, Canada
- Michael Cuccione Childhood Cancer Research Program, BC Children’s Hospital Research Institute, Vancouver, British Columbia V5Z 2H4, Canada
- Philipp F. Lange
- Department of Pathology, University of British Columbia, Vancouver, British Columbia V6T 1Z7, Canada
- Michael Cuccione Childhood Cancer Research Program, BC Children’s Hospital Research Institute, Vancouver, British Columbia V5Z 2H4, Canada
4
Kohler D, Staniak M, Yu F, Nesvizhskii AI, Vitek O. An MSstats workflow for detecting differentially abundant proteins in large-scale data-independent acquisition mass spectrometry experiments with FragPipe processing. Nat Protoc 2024:10.1038/s41596-024-01000-3. [PMID: 38769142 DOI: 10.1038/s41596-024-01000-3] [Received: 09/28/2023] [Accepted: 03/11/2024] [Indexed: 05/22/2024]
Abstract
Technological advances in mass spectrometry and proteomics have made it possible to perform larger-scale and more-complex experiments. The volume and complexity of the resulting data create major challenges for downstream analysis. In particular, next-generation data-independent acquisition (DIA) experiments enable wider proteome coverage than more traditional targeted approaches but require computational workflows that can manage much larger datasets and identify peptide sequences from complex and overlapping spectral features. Data-processing tools such as FragPipe, DIA-NN and Spectronaut have undergone substantial improvements to process spectral features in a reasonable time. Statistical analysis tools are needed to draw meaningful comparisons between experimental samples, but these tools were also originally designed with smaller datasets in mind. This protocol describes an updated version of MSstats that has been adapted to be compatible with large-scale DIA experiments. A very large DIA experiment, processed with FragPipe, is used as an example to demonstrate different MSstats workflows. The choice of workflow depends on the user's computational resources. For datasets that are too large to fit into a standard computer's memory, we demonstrate the use of MSstatsBig, a companion R package to MSstats. The protocol also highlights key decisions that have a major effect on both the results and the processing time of the analysis. The MSstats processing can be expected to take 1-3 h depending on the usage of MSstatsBig. The protocol can be run in the point-and-click graphical user interface MSstatsShiny or implemented with minimal coding expertise in R.
Affiliation(s)
- Devon Kohler
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Barnett Institute for Chemical and Biological Analysis, Northeastern University, Boston, MA, USA
- Fengchao Yu
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Alexey I Nesvizhskii
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Olga Vitek
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Barnett Institute for Chemical and Biological Analysis, Northeastern University, Boston, MA, USA
5
Hsiao Y, Zhang H, Li GX, Deng Y, Yu F, Kahrood HV, Steele JR, Schittenhelm RB, Nesvizhskii AI. Analysis and visualization of quantitative proteomics data using FragPipe-Analyst. bioRxiv 2024:2024.03.05.583643. [PMID: 38496650 PMCID: PMC10942459 DOI: 10.1101/2024.03.05.583643] [Indexed: 03/19/2024]
Abstract
The FragPipe computational proteomics platform is gaining widespread popularity among the proteomics research community because of its fast processing speed and user-friendly graphical interface. Although FragPipe produces well-formatted output tables that are ready for analysis, there is still a need for an easy-to-use and user-friendly downstream statistical analysis and visualization tool. FragPipe-Analyst addresses this need by providing an R shiny web server to assist FragPipe users in conducting downstream analyses of the resulting quantitative proteomics data. It supports major quantification workflows including label-free quantification, tandem mass tags, and data-independent acquisition. FragPipe-Analyst offers a range of useful functionalities, such as various missing value imputation options, data quality control, unsupervised clustering, differential expression (DE) analysis using Limma, and gene ontology and pathway enrichment analysis using Enrichr. To support advanced analysis and customized visualizations, we also developed FragPipeAnalystR, an R package encompassing all FragPipe-Analyst functionalities that is extended to support site-specific analysis of post-translational modifications (PTMs). FragPipe-Analyst and FragPipeAnalystR are both open-source and freely available.
Affiliation(s)
- Yi Hsiao
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Haijian Zhang
- Monash Proteomics & Metabolomics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Ginny Xiaohe Li
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Yamei Deng
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Fengchao Yu
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Hossein Valipour Kahrood
- Monash Proteomics & Metabolomics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Monash Genomics & Bioinformatics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Joel R. Steele
- Monash Proteomics & Metabolomics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Ralf B. Schittenhelm
- Monash Proteomics & Metabolomics Platform, Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
- Alexey I. Nesvizhskii
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
6
Shajari E, Gagné D, Malick M, Roy P, Noël JF, Gagnon H, Brunet MA, Delisle M, Boisvert FM, Beaulieu JF. Application of SWATH Mass Spectrometry and Machine Learning in the Diagnosis of Inflammatory Bowel Disease Based on the Stool Proteome. Biomedicines 2024; 12:333. [PMID: 38397935 PMCID: PMC10886680 DOI: 10.3390/biomedicines12020333] [Received: 12/07/2023] [Revised: 01/17/2024] [Accepted: 01/25/2024] [Indexed: 02/25/2024]
Abstract
Inflammatory bowel disease (IBD) flare-ups exhibit symptoms similar to those of other diseases and conditions, which complicates diagnosis and treatment. Currently, the gold-standard approaches for diagnosing and monitoring IBD are colonoscopy and biopsy, which are invasive and uncomfortable procedures, and the fecal calprotectin test, which is not sufficiently accurate. An alternative method is therefore needed. In this study, our aim was to provide proof of concept for the application of Sequential Window Acquisition of All Theoretical Mass Spectra mass spectrometry (SWATH-MS) and machine learning to develop a non-invasive and accurate predictive model that uses the stool proteome to distinguish active IBD patients from symptomatic non-IBD patients. Proteome profiles of 123 samples were obtained, and data processing procedures were optimized to select an appropriate pipeline. Differential abundance analysis identified 48 proteins. Using correlation-based feature selection (CFS), 7 proteins were selected for the subsequent steps. To identify the most appropriate predictive machine learning model, five of the most popular methods were assessed: support vector machines (SVMs), random forests, logistic regression, naive Bayes, and k-nearest neighbors (KNN). The resulting model was validated on 45 prospective, previously unseen datasets, achieving a sensitivity of 96% and a specificity of 76%. In conclusion, this study illustrates the effectiveness of the stool proteome obtained through SWATH-MS for accurately diagnosing active IBD with a machine learning model.
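Correlation-based feature selection scores a whole subset of features, rewarding correlation with the class label and penalizing redundancy among the selected features. The sketch below implements Hall's CFS merit, k·mean(r_cf) / sqrt(k + k(k−1)·mean(r_ff)), with a simple greedy forward search. This is an illustration under assumptions, not the study's pipeline: Hall's original method uses a best-first search, the correlation values are toy inputs, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cfs_merit(subset, r_class, r_feat):
    """Hall's CFS merit of a feature subset.

    subset: feature indices; r_class[i]: |corr(feature i, class)|;
    r_feat[i][j]: |corr(feature i, feature j)|.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_rcf = sum(r_class[i] for i in subset) / k
    pairs = list(combinations(subset, 2))
    mean_rff = sum(r_feat[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_rcf / sqrt(k + k * (k - 1) * mean_rff)


def greedy_cfs(r_class, r_feat):
    """Add the feature that most increases the merit until none improves it."""
    selected, best = [], 0.0
    remaining = set(range(len(r_class)))
    while remaining:
        candidate = max(remaining, key=lambda f: cfs_merit(selected + [f], r_class, r_feat))
        merit = cfs_merit(selected + [candidate], r_class, r_feat)
        if merit <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = merit
    return selected, best


# Toy example: features 0 and 1 are both informative but highly redundant
r_class = [0.6, 0.55, 0.1]
r_feat = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(greedy_cfs(r_class, r_feat))  # ([0], 0.6)
```

Feature 1 is left out even though it correlates well with the class, because adding it raises the inter-feature redundancy term more than it raises the class-correlation term.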
Affiliation(s)
- Elmira Shajari
- Laboratory of Intestinal Physiopathology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- David Gagné
- Laboratory of Intestinal Physiopathology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Allumiqs, 975 Rue Léon-Trépanier, Sherbrooke, QC J1G 5J6, Canada
- Mandy Malick
- Laboratory of Intestinal Physiopathology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Patricia Roy
- Laboratory of Intestinal Physiopathology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Hugo Gagnon
- Allumiqs, 975 Rue Léon-Trépanier, Sherbrooke, QC J1G 5J6, Canada
- Marie A. Brunet
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Department of Pediatrics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Maxime Delisle
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Department of Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- François-Michel Boisvert
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Jean-François Beaulieu
- Laboratory of Intestinal Physiopathology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
7
Repetto O, Vettori R, Steffan A, Cannizzaro R, De Re V. Circulating Proteins as Diagnostic Markers in Gastric Cancer. Int J Mol Sci 2023; 24:16931. [PMID: 38069253 PMCID: PMC10706891 DOI: 10.3390/ijms242316931] [Received: 10/31/2023] [Revised: 11/22/2023] [Accepted: 11/24/2023] [Indexed: 12/18/2023]
Abstract
Gastric cancer (GC) is a highly malignant disease that affects people worldwide and has a poor prognosis. Most GC cases are detected at advanced stages because early disease lacks detectable symptoms. There is therefore great interest in improving early diagnosis by implementing targeted prevention strategies. Markers are needed for early detection and to guide clinicians toward the best personalized treatment. Current endoscopic methods for detecting GC are invasive, costly, and time-consuming. Recent advances in proteomics technologies have enabled the screening of many samples and the detection of novel biomarkers and disease-related signature signaling networks. These biomarkers include circulating proteins from different fluids (e.g., plasma, serum, urine, and saliva) and extracellular vesicles. We review relevant published studies on circulating protein biomarkers in GC and detail their application as potential biomarkers for GC diagnosis. Identifying highly sensitive and highly specific diagnostic markers for GC may improve patient survival rates and contribute to advancing precision/personalized medicine.
Affiliation(s)
- Ombretta Repetto
- Facility of Bio-Proteomics, Immunopathology and Cancer Biomarkers, Centro di Riferimento Oncologico di Aviano (CRO), National Cancer Institute, IRCCS, 33081 Aviano, Italy
- Roberto Vettori
- Immunopathology and Cancer Biomarkers, Centro di Riferimento Oncologico di Aviano (CRO), National Cancer Institute, IRCCS, 33081 Aviano, Italy
- Agostino Steffan
- Immunopathology and Cancer Biomarkers, Centro di Riferimento Oncologico di Aviano (CRO), National Cancer Institute, IRCCS, 33081 Aviano, Italy
- Renato Cannizzaro
- Oncological Gastroenterology, Centro di Riferimento Oncologico di Aviano (CRO), National Cancer Institute, IRCCS, 33081 Aviano, Italy
- Department of Medical, Surgical and Health Sciences, University of Trieste, 34127 Trieste, Italy
- Valli De Re
- Facility of Bio-Proteomics, Immunopathology and Cancer Biomarkers, Centro di Riferimento Oncologico di Aviano (CRO), National Cancer Institute, IRCCS, 33081 Aviano, Italy
8
Harris L, Fondrie WE, Oh S, Noble WS. Evaluating Proteomics Imputation Methods with Improved Criteria. J Proteome Res 2023; 22:3427-3438. [PMID: 37861703 PMCID: PMC10949645 DOI: 10.1021/acs.jproteome.3c00205] [Indexed: 10/21/2023]
Abstract
Quantitative measurements produced by tandem mass spectrometry proteomics experiments typically contain a large proportion of missing values. Missing values hinder reproducibility, reduce statistical power, and make it difficult to compare across samples or experiments. Although many methods exist for imputing missing values, in practice, the most commonly used methods are among the worst performing. Furthermore, previous benchmarking studies have focused on relatively simple measurements of error such as the mean-squared error between imputed and held-out values. Here we evaluate the performance of commonly used imputation methods using three practical, "downstream-centric" criteria. These criteria measure the ability to identify differentially expressed peptides, generate new quantitative peptides, and improve the peptide lower limit of quantification. Our evaluation comprises several experiment types and acquisition strategies, including data-dependent and data-independent acquisition. We find that imputation does not necessarily improve the ability to identify differentially expressed peptides but that it can identify new quantitative peptides and improve the peptide lower limit of quantification. We find that MissForest is generally the best performing method per our downstream-centric criteria. We also argue that existing imputation methods do not properly account for the variance of peptide quantifications and highlight the need for methods that do.
Affiliation(s)
- Lincoln Harris
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Sewoong Oh
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
9
Bennike TB. Advances in proteomics: characterization of the innate immune system after birth and during inflammation. Front Immunol 2023; 14:1254948. [PMID: 37868984 PMCID: PMC10587584 DOI: 10.3389/fimmu.2023.1254948] [Received: 07/07/2023] [Accepted: 09/26/2023] [Indexed: 10/24/2023]
Abstract
Proteomics is the characterization of the protein composition, the proteome, of a biological sample. It involves the large-scale identification and quantification of proteins, peptides, and post-translational modifications. This review focuses on recent developments in mass spectrometry-based proteomics and provides an overview of available methods for sample preparation to study the innate immune system. Recent advancements in proteomics workflows, including sample preparation, have significantly improved the sensitivity and proteome coverage of biological samples, including the technically difficult blood plasma. Proteomics is often applied in immunology and has been used to characterize the levels of innate immune system components after perturbations such as birth or during chronic inflammatory diseases like rheumatoid arthritis (RA) and inflammatory bowel disease (IBD). In cancers, the tumor microenvironment may generate chronic inflammation and release cytokines into the circulation. In these situations, the innate immune system undergoes profound and long-lasting changes, the large-scale characterization of which may increase our biological understanding and help identify components with translational potential for guiding diagnosis and treatment decisions. With ongoing technical development, proteomics will likely continue to provide increasing insights into complex biological processes and their implications for health and disease. Integrating proteomics with other omics data and utilizing multi-omics approaches have been demonstrated to give additional valuable insights into biological systems.
Affiliation(s)
- Tue Bjerg Bennike
- Medical Microbiology and Immunology, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
10
Wang H, Dai C, Pfeuffer J, Sachsenberg T, Sanchez A, Bai M, Perez-Riverol Y. Tissue-based absolute quantification using large-scale TMT and LFQ experiments. Proteomics 2023; 23:e2300188. [PMID: 37488995 DOI: 10.1002/pmic.202300188] [Received: 04/16/2023] [Revised: 07/04/2023] [Accepted: 07/05/2023] [Indexed: 07/26/2023]
Abstract
Relative and absolute intensity-based protein quantification across cell lines, tissue atlases and tumour datasets is increasingly available in public datasets. These atlases enable researchers to explore fundamental biological questions, such as protein existence, expression location, quantity and correlation with RNA expression. Most studies provide MS1 feature-based label-free quantitative (LFQ) datasets; however, a growing number of isobaric tandem mass tag (TMT) datasets remain unexplored. Here, we compare traditional intensity-based absolute quantification (iBAQ) proteome abundance ranking to an analogous method based on reporter ion proteome abundance ranking, using data from an experiment in which LFQ and TMT were measured on the same samples. This new TMT method substitutes reporter ion intensities for MS1 feature intensities in the iBAQ framework. Additionally, we compared LFQ-iBAQ values to TMT-iBAQ values from two independent large-scale tissue atlas datasets (one LFQ and one TMT) using robust bottom-up proteomic identification, normalisation and quantitation workflows.
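iBAQ divides a protein's summed peptide intensity by its number of theoretically observable peptides, so that longer proteins, which yield more peptides, are not overrated. A minimal sketch, assuming a fully specific in silico trypsin digest (cleavage after K or R but not before P, no missed cleavages) and an assumed 6 to 30 residue length window for "observable"; the function names are hypothetical. Following the abstract, the TMT variant would feed reporter ion intensities (here assumed to be those of a single channel) into the same calculation in place of MS1 feature intensities.

```python
import re


def tryptic_peptides(sequence):
    """Fully specific in silico trypsin digest: cleave after K/R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def ibaq(intensities, sequence, min_len=6, max_len=30):
    """Summed peptide intensities divided by the number of observable peptides.

    For LFQ the intensities would be MS1 feature intensities; for the TMT
    variant they would be reporter ion intensities (assumed: one channel).
    """
    n_observable = sum(
        1 for p in tryptic_peptides(sequence) if min_len <= len(p) <= max_len
    )
    if n_observable == 0:
        raise ValueError("protein has no theoretically observable peptides")
    return sum(intensities) / n_observable


seq = "GGGGGGKLLLLLLLRPAAAAAAKMM"  # toy sequence, not a real protein
print(tryptic_peptides(seq))  # ['GGGGGGK', 'LLLLLLLRPAAAAAAK', 'MM']
print(ibaq([100.0, 300.0], seq))  # 200.0
```

The "RP" site is not cleaved, and the two-residue "MM" peptide falls outside the length window, so the toy protein has two observable peptides.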
Affiliation(s)
- Hong Wang
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Chengxin Dai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Julianus Pfeuffer
- Algorithmic Bioinformatics, Freie Universität Berlin, Berlin, Germany
- Timo Sachsenberg
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Aniel Sanchez
- Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Skåne University Hospital Malmö, Malmö, Sweden
- Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK