1
Yan B, Shi M, Cai S, Su Y, Chen R, Huang C, Chen DDY. Data-Driven Tool for Cross-Run Ion Selection and Peak-Picking in Quantitative Proteomics with Data-Independent Acquisition LC-MS/MS. Anal Chem 2023; 95:16558-16566. [PMID: 37906674 DOI: 10.1021/acs.analchem.3c02689] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/02/2023]
Abstract
Proteomics provides molecular bases of biology and disease, and liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a platform widely used for bottom-up proteomics. Data-independent acquisition (DIA) improves the run-to-run reproducibility of LC-MS/MS in proteomics research. However, the existing DIA data processing tools sometimes produce large deviations from true values for the peptides and proteins in quantification. Peak-picking error and incorrect ion selection are the two main causes of the deviations. We present a cross-run ion selection and peak-picking (CRISP) tool that utilizes the important advantage of run-to-run consistency of DIA and simultaneously examines the DIA data from the whole set of runs to filter out the interfering signals, instead of only looking at a single run at a time. Eight datasets acquired by mass spectrometers from different vendors with different types of mass analyzers were used to benchmark our CRISP-DIA against other currently available DIA tools. In the benchmark datasets, for analytes with large content variation among samples, CRISP-DIA generally resulted in 20 to 50% relative decrease in error rates compared to other DIA tools, at both the peptide precursor level and the protein level. CRISP-DIA detected differentially expressed proteins more efficiently, with 3.3 to 90.3% increases in the numbers of true positives and 12.3 to 35.3% decreases in the false positive rates, in some cases. In the real biological datasets, CRISP-DIA showed better consistencies of the quantification results. The advantages of assimilating DIA data in multiple runs for quantitative proteomics were demonstrated, which can significantly improve the quantification accuracy.
Affiliation(s)
- Binjun Yan
  - Key Laboratory of Systems Biology, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Mengtian Shi
  - Key Laboratory of Systems Biology, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
  - College of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Siyu Cai
  - Key Laboratory of Systems Biology, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
  - College of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Yuan Su
  - Key Laboratory of Systems Biology, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
  - College of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Renhui Chen
  - Key Laboratory of Systems Biology, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Chiyuan Huang
  - Key Laboratory of Systems Biology, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- David Da Yong Chen
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z1, Canada
2
Gatto L, Aebersold R, Cox J, Demichev V, Derks J, Emmott E, Franks AM, Ivanov AR, Kelly RT, Khoury L, Leduc A, MacCoss MJ, Nemes P, Perlman DH, Petelski AA, Rose CM, Schoof EM, Van Eyk J, Vanderaa C, Yates JR, Slavov N. Initial recommendations for performing, benchmarking and reporting single-cell proteomics experiments. Nat Methods 2023; 20:375-386. [PMID: 36864200 PMCID: PMC10130941 DOI: 10.1038/s41592-023-01785-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 39.0] [Received: 06/06/2022] [Accepted: 01/24/2023] [Indexed: 03/04/2023]
Abstract
Analyzing proteins from single cells by tandem mass spectrometry (MS) has recently become technically feasible. While such analysis has the potential to accurately quantify thousands of proteins across thousands of single cells, the accuracy and reproducibility of the results may be undermined by numerous factors affecting experimental design, sample preparation, data acquisition and data analysis. We expect that broadly accepted community guidelines and standardized metrics will enhance rigor, data quality and alignment between laboratories. Here we propose best practices, quality controls and data-reporting recommendations to assist in the broad adoption of reliable quantitative workflows for single-cell proteomics. Resources and discussion forums are available at https://single-cell.net/guidelines.
Affiliation(s)
- Laurent Gatto
  - Computational Biology and Bioinformatics Unit, de Duve Institute, Université Catholique de Louvain, Brussels, Belgium
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Juergen Cox
  - Max Planck Institute of Biochemistry, Martinsried, Germany
- Jason Derks
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Edward Emmott
  - Centre for Proteome Research, Department of Biochemistry and Systems Biology, University of Liverpool, Liverpool, UK
- Alexander M Franks
  - Department of Statistics and Applied Probability, University of California Santa Barbara, Santa Barbara, CA, USA
- Alexander R Ivanov
  - Department of Chemistry and Chemical Biology, Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA, USA
- Ryan T Kelly
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
- Luke Khoury
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Andrew Leduc
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Peter Nemes
  - Department of Chemistry and Biochemistry, University of Maryland, College Park, MD, USA
- David H Perlman
  - Merck Exploratory Science Center, Merck Sharp & Dohme Corp., Cambridge, MA, USA
- Aleksandra A Petelski
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
  - Parallel Squared Technology Institute, Watertown, MA, USA
- Christopher M Rose
  - Department of Microchemistry, Proteomics and Lipidomics, Genentech Inc., South San Francisco, CA, USA
- Erwin M Schoof
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark
- Christophe Vanderaa
  - Computational Biology and Bioinformatics Unit, de Duve Institute, Université Catholique de Louvain, Brussels, Belgium
- John R Yates
  - Departments of Molecular Medicine and Neurobiology, the Scripps Research Institute, La Jolla, CA, USA
- Nikolai Slavov
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
  - Parallel Squared Technology Institute, Watertown, MA, USA
3
Jones AR, Deutsch EW, Vizcaíno JA. Is DIA proteomics data FAIR? Current data sharing practices, available bioinformatics infrastructure and recommendations for the future. Proteomics 2022; 23:e2200014. [PMID: 36074795 PMCID: PMC10155627 DOI: 10.1002/pmic.202200014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/30/2022] [Revised: 08/27/2022] [Accepted: 08/29/2022] [Indexed: 11/06/2022]
Abstract
Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in e.g. instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards, since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.
Affiliation(s)
- Andrew R Jones
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 3BX, UK
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington, 98109, USA
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
4
Deutsch EW, Lane L, Overall CM, Bandeira N, Baker MS, Pineau C, Moritz RL, Corrales F, Orchard S, Van Eyk JE, Paik YK, Weintraub ST, Vandenbrouck Y, Omenn GS. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 3.0. J Proteome Res 2019; 18:4108-4116. [PMID: 31599596 DOI: 10.1021/acs.jproteome.9b00542] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Indexed: 11/28/2022]
Abstract
The Human Proteome Organization's (HUPO) Human Proteome Project (HPP) developed Mass Spectrometry (MS) Data Interpretation Guidelines that have been applied since 2016. These guidelines have helped ensure that the emerging draft of the complete human proteome is highly accurate and with low numbers of false-positive protein identifications. Here, we describe an update to these guidelines based on consensus-reaching discussions with the wider HPP community over the past year. The revised 3.0 guidelines address several major and minor identified gaps. We have added guidelines for emerging data independent acquisition (DIA) MS workflows and for use of the new Universal Spectrum Identifier (USI) system being developed by the HUPO Proteomics Standards Initiative (PSI). In addition, we discuss updates to the standard HPP pipeline for collecting MS evidence for all proteins in the HPP, including refinements to minimum evidence. We present a new plan for incorporating MassIVE-KB into the HPP pipeline for the next (HPP 2020) cycle in order to obtain more comprehensive coverage of public MS data sets. The main checklist has been reorganized under headings and subitems, and related guidelines have been grouped. In sum, Version 2.1 of the HPP MS Data Interpretation Guidelines has served well, and this timely update to version 3.0 will aid the HPP as it approaches its goal of collecting and curating MS evidence of translation and expression for all predicted ∼20 000 human proteins encoded by the human genome.
Affiliation(s)
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Lydie Lane
  - SIB Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CMU, Michel Servet 1, 1211 Geneva 4, Switzerland
- Christopher M Overall
  - Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Nuno Bandeira
  - Center for Computational Mass Spectrometry and Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Mark S Baker
  - Department of Biomedical Sciences, Faculty of Medicine and Health Science, Macquarie University, Macquarie Park, NSW 2109, Australia
- Charles Pineau
  - Univ. Rennes, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail) - UMR_S 1085, F-35042 Rennes cedex, France
- Robert L Moritz
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Fernando Corrales
  - Functional Proteomics Laboratory, Centro Nacional de Biotecnología, Spanish Research Council, ProteoRed-ISCIII, Madrid 117, Spain
- Sandra Orchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Jennifer E Van Eyk
  - Advanced Clinical Biosystems Research Institute, The Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Young-Ki Paik
  - Yonsei Proteome Research Center, Yonsei University, 50 Yonsei-ro, Sudaemoon-ku, Seoul 03720, Korea
- Susan T Weintraub
  - The University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229, United States
- Yves Vandenbrouck
  - Univ. Grenoble Alpes, CEA, INSERM, IRIG-BGE, U1038, F-38000 Grenoble, France
- Gilbert S Omenn
  - Institute for Systems Biology, Seattle, Washington 98109, United States
  - Departments of Computational Medicine & Bioinformatics, Internal Medicine, and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
5
Ignjatovic V, Geyer PE, Palaniappan KK, Chaaban JE, Omenn GS, Baker MS, Deutsch EW, Schwenk JM. Mass Spectrometry-Based Plasma Proteomics: Considerations from Sample Collection to Achieving Translational Data. J Proteome Res 2019; 18:4085-4097. [PMID: 31573204 DOI: 10.1021/acs.jproteome.9b00503] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Indexed: 12/15/2022]
Abstract
The proteomic analysis of human blood and blood-derived products (e.g., plasma) offers an attractive avenue to translate research progress from the laboratory into the clinic. However, due to its unique protein composition, performing proteomics assays with plasma is challenging. Plasma proteomics has regained interest due to recent technological advances, but challenges imposed by both complications inherent to studying human biology (e.g., interindividual variability) and analysis of biospecimens (e.g., sample variability), as well as technological limitations remain. As part of the Human Proteome Project (HPP), the Human Plasma Proteome Project (HPPP) brings together key aspects of the plasma proteomics pipeline. Here, we provide considerations and recommendations concerning study design, plasma collection, quality metrics, plasma processing workflows, mass spectrometry (MS) data acquisition, data processing, and bioinformatic analysis. With exciting opportunities in studying human health and disease through this plasma proteomics pipeline, a more informed analysis of human plasma will accelerate interest while enhancing possibilities for the incorporation of proteomics-scaled assays into clinical practice.
Affiliation(s)
- Vera Ignjatovic
  - Haematology Research, Murdoch Children's Research Institute, Parkville, VIC 3052, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, VIC 3052, Australia
- Philipp E Geyer
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Krishnan K Palaniappan
  - Freenome, 259 East Grand Avenue, South San Francisco, California 94080, United States
- Jessica E Chaaban
  - Haematology Research, Murdoch Children's Research Institute, Parkville, VIC 3052, Australia
- Gilbert S Omenn
  - Departments of Computational Medicine & Bioinformatics, Human Genetics, and Internal Medicine and School of Public Health, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
- Mark S Baker
  - Department of Biomedical Sciences, Faculty of Medicine & Health Sciences, Macquarie University, 75 Talavera Road, North Ryde, NSW 2109, Australia
- Eric W Deutsch
  - Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
- Jochen M Schwenk
  - Affinity Proteomics, SciLifeLab, KTH Royal Institute of Technology, 171 65 Stockholm, Sweden
6
Tully B, Balleine RL, Hains PG, Zhong Q, Reddel RR, Robinson PJ. Addressing the Challenges of High-Throughput Cancer Tissue Proteomics for Clinical Application: ProCan. Proteomics 2019; 19:e1900109. [PMID: 31321850 DOI: 10.1002/pmic.201900109] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/30/2019] [Revised: 07/11/2019] [Indexed: 11/09/2022]
Abstract
The cancer tissue proteome has enormous potential as a source of novel predictive biomarkers in oncology. Progress in the development of mass spectrometry (MS)-based tissue proteomics now presents an opportunity to exploit this by applying the strategies of comprehensive molecular profiling and big-data analytics that are refined in other fields of 'omics research. ProCan (ProCan is a registered trademark) is a program aiming to generate high-quality tissue proteomic data across a broad spectrum of cancer types. It is based on data-independent acquisition-MS proteomic analysis of annotated tissue samples sourced through collaboration with expert clinical and cancer research groups. The practical requirements of a high-throughput translational research program have shaped the approach that ProCan is taking to address challenges in study design, sample preparation, raw data acquisition, and data analysis. The ultimate goal is to establish a large proteomics knowledge-base that, in combination with other cancer 'omics data, will accelerate cancer research.
Affiliation(s)
- Brett Tully
  - ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, 2145, Australia
- Rosemary L Balleine
  - ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, 2145, Australia
- Peter G Hains
  - ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, 2145, Australia
- Qing Zhong
  - ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, 2145, Australia
- Roger R Reddel
  - ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, 2145, Australia
- Phillip J Robinson
  - ProCan, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, 2145, Australia
7
Petyuk VA, Gatto L, Payne SH. Reproducibility and Transparency by Design. Mol Cell Proteomics 2019; 18:S202-S204. [PMID: 31273047 PMCID: PMC6692781 DOI: 10.1074/mcp.ip119.001567] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/14/2019] [Revised: 06/24/2019] [Indexed: 12/24/2022] Open
Abstract
The reproducibility of bioinformatics analyses can be elevated to equal status with biological discovery. To achieve this, reproducibility must become part of the process, not an afterthought.
Affiliation(s)
- Laurent Gatto
  - de Duve Institute, Université Catholique de Louvain, Brussels, Belgium