51
Deep representation features from DreamDIA XMBD improve the analysis of data-independent acquisition proteomics. Commun Biol 2021; 4:1190. [PMID: 34650228 PMCID: PMC8517002 DOI: 10.1038/s42003-021-02726-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/15/2021] [Accepted: 09/27/2021] [Indexed: 12/24/2022] Open
Abstract
We developed DreamDIA-XMBD (denoted as DreamDIA), a software suite based on a deep representation model for data-independent acquisition (DIA) data analysis. DreamDIA adopts a data-driven strategy to capture comprehensive information from elution patterns of peptides in DIA data and achieves considerable improvements in both identification and quantification performance compared with other state-of-the-art methods such as OpenSWATH, Skyline and DIA-NN. Specifically, in contrast to existing methods which use only 6 to 10 selected fragment ions from spectral libraries, DreamDIA extracts additional features from hundreds of theoretical elution profiles originating from different ions of each precursor using a deep representation network. To achieve higher coverage of target peptides without sacrificing specificity, the extracted features are further processed by nonlinear discriminative models under the framework of positive-unlabeled learning with decoy peptides as affirmative negative controls. DreamDIA is publicly available at https://github.com/xmuyulab/DreamDIA-XMBD for high-coverage, high-accuracy DIA data analysis.
52
Santiago-Rodriguez TM, Hollister EB. Multi 'omic data integration: A review of concepts, considerations, and approaches. Semin Perinatol 2021; 45:151456. [PMID: 34256961 DOI: 10.1016/j.semperi.2021.151456] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 02/07/2023]
Abstract
The application of 'omic techniques including, but not limited to, genomics/metagenomics, transcriptomics/meta-transcriptomics, proteomics/meta-proteomics, and metabolomics to generate multiple datasets from a single sample has facilitated hypothesis generation leading to the identification of biological, molecular and ecological functions and mechanisms, as well as associations and correlations. Despite their power and promise, a variety of challenges must be considered in the successful design and execution of a multi-omics study. In this review, various 'omic technologies applicable to single- and meta-organisms (i.e., host + microbiome) are described, and considerations for sample collection, storage and processing prior to data generation and analysis, as well as approaches to data storage, dissemination and analysis are discussed. Finally, case studies are included as examples of multi-omic applications providing novel insights and a more holistic understanding of biological processes.
Affiliation(s)
- Emily B Hollister
- Diversigen, Inc, 3 Greenway Plaza, Suite 1575, Houston, TX 77046, USA.
53
Abstract
Direct infusion shotgun proteome analysis (DISPA) is a new paradigm for expedited mass spectrometry-based proteomics, but the original data analysis workflow was onerous. Here, we introduce CsoDIAq, a user-friendly software package for the identification and quantification of peptides and proteins from DISPA data. In addition to establishing a complete and automated analysis workflow with a graphical user interface, CsoDIAq introduces algorithmic concepts to spectrum-spectrum matching to improve peptide identification speed and sensitivity. These include spectra pooling to reduce search time complexity and a new spectrum-spectrum match score called match count and cosine, which improves target discrimination in a target-decoy analysis. Fragment mass tolerance correction also increased the number of peptide identifications. Finally, we adapt CsoDIAq to standard LC-MS DIA and show that it outperforms other spectrum-spectrum matching software.
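The match count and cosine idea described in this abstract can be illustrated with a minimal sketch. It assumes (the abstract does not say) that peaks are paired within a ppm tolerance, that the cosine is computed over matched peaks only, and that the two quantities are combined as a root of the match count multiplied by the cosine. The function names, the `ppm_tol` default and the `root` exponent are hypothetical choices for illustration, not CsoDIAq's API.

```python
import math


def match_peaks(library, query, ppm_tol=10.0):
    """Pair each library peak with the closest unused query peak within ppm_tol.

    Both spectra are lists of (mz, intensity) tuples.
    Returns a list of (library_intensity, query_intensity) pairs.
    """
    pairs = []
    used = set()
    for lib_mz, lib_int in library:
        best = None  # (ppm_error, query_index, query_intensity)
        for j, (q_mz, q_int) in enumerate(query):
            if j in used:
                continue
            ppm = abs(q_mz - lib_mz) / lib_mz * 1e6
            if ppm <= ppm_tol and (best is None or ppm < best[0]):
                best = (ppm, j, q_int)
        if best is not None:
            used.add(best[1])
            pairs.append((lib_int, best[2]))
    return pairs


def cosine(pairs):
    """Cosine similarity of the matched intensity vectors."""
    dot = sum(a * b for a, b in pairs)
    norm_a = math.sqrt(sum(a * a for a, _ in pairs))
    norm_b = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_count_and_cosine(pairs, root=5):
    """Assumed combination: root-damped match count times cosine."""
    return len(pairs) ** (1.0 / root) * cosine(pairs)


# Hypothetical spectra: three library peaks, all found in the query.
library = [(100.0, 10.0), (200.0, 20.0), (300.0, 30.0)]
query = [(100.0005, 1.0), (200.001, 2.0), (300.0, 3.0), (450.0, 9.0)]
pairs = match_peaks(library, query)
score = match_count_and_cosine(pairs)
```

Taking a root of the match count keeps a long list of weak matches from swamping the cosine term, which is one plausible reason to combine the two rather than rank on either alone.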
Affiliation(s)
- Caleb W Cranney
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, United States
- Jesse G Meyer
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, United States
54
Time-resolved in vivo ubiquitinome profiling by DIA-MS reveals USP7 targets on a proteome-wide scale. Nat Commun 2021; 12:5399. [PMID: 34518535 PMCID: PMC8438043 DOI: 10.1038/s41467-021-25454-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 02/25/2021] [Accepted: 08/11/2021] [Indexed: 11/08/2022] Open
Abstract
Mass spectrometry (MS)-based ubiquitinomics provides system-level understanding of ubiquitin signaling. Here we present a scalable workflow for deep and precise in vivo ubiquitinome profiling, coupling an improved sample preparation protocol with data-independent acquisition (DIA)-MS and neural network-based data processing specifically optimized for ubiquitinomics. Compared to data-dependent acquisition (DDA), our method more than triples identification numbers to 70,000 ubiquitinated peptides in single MS runs, while significantly improving robustness and quantification precision. Upon inhibition of the oncology target USP7, we simultaneously record ubiquitination and consequent changes in abundance of more than 8,000 proteins at high temporal resolution. While ubiquitination of hundreds of proteins increases within minutes of USP7 inhibition, we find that only a small fraction of those are ever degraded, thereby dissecting the scope of USP7 action. Our method enables rapid mode-of-action profiling of candidate drugs targeting DUBs or ubiquitin ligases at high precision and throughput. Combining improved sample preparation, data-independent acquisition mass spectrometry and deep learning, the authors develop a workflow for more robust and precise quantitative ubiquitinome profiling. They use this method to characterize targets of the deubiquitinase USP7 and effects of USP7 inhibitors.
55
Lu YY, Bilmes J, Rodriguez-Mias RA, Villén J, Noble WS. DIAmeter: matching peptides to data-independent acquisition mass spectrometry data. Bioinformatics 2021; 37:i434-i442. [PMID: 34252924 PMCID: PMC8686675 DOI: 10.1093/bioinformatics/btab284] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION Tandem mass spectrometry data acquired using data independent acquisition (DIA) is challenging to interpret because the data exhibits complex structure along both the mass-to-charge (m/z) and time axes. The most common approach to analyzing this type of data makes use of a library of previously observed DIA data patterns (a 'spectral library'), but this approach is expensive because the libraries do not typically generalize well across laboratories. RESULTS Here, we propose DIAmeter, a search engine that detects peptides in DIA data using only a peptide sequence database. Although some existing library-free DIA analysis methods (i) support data generated using both wide and narrow isolation windows, (ii) detect peptides containing post-translational modifications, (iii) analyze data from a variety of instrument platforms and (iv) are capable of detecting peptides even in the absence of detectable signal in the survey (MS1) scan, DIAmeter is the only method that offers all four capabilities in a single tool. AVAILABILITY AND IMPLEMENTATION The open source, Apache licensed source code is available as part of the Crux mass spectrometry analysis toolkit (http://crux.ms). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yang Young Lu
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Jeff Bilmes
- Department of Electrical Engineering, University of Washington, Seattle, WA 98195, USA.,Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Judit Villén
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.,Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
56
Cadow J, Manica M, Mathis R, Guo T, Aebersold R, Rodríguez Martínez M. On the feasibility of deep learning applications using raw mass spectrometry data. Bioinformatics 2021; 37:i245-i253. [PMID: 34252933 PMCID: PMC8275322 DOI: 10.1093/bioinformatics/btab311] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/25/2022] Open
Abstract
SUMMARY In recent years, SWATH-MS has become the proteomic method of choice for data-independent acquisition, as it enables high proteome coverage, accuracy and reproducibility. However, data analysis is convoluted and requires prior information and expert curation. Furthermore, as quantification is limited to a small set of peptides, potentially important biological information may be discarded. Here we demonstrate that deep learning can be used to learn discriminative features directly from raw MS data, thereby eliminating the need for elaborate data processing pipelines. Using transfer learning to overcome sample sparsity, we exploit a collection of publicly available deep learning models already trained for the task of natural image classification. These models are used to produce feature vectors from each mass spectrometry (MS) raw image, which are later used as input for a classifier trained to distinguish tumor from normal prostate biopsies. Although the deep learning models were originally trained for a completely different classification task and no additional fine-tuning is performed on them, we achieve a highly remarkable classification performance of 0.876 AUC. We investigate different types of image preprocessing and encoding. We also investigate whether the inclusion of the secondary MS2 spectra improves the classification performance. Throughout all tested models, we use standard protein expression vectors as gold standards. Even with our naïve implementation, our results suggest that the application of deep learning and transfer learning techniques might pave the way to the broader usage of raw mass spectrometry data in real-time diagnosis. AVAILABILITY AND IMPLEMENTATION The open source code used to generate the results from MS images is available on GitHub: https://ibm.biz/mstransc. The raw MS data underlying this article cannot be shared publicly for the privacy of individuals who participated in the study.
Processed data including the MS images, their encodings, classification labels and results can be accessed at the following link: https://ibm.box.com/v/mstc-supplementary. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Joris Cadow
- Cognitive Computing & Industry Solutions, IBM Research Europe - Zurich, Rueschlikon 8803, Switzerland
- Matteo Manica
- Cognitive Computing & Industry Solutions, IBM Research Europe - Zurich, Rueschlikon 8803, Switzerland
- Roland Mathis
- Cognitive Computing & Industry Solutions, IBM Research Europe - Zurich, Rueschlikon 8803, Switzerland
- Tiannan Guo
- Institute of Basic Medical Sciences, School of Life Science, Westlake University, Hangzhou 310024, China
- Ruedi Aebersold
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich 8093, Switzerland
- María Rodríguez Martínez
- Cognitive Computing & Industry Solutions, IBM Research Europe - Zurich, Rueschlikon 8803, Switzerland
57
Stancliffe E, Schwaiger-Haber M, Sindelar M, Patti GJ. DecoID improves identification rates in metabolomics through database-assisted MS/MS deconvolution. Nat Methods 2021; 18:779-787. [PMID: 34239103 PMCID: PMC9302972 DOI: 10.1038/s41592-021-01195-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 06/09/2020] [Accepted: 05/24/2021] [Indexed: 02/03/2023]
Abstract
Chimeric MS/MS spectra contain fragments from multiple precursor ions and therefore hinder compound identification in metabolomics. Historically, deconvolution of these chimeric spectra has been challenging and relied on specific experimental methods that introduce variation in the ratios of precursor ions between multiple tandem mass spectrometry (MS/MS) scans. DecoID provides a complementary, method-independent approach where database spectra are computationally mixed to match an experimentally acquired spectrum by using LASSO regression. We validated that DecoID increases the number of identified metabolites in MS/MS datasets from both data-independent and data-dependent acquisition without increasing the false discovery rate. We applied DecoID to publicly available data from the MetaboLights repository and to data from human plasma, where DecoID increased the number of identified metabolites from data-dependent acquisition data by over 30% compared to direct spectral matching. DecoID is compatible with any user-defined MS/MS database and provides automated searching for some of the largest MS/MS databases currently available.
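The idea of computationally mixing database spectra with LASSO regression can be sketched as follows. This is an illustrative toy, not DecoID's implementation: it assumes spectra have already been binned onto a shared m/z axis, adds a non-negativity constraint on the mixing weights (an assumption made here because abundances cannot be negative), and solves the problem with plain coordinate descent. The function name, the penalty `lam` and the example spectra are all hypothetical.

```python
def nonneg_lasso(columns, y, lam=0.01, n_iter=200):
    """Minimise 0.5 * ||y - sum_j w_j * columns[j]||^2 + lam * sum_j w_j, with w_j >= 0.

    columns: list of database spectra, each a list of binned intensities.
    y:       observed (possibly chimeric) spectrum on the same bins.
    Returns the list of mixing weights w.
    """
    w = [0.0] * len(columns)
    residual = list(y)  # y minus the current reconstruction
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue
            # Correlation of this column with the residual, adding back
            # the column's own current contribution.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual))
            new_w = max(0.0, (rho - lam) / norm)  # soft-threshold, clipped at zero
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_w
    return w


# Hypothetical binned database spectra for three compounds.
compound_a = [1.0, 0.0, 0.5, 0.0, 0.0]
compound_b = [0.0, 1.0, 0.0, 0.5, 0.0]
compound_c = [0.0, 0.0, 0.0, 0.0, 1.0]

# Observed chimeric spectrum: 70% compound A plus 30% compound B.
observed = [0.7 * a + 0.3 * b for a, b in zip(compound_a, compound_b)]

weights = nonneg_lasso([compound_a, compound_b, compound_c], observed)
```

The L1 penalty drives the weight of the absent compound to exactly zero, which is what makes the fitted weights readable as a list of candidate contributors to the chimeric spectrum.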
Affiliation(s)
- Ethan Stancliffe
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO, USA
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Michaela Schwaiger-Haber
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO, USA
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Miriam Sindelar
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO, USA
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Gary J Patti
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO, USA.
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA.
58
Messner CB, Demichev V, Bloomfield N, Yu JSL, White M, Kreidl M, Egger AS, Freiwald A, Ivosev G, Wasim F, Zelezniak A, Jürgens L, Suttorp N, Sander LE, Kurth F, Lilley KS, Mülleder M, Tate S, Ralser M. Ultra-fast proteomics with Scanning SWATH. Nat Biotechnol 2021; 39:846-854. [PMID: 33767396 PMCID: PMC7611254 DOI: 10.1038/s41587-021-00860-4] [Citation(s) in RCA: 130] [Impact Index Per Article: 43.3] [Received: 10/26/2020] [Accepted: 02/18/2021] [Indexed: 01/31/2023]
Abstract
Accurate quantification of the proteome remains challenging for large sample series and longitudinal experiments. We report a data-independent acquisition method, Scanning SWATH, that accelerates mass spectrometric (MS) duty cycles, yielding quantitative proteomes in combination with short gradients and high-flow (800 µl min⁻¹) chromatography. Exploiting a continuous movement of the precursor isolation window to assign precursor masses to tandem mass spectrometry (MS/MS) fragment traces, Scanning SWATH increases precursor identifications by ~70% compared to conventional data-independent acquisition (DIA) methods on 0.5-5-min chromatographic gradients. We demonstrate the application of ultra-fast proteomics in drug mode-of-action screening and plasma proteomics. Scanning SWATH proteomes capture the mode of action of fungistatic azoles and statins. Moreover, we confirm 43 and identify 11 new plasma proteome biomarkers of COVID-19 severity, advancing patient classification and biomarker discovery. Thus, our results demonstrate a substantial acceleration and increased depth in fast proteomic experiments that facilitate proteomic drug screens and clinical studies.
Affiliation(s)
- Christoph B Messner
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
- Department of Biochemistry, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Vadim Demichev
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
- Department of Biochemistry, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge, UK
- Jason S L Yu
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
- Matthew White
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
- Marco Kreidl
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
- Anna-Sophia Egger
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
- Anja Freiwald
- Department of Biochemistry, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Core Facility - High Throughput Mass Spectrometry, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Aleksej Zelezniak
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Linda Jürgens
- Department of Infectious Diseases and Respiratory Medicine, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Norbert Suttorp
- Department of Infectious Diseases and Respiratory Medicine, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Leif Erik Sander
- Department of Infectious Diseases and Respiratory Medicine, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Florian Kurth
- Department of Infectious Diseases and Respiratory Medicine, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Tropical Medicine, Bernhard Nocht Institute for Tropical Medicine & I. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Kathryn S Lilley
- Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge, UK
- Michael Mülleder
- Core Facility - High Throughput Mass Spectrometry, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Markus Ralser
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK.
- Department of Biochemistry, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
59
Abstract
Mass-spectrometry-based proteomics enables quantitative analysis of thousands of human proteins. However, experimental and computational challenges restrict progress in the field. This review summarizes the recent flurry of machine-learning strategies using artificial deep neural networks (or "deep learning") that have started to break barriers and accelerate progress in the field of shotgun proteomics. Deep learning now accurately predicts physicochemical properties of peptides from their sequence, including tandem mass spectra and retention time. Furthermore, deep learning methods exist for nearly every aspect of the modern proteomics workflow, enabling improved feature selection, peptide identification, and protein inference.
Affiliation(s)
- Jesse G. Meyer
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226, USA
60
Bichmann L, Gupta S, Rosenberger G, Kuchenbecker L, Sachsenberg T, Ewels P, Alka O, Pfeuffer J, Kohlbacher O, Röst H. DIAproteomics: A Multifunctional Data Analysis Pipeline for Data-Independent Acquisition Proteomics and Peptidomics. J Proteome Res 2021; 20:3758-3766. [PMID: 34153189 DOI: 10.1021/acs.jproteome.1c00123] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/15/2022]
Abstract
Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. The main advantages include greater reproducibility and sensitivity and a greater dynamic range compared with data-dependent acquisition (DDA). However, the data analysis is complex and often requires expert knowledge when dealing with large-scale data sets. Here we present DIAproteomics, a multifunctional, automated, high-throughput pipeline implemented in the Nextflow workflow management system that allows one to easily process proteomics and peptidomics DIA data sets on diverse compute infrastructures. The central components are well-established tools such as the OpenSwathWorkflow for the DIA spectral library search and PyProphet for the false discovery rate assessment. In addition, it provides options to generate spectral libraries from existing DDA data and to carry out the retention time and chromatogram alignment. The output includes annotated tables and diagnostic visualizations from the statistical postprocessing and computation of fold-changes across pairwise conditions, predefined in an experimental design. DIAproteomics is well documented open-source software and is available under a permissive license to the scientific community at https://www.openms.de/diaproteomics/.
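The false discovery rate assessment step mentioned in this abstract can be illustrated with a basic target-decoy estimate. The sketch below is a simplified stand-in, not PyProphet's procedure (which, among other things, learns the discriminant score itself): it assumes each candidate already has a score and a target/decoy label, estimates the FDR at a threshold as decoys divided by targets, and converts that to monotone q-values. The function name and example scores are hypothetical.

```python
def target_decoy_qvalues(scored):
    """Estimate q-values from (score, is_decoy) pairs.

    Higher scores are assumed to be better. Returns a list of
    (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ordered = sorted(scored, key=lambda item: item[0], reverse=True)

    # FDR estimate at each rank: decoys accepted / targets accepted.
    targets = 0
    decoys = 0
    fdrs = []
    for _, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any more permissive rank.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ordered, qvalues)]


# Hypothetical scored candidates: (score, is_decoy).
candidates = [
    (9.0, False), (8.0, False), (7.0, False),
    (6.0, True), (5.0, False), (4.0, True),
]
results = target_decoy_qvalues(candidates)
accepted = [r for r in results if not r[1] and r[2] <= 0.01]
```

Filtering on the q-value rather than the raw FDR estimate guarantees that accepting a candidate never excludes a better-scoring one.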
Affiliation(s)
- Leon Bichmann
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany.,Institute for Cell Biology, Department of Immunology, University of Tübingen, Tübingen 72076, Germany
- Shubham Gupta
- Donnelly Center for Biomolecular Research, University of Toronto, Toronto, Ontario ON M5S 3E1, Canada
- George Rosenberger
- Department of Systems Biology, Columbia University, New York, New York 10032, United States
- Leon Kuchenbecker
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
- Timo Sachsenberg
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
- Phil Ewels
- Science for Life Laboratory (SciLifeLab), Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Oliver Alka
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
- Julianus Pfeuffer
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany.,Institute for Informatics, Freie Universität Berlin, Berlin 14195, Germany.,Zuse Institute Berlin, Berlin 14195, Germany
- Oliver Kohlbacher
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany.,Institute for Biological and Medical Informatics, University of Tübingen, Tübingen 72076, Germany.,Institute for Translational Bioinformatics, University Hospital Tübingen, Tübingen 72076, Germany
- Hannes Röst
- Donnelly Center for Biomolecular Research, University of Toronto, Toronto, Ontario ON M5S 3E1, Canada
61
Forensic proteomics. Forensic Sci Int Genet 2021; 54:102529. [PMID: 34139528 DOI: 10.1016/j.fsigen.2021.102529] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/24/2020] [Revised: 05/06/2021] [Accepted: 05/07/2021] [Indexed: 12/19/2022]
Abstract
Protein is a major component of all biological evidence, often the matrix that embeds other biomolecules such as polynucleotides, lipids, carbohydrates, and small molecules. The proteins in a sample reflect the transcriptional and translational program of the originating cell types. Because of this, proteins can be used to identify body fluids and tissues, as well as convey genetic information in the form of single amino acid polymorphisms, the result of non-synonymous SNPs. This review explores the application and potential of forensic proteomics. The historical role that protein analysis played in the development of forensic science is examined. This review details how innovations in proteomic mass spectrometry have addressed many of the historical limitations of forensic protein science, and how the application of forensic proteomics differs from proteomics in the life sciences. Two more developed applications of forensic proteomics are examined in detail: body fluid and tissue identification, and proteomic genotyping. The review then highlights developing areas of proteomics that have the potential to impact forensic science in the near future: fingermark analysis, species identification, peptide toxicology, proteomic sex estimation, and estimation of post-mortem intervals. Finally, the review highlights some of the newer innovations in proteomics that may drive further development of the field. In addition to potential impact, this review also attempts to evaluate the stage of each application in the development, validation and implementation process. This review is targeted at investigators who are interested in learning about proteomics in a forensic context and expanding the amount of information they can extract from biological evidence.
62
Wilburn DB, Richards AL, Swaney DL, Searle BC. CIDer: A Statistical Framework for Interpreting Differences in CID and HCD Fragmentation. J Proteome Res 2021; 20:1951-1965. [PMID: 33729787 DOI: 10.1021/acs.jproteome.0c00964] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
Abstract
Library searching is a powerful technique for detecting peptides using either data independent or data dependent acquisition. While both large-scale spectrum library curators and deep learning prediction approaches have focused on beam-type CID fragmentation (HCD), resonance CID fragmentation remains a popular technique. Here we demonstrate an approach to model the differences between HCD and CID spectra, and present a software tool, CIDer, for converting libraries between the two fragmentation methods. We demonstrate that just using a combination of simple linear models and basic principles of peptide fragmentation, we can explain up to 43% of the variation between ions fragmented by HCD and CID across an array of collision energy settings. We further show that in some circumstances, searching converted CID libraries can detect more peptides than searching existing CID libraries or libraries of machine learning predictions from FASTA databases. These results suggest that leveraging information in existing libraries by converting between HCD and CID libraries may be an effective interim solution while large-scale CID libraries are being developed.
Affiliation(s)
- Damien B Wilburn
- Institute for Systems Biology, Seattle, Washington 98109, United States.,Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Alicia L Richards
- Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California 94158, United States.,J. David Gladstone Institutes, San Francisco, California 94158, United States.,Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California 94158, United States
- Danielle L Swaney
- Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California 94158, United States.,J. David Gladstone Institutes, San Francisco, California 94158, United States.,Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California 94158, United States
- Brian C Searle
- Institute for Systems Biology, Seattle, Washington 98109, United States
63
Du Z, Shao W, Qin W. [Research progress and application of retention time prediction method based on deep learning]. Se Pu 2021; 39:211-218. [PMID: 34227303 PMCID: PMC9403805 DOI: 10.3724/sp.j.1123.2020.08015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Indexed: 11/25/2022] Open
Abstract
In the "shotgun" proteomics strategy, the proteome is characterized by analyzing tryptically digested peptides using liquid chromatography-mass spectrometry. In this strategy, the retention time of peptides in liquid chromatography separation can be predicted based on the peptide sequence. This is a useful feature for peptide identification. Therefore, the prediction of the retention time has attracted much research attention. Traditional methods calculate the physical and chemical properties of the peptides based on their amino acid sequence to obtain the retention time under certain chromatography conditions; however, these methods cannot be directly adopted for other chromatography conditions, nor can they be used across laboratories or instrument platforms. To solve this problem, in recent years, deep learning was introduced to proteomics research for retention time prediction. Deep learning is an advanced machine-learning method that has extraordinary capability to learn complex relationships from large-scale data. By stacking multiple hidden layers, deep learning can ingest raw data without manually designed features. Transfer learning is an important method in deep learning. It improves the learning of a new task through the transfer of knowledge from an already-learned related task. Transfer learning allows models trained using large datasets to be utilized across conditions by fine-tuning on smaller datasets, instead of retraining the whole model. Many retention time prediction methods have been developed. In the process of training the model, the sequences of peptides are encoded to represent peptide information. Deep learning considers the relationship between the characteristics of the peptides and their corresponding retention times without the need for manual input of the physical and chemical properties of the peptides.
Compared with traditional methods, deep learning methods have higher accuracy and can be easily used under different chromatography conditions by transfer learning. If there are not enough datasets to train a new model, a trained model from other datasets can be used as a replacement after calibration with small datasets obtained from these chromatography conditions. While the retention times of modified peptides can also be predicted, the predictions are inadequate for complex modifications such as glycosylation, and this is one of the main problems to be solved. The predicted retention times were used to control the quality of peptide identification. With high accuracy, the predicted retention times can be considered as actual retention times. Therefore, the difference between predicted and observed retention times can serve as an effective and unbiased quantitative metric for evaluating the quality of peptide-spectrum matches (PSMs) reported using different peptide identification methods. Combined with fragment ion intensity prediction, retention time prediction is used to generate spectral libraries for data-independent acquisition (DIA)-based mass spectrometry analysis. Generally, DIA methods identify peptides using specific spectrum libraries obtained from data-dependent acquisition (DDA) experiments. As a result, only peptides detected in the DDA experiments can be present in the libraries and detected in DIA. Furthermore, it takes a lot of time and effort to build libraries from DDA experiments, and typically, they cannot be adopted across different laboratories or instrument platforms. In contrast, the pseudo spectral libraries generated by retention times and fragment ion intensity prediction can overcome these shortcomings. The pseudo spectral libraries generate theoretical spectra of all possible peptides without the need for DDA experiments. 
This paper reviews the research progress of deep learning methods in the prediction of retention time and in related applications in order to provide references for retention time prediction and protein identification. At the same time, the development direction and application trend of retention time prediction methods based on deep learning are discussed.
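The calibration and quality-control idea in this abstract can be illustrated with a minimal Python sketch. It is not taken from the reviewed paper: the linear calibration, the function names, and all retention time values are assumptions chosen for illustration. Predicted retention times are first mapped onto the local chromatography with a small calibration set, and the absolute difference between calibrated prediction and observation is then used as a per-PSM metric.

```python
from statistics import mean


def fit_linear(x, y):
    """Ordinary least-squares fit of y = a * x + b (hypothetical calibration model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def delta_rt(predicted, observed, a, b):
    """Absolute difference between calibrated predicted RT and observed RT."""
    return [abs(o - (a * p + b)) for p, o in zip(predicted, observed)]


# Small calibration set (minutes); all values are invented for illustration.
calib_pred = [10.0, 20.0, 30.0, 40.0]
calib_obs = [12.0, 22.0, 32.0, 42.0]
a, b = fit_linear(calib_pred, calib_obs)

# PSMs to evaluate: a large delta suggests a questionable identification.
psm_pred = [15.0, 25.0, 35.0]
psm_obs = [17.0, 27.5, 50.0]
deltas = delta_rt(psm_pred, psm_obs, a, b)

tolerance = 2.0  # assumed cut-off in minutes
plausible = [d <= tolerance for d in deltas]
```

In practice, a deep learning model fine-tuned by transfer learning would replace the linear fit; the delta metric itself is independent of how the prediction is produced.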
|
64
|
Willems P, Fels U, Staes A, Gevaert K, Van Damme P. Use of Hybrid Data-Dependent and -Independent Acquisition Spectral Libraries Empowers Dual-Proteome Profiling. J Proteome Res 2021; 20:1165-1177. [PMID: 33467856 PMCID: PMC7871992 DOI: 10.1021/acs.jproteome.0c00350] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2020] [Indexed: 01/01/2023]
Abstract
In the context of bacterial infections, it is imperative that physiological responses can be studied in an integrated manner, meaning a simultaneous analysis of both the host and the pathogen responses. To improve the sensitivity of detection, data-independent acquisition (DIA)-based proteomics was found to outperform data-dependent acquisition (DDA) workflows in identifying and quantifying low-abundant proteins. Here, by making use of representative bacterial pathogen/host proteome samples, we report an optimized hybrid library generation workflow for DIA mass spectrometry relying on the use of data-dependent and in silico-predicted spectral libraries. When compared to searching DDA experiment-specific libraries only, the use of hybrid libraries significantly improved peptide detection to an extent suggesting that infection-relevant host-pathogen conditions could be profiled in sufficient depth without the need of a priori bacterial pathogen enrichment when studying the bacterial proteome. Proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD017904 and PXD017945.
Affiliation(s)
- Patrick Willems
- Department of Biochemistry and Microbiology, Ghent University, Ghent 9000, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9000, Belgium
- VIB-UGent Center for Plant Systems Biology, Ghent 9052, Belgium
| | - Ursula Fels
- Department of Biochemistry and Microbiology, Ghent University, Ghent 9000, Belgium
- VIB-UGent Center for Medical Biotechnology, Ghent 9052, Belgium
| | - An Staes
- VIB-UGent Center for Medical Biotechnology, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9000, Belgium
| | - Kris Gevaert
- VIB-UGent Center for Medical Biotechnology, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9000, Belgium
| | - Petra Van Damme
- Department of Biochemistry and Microbiology, Ghent University, Ghent 9000, Belgium
| |
|
65
|
Li C, Gao M, Yang W, Zhong C, Yu R. Diamond: A Multi-Modal DIA Mass Spectrometry Data Processing Pipeline. Bioinformatics 2021; 37:265-267. [PMID: 33416868 DOI: 10.1093/bioinformatics/btaa1093] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Revised: 11/24/2020] [Accepted: 12/23/2020] [Indexed: 01/25/2023] Open
Abstract
MOTIVATION: We developed Diamond, a Nextflow-based, containerized, multi-modal data-independent acquisition (DIA) mass spectrometry (MS) data processing pipeline for peptide identification and quantification. Diamond integrates two mainstream workflows for DIA data analysis, namely spectrum-centric scoring (SCS) and peptide-centric scoring (PCS), for use cases both with and without assay libraries. This multi-modal pipeline serves as a versatile, easy-to-use, and easily extendable toolbox for large-scale DIA data processing. AVAILABILITY: The Docker image is available at https://hub.docker.com/r/zeroli/diamond and the source code is freely accessible at https://github.com/xmuyulab/Diamond.
Affiliation(s)
- Chenxin Li
- School of Informatics, Xiamen University, China
| | | | | | | | - Rongshan Yu
- School of Informatics, Xiamen University, China.,Aginome Scientific, Xiamen, China
| |
|
66
|
Hackett WE, Zaia J. Calculating Glycoprotein Similarities From Mass Spectrometric Data. Mol Cell Proteomics 2021; 20:100028. [PMID: 32883803 PMCID: PMC8724611 DOI: 10.1074/mcp.r120.002223] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2020] [Revised: 08/24/2020] [Accepted: 09/03/2020] [Indexed: 12/23/2022] Open
Abstract
Complex protein glycosylation occurs through biosynthetic steps in the secretory pathway that create macro- and microheterogeneity of structure and function. Required for all life forms, glycosylation diversifies and adapts protein interactions with binding partners that underpin interactions at cell surfaces and pericellular and extracellular environments. Because these biological effects arise from heterogeneity of structure and function, it is necessary to measure their changes as part of the quest to understand nature. Quite often, however, the assumption behind proteomics that posttranslational modifications are discrete additions that can be modeled using the genome as a template does not apply to protein glycosylation. Rather, it is necessary to quantify the glycosylation distribution at each glycosite and to aggregate this information into a population of mature glycoproteins that exist in a given biological system. To date, mass spectrometric methods for assigning singly glycosylated peptides are well-established. But it is necessary to quantify glycosylation heterogeneity accurately in order to gauge the alterations that occur during biological processes. The task is to quantify the glycosylated peptide forms as accurately as possible and then apply appropriate bioinformatics algorithms to the calculation of micro- and macro-similarities. In this review, we summarize current approaches for protein quantification as they apply to this glycoprotein similarity problem.
Affiliation(s)
- William E Hackett
- Bioinformatics Program, Boston University, Boston, Massachusetts, USA
| | - Joseph Zaia
- Bioinformatics Program, Boston University, Boston, Massachusetts, USA; Department of Biochemistry, Boston University, Boston, Massachusetts, USA.
| |
|
67
|
Abstract
Data-independent acquisition (DIA) for liquid chromatography tandem mass spectrometry (LC-MS/MS) can improve the depth and reproducibility of the acquired proteomics datasets. DIA solves some limitations of the conventional data-dependent acquisition (DDA) strategy, for example, bias in intensity-dependent precursor selection and limited dynamic range. These advantages, together with the recent developments in speed, sensitivity, and resolution in MS technology, position DIA as a strong alternative to DDA. Recently, we demonstrated that the benefits of DIA are extendable to phosphoproteomics workflows, enabling increased depth, sensitivity, and reproducibility of our analysis of phosphopeptide-enriched samples. However, computational analysis of phospho-DIA data poses specific challenges and places particular requirements on the software and downstream processing workflows. A step-by-step guide to analyze phospho-DIA raw data using either spectral libraries or directDIA in Spectronaut is presented here. Furthermore, a straightforward protocol to perform differential phosphorylation site analysis using the output results from Spectronaut is described.
|
68
|
Li KW, Gonzalez-Lozano MA, Koopmans F, Smit AB. Recent Developments in Data Independent Acquisition (DIA) Mass Spectrometry: Application of Quantitative Analysis of the Brain Proteome. Front Mol Neurosci 2020; 13:564446. [PMID: 33424549 PMCID: PMC7793698 DOI: 10.3389/fnmol.2020.564446] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Accepted: 12/02/2020] [Indexed: 12/13/2022] Open
Abstract
Mass spectrometry is the driving force behind current brain proteome analysis. In a typical proteomics approach, a protein isolate is digested into tryptic peptides and then analyzed by liquid chromatography–mass spectrometry. The recent advancements in data independent acquisition (DIA) mass spectrometry provide higher sensitivity and protein coverage than the classic data dependent acquisition. DIA cycles through a pre-defined set of peptide precursor isolation windows stepping through 400–1,200 m/z across the whole liquid chromatography gradient. All peptides within an isolation window are fragmented simultaneously and detected by tandem mass spectrometry. Peptides are identified by matching the ion peaks in a mass spectrum to a spectral library that contains information of the peptide fragment ions' pattern and its chromatography elution time. Currently, there are several reports on DIA in brain research, in particular the quantitative analysis of cellular and synaptic proteomes to reveal the spatial and/or temporal changes of proteins that underlie neuronal plasticity and disease mechanisms. Protocols in DIA are continuously improving in both acquisition and data analysis. The depth of analysis is currently approaching proteome-wide coverage, while maintaining high reproducibility in a stable and standardisable MS environment. DIA can be positioned as the method of choice for routine proteome analysis in basic brain research and clinical applications.
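The window cycling described in this abstract can be sketched in a few lines of Python. Only the 400–1,200 m/z range comes from the abstract; the 25 m/z window width, the half-open window convention, and the function names are assumptions made for illustration and do not describe any specific instrument method.

```python
def isolation_windows(mz_start, mz_end, width, overlap=0.0):
    """Return consecutive (lower, upper) precursor isolation windows.

    `width` and `overlap` are in m/z units; the values used below are assumed.
    """
    if width <= 0 or overlap < 0 or overlap >= width:
        raise ValueError("require width > 0 and 0 <= overlap < width")
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        if upper >= mz_end:
            break
        lower = upper - overlap
    return windows


def windows_for_precursor(windows, precursor_mz):
    """Indices of windows whose half-open range [lower, upper) contains the precursor."""
    return [i for i, (lo, hi) in enumerate(windows) if lo <= precursor_mz < hi]


# One DIA cycle over the range given in the abstract, with an assumed 25 m/z width.
cycle = isolation_windows(400.0, 1200.0, 25.0)
hit = windows_for_precursor(cycle, 650.3)
```

Every precursor falling inside a window is co-fragmented in that window's MS/MS scan, which is why the resulting spectra are matched against a spectral library rather than interpreted one precursor at a time.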
Affiliation(s)
- Ka Wan Li
- Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
| | - Miguel A Gonzalez-Lozano
- Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
| | - Frank Koopmans
- Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
| | - August B Smit
- Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
| |
|
69
|
Vaca Jacome AS, Peckner R, Shulman N, Krug K, DeRuff KC, Officer A, Christianson KE, MacLean B, MacCoss MJ, Carr SA, Jaffe JD. Avant-garde: an automated data-driven DIA data curation tool. Nat Methods 2020; 17:1237-1244. [PMID: 33199889 PMCID: PMC7723322 DOI: 10.1038/s41592-020-00986-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2019] [Accepted: 09/25/2020] [Indexed: 12/03/2022]
Abstract
Several challenges remain in data-independent acquisition (DIA) data analysis, such as to confidently identify peptides, define integration boundaries, remove interferences, and control false discovery rates. In practice, a visual inspection of the signals is still required, which is impractical with large datasets. We present Avant-garde as a tool to refine DIA (and parallel reaction monitoring) data. Avant-garde uses a novel data-driven scoring strategy: signals are refined by learning from the dataset itself, using all measurements in all samples to achieve the best optimization. We evaluate the performance of Avant-garde using benchmark DIA datasets and show that it can determine the quantitative suitability of a peptide peak, and reach the same levels of selectivity, accuracy, and reproducibility as manual validation. Avant-garde is complementary to existing DIA analysis engines and aims to establish a strong foundation for subsequent analysis of quantitative mass spectrometry data.
Affiliation(s)
| | - Ryan Peckner
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cogen Therapeutics, Cambridge, MA, USA
| | | | - Karsten Krug
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Adam Officer
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | | | | | - Steven A Carr
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jacob D Jaffe
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Inzen Therapeutics, Cambridge, MA, USA.
| |
|
70
|
Watanabe E, Kawashima Y, Suda W, Kakihara T, Takazawa S, Nakajima D, Nakamura R, Nishi A, Suzuki K, Ohara O, Fujishiro J. Discovery of Candidate Stool Biomarker Proteins for Biliary Atresia Using Proteome Analysis by Data-Independent Acquisition Mass Spectrometry. Proteomes 2020; 8:proteomes8040036. [PMID: 33260872 PMCID: PMC7709124 DOI: 10.3390/proteomes8040036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Accepted: 11/25/2020] [Indexed: 11/16/2022] Open
Abstract
Biliary atresia (BA) is a destructive inflammatory obliterative cholangiopathy of the neonate that affects various parts of the bile duct. If early diagnosis followed by Kasai portoenterostomy is not performed, progressive liver cirrhosis frequently leads to liver transplantation in the early stage of life. Therefore, prompt diagnosis is necessary for the rescue of BA patients. However, the prompt diagnosis of BA remains challenging because specific and reliable biomarkers for BA are currently unavailable. In this study, we discovered potential biomarkers for BA using deep proteome analysis by data-independent acquisition mass spectrometry (DIA–MS). Four patients with BA and three patients with neonatal cholestasis of other etiologies (non-BA) were recruited for stool proteome analysis. Among the 2110 host-derived proteins detected in their stools, 49 proteins were significantly higher in patients with BA and 54 proteins were significantly lower. These varying stool protein levels in infants with BA can provide potential biomarkers for BA. As demonstrated in this study, the deep proteome analysis of stools has great potential not only in detecting new stool biomarkers for BA but also in elucidating the pathophysiology of BA and other pediatric diseases, especially in the field of pediatric gastroenterology.
Affiliation(s)
- Eiichiro Watanabe
- Department of Pediatric Surgery, Faculty of Medicine, The University of Tokyo, Tokyo 113-8655, Japan; (E.W.); (T.K.); (K.S.)
| | - Yusuke Kawashima
- Department of Applied Genomics, Kazusa DNA Research Institute, Kisarazu 292-0818, Japan; (Y.K.); (D.N.); (R.N.); (O.O.)
| | - Wataru Suda
- Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan;
| | - Tomo Kakihara
- Department of Pediatric Surgery, Faculty of Medicine, The University of Tokyo, Tokyo 113-8655, Japan; (E.W.); (T.K.); (K.S.)
| | - Shinya Takazawa
- Department of Surgery, Gunma Children’s Medical Center, Shibukawa 277-8577, Japan; (S.T.); (A.N.)
| | - Daisuke Nakajima
- Department of Applied Genomics, Kazusa DNA Research Institute, Kisarazu 292-0818, Japan; (Y.K.); (D.N.); (R.N.); (O.O.)
| | - Ren Nakamura
- Department of Applied Genomics, Kazusa DNA Research Institute, Kisarazu 292-0818, Japan; (Y.K.); (D.N.); (R.N.); (O.O.)
| | - Akira Nishi
- Department of Surgery, Gunma Children’s Medical Center, Shibukawa 277-8577, Japan; (S.T.); (A.N.)
| | - Kan Suzuki
- Department of Pediatric Surgery, Faculty of Medicine, The University of Tokyo, Tokyo 113-8655, Japan; (E.W.); (T.K.); (K.S.)
| | - Osamu Ohara
- Department of Applied Genomics, Kazusa DNA Research Institute, Kisarazu 292-0818, Japan; (Y.K.); (D.N.); (R.N.); (O.O.)
| | - Jun Fujishiro
- Department of Pediatric Surgery, Faculty of Medicine, The University of Tokyo, Tokyo 113-8655, Japan; (E.W.); (T.K.); (K.S.)
- Correspondence: ; Tel.: +81-3-5800-8671; Fax: +81-3-5800-5104
| |
|
71
|
Gertz ML, Chin CR, Tomoiaga D, MacKay M, Chang C, Butler D, Afshinnekoo E, Bezdan D, Schmidt MA, Mozsary C, Melnick A, Garrett-Bakelman F, Crucian B, Lee SMC, Zwart SR, Smith SM, Meydan C, Mason CE. Multi-omic, Single-Cell, and Biochemical Profiles of Astronauts Guide Pharmacological Strategies for Returning to Gravity. Cell Rep 2020; 33:108429. [PMID: 33242408 PMCID: PMC9444344 DOI: 10.1016/j.celrep.2020.108429] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2020] [Revised: 10/07/2020] [Accepted: 11/03/2020] [Indexed: 12/29/2022] Open
Abstract
The National Aeronautics and Space Administration (NASA) Twins Study created an integrative molecular profile of an astronaut during NASA’s first 1-year mission on the International Space Station (ISS) and included comparisons to an identical Earth-bound twin. The unique biochemical profiles observed when landing on Earth after such a long mission (e.g., spikes in interleukin-1 [IL-1]/6/10, c-reactive protein [CRP], C-C motif chemokine ligand 2 [CCL2], IL-1 receptor antagonist [IL-1ra], and tumor necrosis factor alpha [TNF-α]) opened new questions about the human body’s response to gravity and how to plan for future astronauts, particularly around initiation or resolution of inflammation. Here, single-cell, multi-omic (100-plex epitope profile and gene expression) profiling of peripheral blood mononuclear cells (PBMCs) showed changes to blood cell composition and gene expression post-flight, specifically for monocytes and dendritic cell precursors. These were consistent with flight-induced cytokine and immune system stress, followed by skeletal muscle regeneration in response to gravity. Finally, we examined these profiles relative to 6-month missions in 28 other astronauts and detail potential pharmacological interventions for returning to gravity in future missions. Gertz et al. present a re-analysis of the landing data from the NASA Twins Study, suggesting that the biochemical signature reflects muscle regeneration after atrophy rather than a detrimental inflammatory response. This is mediated through muscle-derived IL-6 anti-inflammatory cascades. Single-cell analysis supports this role. Potential pharmacological interventions are also discussed.
Affiliation(s)
- Monica L Gertz
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA; Interdisciplinary Program in Neuroscience, George Mason University, Fairfax, VA 22030, USA
| | - Christopher R Chin
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA
| | - Delia Tomoiaga
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA
| | - Matthew MacKay
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA; The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY 10065, USA; Becton Dickinson & Co., Washington, DC 20001
| | | | - Daniel Butler
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA
| | - Ebrahim Afshinnekoo
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA; The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Daniela Bezdan
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA; Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital, Tübingen 72076, Germany
| | - Michael A Schmidt
- Advanced Pattern Analysis and Countermeasures Group, Boulder, CO 80302, USA; Sovaris Aerospace, Boulder, CO 80302, USA
| | - Christopher Mozsary
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA
| | - Ari Melnick
- Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Francine Garrett-Bakelman
- Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Department of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA; University of Virginia Cancer Center, Charlottesville, VA 22908, USA
| | - Brian Crucian
- Human Health and Performance Directorate, NASA Johnson Space Center, Houston, TX 77058, USA
| | | | - Sara R Zwart
- Department of Preventive Medicine and Population Health, University of Texas Medical Branch, Galveston, TX 77555, USA
| | - Scott M Smith
- Human Health and Performance Directorate, NASA Johnson Space Center, Houston, TX 77058, USA
| | - Cem Meydan
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA; The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA; The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA; The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY 10065, USA.
| |
|
72
|
Mun DG, Renuse S, Saraswat M, Madugundu A, Udainiya S, Kim H, Park SKR, Zhao H, Nirujogi RS, Na CH, Kannan N, Yates JR, Lee SW, Pandey A. PASS-DIA: A Data-Independent Acquisition Approach for Discovery Studies. Anal Chem 2020; 92:14466-14475. [PMID: 33079518 DOI: 10.1021/acs.analchem.0c02513] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
A data-independent acquisition (DIA) approach is being increasingly adopted as a promising strategy for identification and quantitation of proteomes. As most DIA data sets are acquired with wide isolation windows, highly complex MS/MS spectra are generated, which hampers peptide identification through classical protein database searches. Therefore, the analysis of DIA data mainly relies on the evidence of the existence of peptides from prebuilt spectral libraries. Consequently, one major weakness of this method is that it does not account for peptides that are not included in the spectral library, precluding the use of DIA for discovery studies. Here, we present a strategy termed Precursor ion And Small Slice-DIA (PASS-DIA) in which MS/MS spectra are acquired with small isolation windows (slices) and MS/MS spectra are interpreted with accurately determined precursor ion masses. This method enables the direct application of conventional spectrum-centric analysis pipelines for peptide identification and precursor ion-based quantitation. The performance of PASS-DIA was observed to be superior to both data-dependent acquisition (DDA) and conventional DIA experiments with 69 and 48% additional protein identifications, respectively. Application of PASS-DIA for the analysis of post-translationally modified peptides again highlighted its superior performance in characterizing phosphopeptides (77% more), N-terminal acetylated peptides (56% more), and N-glycopeptides (83% more) as compared to DDA alone. Finally, the use of PASS-DIA to characterize a rare proteome of human fallopian tube organoids yielded 34% more protein identifications than DDA alone and revealed biologically relevant pathways including low-abundance proteins. Overall, PASS-DIA is a novel DIA approach for use as a discovery tool that outperforms both conventional DDA and DIA experiments to provide additional protein information. 
We believe that the PASS-DIA method is an important strategy for discovery-type studies when deeper proteome characterization is required.
Affiliation(s)
- Dong-Gi Mun
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota 55905, United States
| | - Santosh Renuse
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota 55905, United States.,Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, United States
| | - Mayank Saraswat
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota 55905, United States.,Center for Molecular Medicine, National Institute of Mental Health and Neurosciences (NIMHANS), Hosur Road, Bangalore 560029, India.,Institute of Bioinformatics, International Technology Park, Bangalore 560066, Karnataka, India.,Manipal Academy of Higher Education (MAHE), Manipal 576104 Karnataka, India
| | - Anil Madugundu
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota 55905, United States.,Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, United States.,Center for Molecular Medicine, National Institute of Mental Health and Neurosciences (NIMHANS), Hosur Road, Bangalore 560029, India.,Institute of Bioinformatics, International Technology Park, Bangalore 560066, Karnataka, India.,Manipal Academy of Higher Education (MAHE), Manipal 576104 Karnataka, India
| | - Savita Udainiya
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota 55905, United States.,Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, United States.,Center for Molecular Medicine, National Institute of Mental Health and Neurosciences (NIMHANS), Hosur Road, Bangalore 560029, India
| | - Hokeun Kim
- Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
| | - Sung-Kyu Robin Park
- Department of Molecular Medicine and Neurobiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Hui Zhao
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota 55905, United States
| | - Raja Sekhar Nirujogi
- Medical Research Council Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, United Kingdom
| | - Chan Hyun Na
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, United States.,Neurology, Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, United States
| | - Nagarajan Kannan
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota 55905, United States
| | - John R Yates
- Department of Molecular Medicine and Neurobiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Sang-Won Lee
- Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
| | - Akhilesh Pandey
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota 55905, United States.,Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, United States.,Center for Molecular Medicine, National Institute of Mental Health and Neurosciences (NIMHANS), Hosur Road, Bangalore 560029, India.,Institute of Bioinformatics, International Technology Park, Bangalore 560066, Karnataka, India.,Manipal Academy of Higher Education (MAHE), Manipal 576104 Karnataka, India
| |
|
73
|
Krasny L, Huang PH. Data-independent acquisition mass spectrometry (DIA-MS) for proteomic applications in oncology. Mol Omics 2020; 17:29-42. [PMID: 33034323 DOI: 10.1039/d0mo00072h] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Data-independent acquisition mass spectrometry (DIA-MS) is a next generation proteomic methodology that generates permanent digital proteome maps offering highly reproducible retrospective analysis of cellular and tissue specimens. The adoption of this technology has ushered in a new wave of oncology studies across a wide range of applications including its use in molecular classification, oncogenic pathway analysis, drug and biomarker discovery and unravelling mechanisms of therapy response and resistance. In this review, we provide an overview of the experimental workflows commonly used in DIA-MS, including its current strengths and limitations versus conventional data-dependent acquisition mass spectrometry (DDA-MS). We further summarise a number of key studies to illustrate the power of this technology when applied to different facets of oncology. Finally, we offer a perspective of the latest innovations in DIA-MS technology and machine learning-based algorithms necessary for driving the development of high-throughput, in-depth and reproducible proteomic assays that are compatible with clinical diagnostic workflows, which will ultimately enable the delivery of precision cancer medicine to achieve optimal patient outcomes.
Affiliation(s)
- Lukas Krasny
- Division of Molecular Pathology, The Institute of Cancer Research, 237 Fulham Road, London, SW3 6JB, UK.
| | | |
|
74
|
Cai X, Ge W, Yi X, Sun R, Zhu J, Lu C, Sun P, Zhu T, Ruan G, Yuan C, Liang S, Lyu M, Huang S, Zhu Y, Guo T. PulseDIA: Data-Independent Acquisition Mass Spectrometry Using Multi-Injection Pulsed Gas-Phase Fractionation. J Proteome Res 2020; 20:279-288. [PMID: 32975123 DOI: 10.1021/acs.jproteome.0c00381] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
The performance of data-independent acquisition (DIA) mass spectrometry (MS) depends on the separation efficiency of peptide precursors. In Orbitrap-based mass spectrometers, separation efficiency of peptide precursors is limited by the relatively slow scanning rate compared to time of flight (TOF)-based MS. Here, we present PulseDIA, a multi-injection gas-phase fractionation (GPF) strategy for enhanced DIA-MS. This is achieved by equally dividing the conventional DIA analysis covering the entire mass range into multiple injections for DIA analyses with complementary windows. Using mouse liver digests, the PulseDIA method identified up to 50% more peptides and 29% more protein groups than conventional DIA with the same length of effective gradient time. Compared to conventional multi-injection GPF, PulseDIA exhibited higher flexibility and identified up to 18% more peptides and 8% more protein groups using two injections. The gain of peptides per effective time unit was the highest in PulseDIA compared to conventional DIA and GPF. We further applied the PulseDIA method to profile the proteome of 18 human tissue samples (benign and malignant) from nine cholangiocarcinoma (CCA) patients. PulseDIA identified 7796 protein groups in these CCA samples, with a 14% increase of protein group identification compared to the conventional DIA method. The proportion of missing values in the protein matrix dropped by 7% using PulseDIA compared to DIA. A total of 681 significantly altered proteins were detected in CCA samples using PulseDIA, including several dysregulated proteins, which were absent in the conventional DIA analysis. Taken together, we present PulseDIA as an enhanced DIA-MS method with improved sensitivity and reproducibility.
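One possible reading of "multiple injections with complementary windows" is sketched below in Python. This is an assumption about the scheme, not the authors' published window layout: each conventional window is split into equal parts, and injection i acquires the i-th part of every window, so that the injections together tile the full mass range. The window boundaries and function name are hypothetical.

```python
def complementary_schedule(windows, n_injections):
    """Split each (lower, upper) window into n equal slices, one slice per injection.

    Returns a list with one window list per injection (assumed scheme).
    """
    if n_injections < 1:
        raise ValueError("n_injections must be at least 1")
    schedule = [[] for _ in range(n_injections)]
    for lower, upper in windows:
        step = (upper - lower) / n_injections
        for i in range(n_injections):
            schedule[i].append((lower + i * step, lower + (i + 1) * step))
    return schedule


# Two hypothetical conventional DIA windows, divided across two injections.
conventional = [(400.0, 420.0), (420.0, 440.0)]
injections = complementary_schedule(conventional, 2)
```

Under this assumed layout, each injection covers the whole mass range with narrower windows, and the union of the injections reproduces the conventional coverage without gaps.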
Affiliation(s)
- Xue Cai
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Weigang Ge
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Xiao Yi
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Rui Sun
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Jiang Zhu
- Center for Stem Cell Research and Application, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, Hubei, China
| | - Cong Lu
- Center for Stem Cell Research and Application, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, Hubei, China
| | - Ping Sun
- Department of Hepatobiliary Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, Hubei, China
| | - Tiansheng Zhu
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Guan Ruan
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Chunhui Yuan
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Shuang Liang
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Mengge Lyu
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Shiang Huang
- Center for Stem Cell Research and Application, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, Hubei, China
| | - Yi Zhu
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
| | - Tiannan Guo
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
|
75
|
Genetics meets proteomics: perspectives for large population-based studies. Nat Rev Genet 2020; 22:19-37. [PMID: 32860016 DOI: 10.1038/s41576-020-0268-2] [Citation(s) in RCA: 183] [Impact Index Per Article: 45.8] [Accepted: 07/14/2020] [Indexed: 12/22/2022]
Abstract
Proteomic analysis of cells, tissues and body fluids has generated valuable insights into the complex processes influencing human biology. Proteins represent intermediate phenotypes for disease and provide insight into how genetic and non-genetic risk factors are mechanistically linked to clinical outcomes. Associations between protein levels and DNA sequence variants that colocalize with risk alleles for common diseases can expose disease-associated pathways, revealing novel drug targets and translational biomarkers. However, genome-wide, population-scale analyses of proteomic data are only now emerging. Here, we review current findings from studies of the plasma proteome and discuss their potential for advancing biomedical translation through the interpretation of genome-wide association analyses. We highlight the challenges faced by currently available technologies and provide perspectives relevant to their future application in large-scale biobank studies.
|
76
|
Remes PM, Yip P, MacCoss MJ. Highly Multiplex Targeted Proteomics Enabled by Real-Time Chromatographic Alignment. Anal Chem 2020; 92:11809-11817. [PMID: 32867497 DOI: 10.1021/acs.analchem.0c02075] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 01/06/2023]
Abstract
Targeted mass spectrometry methods produce high-quality quantitative data in terms of limits of detection and dynamic range, at the cost of a substantial compromise in throughput compared to methods such as data-independent and data-dependent acquisition. The logistical and experimental issues inherent to maintaining assays of even several hundred targets are significant. Prominent among these issues is the drift in analyte retention time as liquid chromatography (LC) columns wear, forcing targeted scheduling windows to be much larger than LC peak widths. If these problems could be solved, proteomics assays would be capable of targeting thousands of peptides in an hour-long experiment, enabling large cohort studies to be performed without sacrificing sensitivity and specificity. We describe a solution in the form of a new method for real-time chromatographic alignment and demonstrate its application to a 56 min LC-gradient HeLa digest assay with 1489 targets. The method is based on the periodic acquisition of untargeted survey scans in a reference experiment and alignment to those scans during subsequent experiments. We describe how the method enables narrower scheduled retention time windows to be used. The narrower scheduling windows enable more targets to be included in the assay or proportionally more time to be allocated to each target, improving the sensitivity. Finally, we point out how the procedure could be improved and how much additional target multiplexing could be gained in the future.
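Why narrower scheduling windows leave room for more targets can be illustrated by counting how many scheduled windows overlap at the busiest point of the gradient. The sketch below is a generic sweep-line count, not the alignment method described in the abstract; the function name `max_concurrency`, the retention times, and the window widths are hypothetical.

```python
def max_concurrency(retention_times, window_width):
    """Return the largest number of scheduling windows open at the same time.

    Each target is scheduled in a window of `window_width` minutes centred on
    its expected retention time. (Hypothetical helper for illustration.)
    """
    events = []
    for rt in retention_times:
        events.append((rt - window_width / 2, 1))    # window opens
        events.append((rt + window_width / 2, -1))   # window closes
    # At equal times, closes (-1) sort before opens (+1), so windows that
    # merely touch are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Illustrative retention times in minutes (assumed values).
rts = [10.0, 10.5, 11.0, 20.0]
wide = max_concurrency(rts, window_width=2.0)
narrow = max_concurrency(rts, window_width=0.4)
print(wide, narrow)
```

With a fixed instrument cycle time, the time available per target scales roughly with the inverse of this peak count, so a lower peak means either more time per target or capacity for additional targets.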
Affiliation(s)
- Philip M Remes
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
| | - Ping Yip
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
| | - Michael J MacCoss
- Department of Genome Sciences, University of Washington, 3720 15th Street NE, Seattle, Washington 98195, United States
|
77
|
Fernández-Costa C, Martínez-Bartolomé S, McClatchy DB, Saviola AJ, Yu NK, Yates JR. Impact of the Identification Strategy on the Reproducibility of the DDA and DIA Results. J Proteome Res 2020; 19:3153-3161. [PMID: 32510229 PMCID: PMC7898222 DOI: 10.1021/acs.jproteome.0c00153] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Indexed: 02/06/2023]
Abstract
Data-independent acquisition (DIA) is a promising technique for the proteomic analysis of complex protein samples. A number of studies have claimed that DIA experiments are more reproducible than data-dependent acquisition (DDA), but these claims are unsubstantiated since different data analysis methods are used for the two acquisition methods. Data analysis in most DIA workflows depends on spectral library searches, whereas DDA typically employs sequence database searches. In this study, we examined the reproducibility of the DIA and DDA results using both sequence database and spectral library search. The comparison was first performed using a cell lysate and then extended to an interactome study. Protein overlap among the technical replicates in both DDA and DIA experiments was 30% higher with library-based identifications than with sequence database identifications. The reproducibility of quantification was also improved with library search compared to database search, with the mean of the coefficient of variation decreasing more than 30% and a reduction in the number of missing values of more than 35%. Our results show that regardless of the acquisition method, higher identification and quantification reproducibility is observed when library search is used.
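The three reproducibility measures named in this abstract (protein overlap among replicates, coefficient of variation, and missing values) can be computed as in the following sketch. The exact definitions used in the study are not given here, so the formulas below are assumptions: overlap is taken as shared identifications divided by the union, and the CV uses the sample standard deviation. All function names and data are hypothetical.

```python
from statistics import mean, stdev


def replicate_overlap(id_sets):
    """Fraction of all identified proteins that are seen in every replicate."""
    shared = set.intersection(*id_sets)
    union = set.union(*id_sets)
    return len(shared) / len(union)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, ignoring missing values."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return None  # not enough measurements to estimate a CV
    m = mean(observed)
    return stdev(observed) / m if m != 0 else None


def missing_fraction(quant):
    """Share of protein-by-replicate cells that have no quantitative value."""
    cells = [v for row in quant.values() for v in row]
    return sum(v is None for v in cells) / len(cells)


# Toy data: three technical replicates (values are illustrative only).
ids = [{"P1", "P2", "P3"}, {"P1", "P2"}, {"P1", "P2", "P4"}]
quant = {"P1": [100.0, 110.0, 90.0], "P2": [50.0, None, 55.0]}

overlap = replicate_overlap(ids)
cv_p1 = coefficient_of_variation(quant["P1"])
missing = missing_fraction(quant)
print(overlap, cv_p1, missing)
```

Running the same three functions on database-search and library-search results for the same raw files would give a like-for-like comparison of the kind the abstract describes.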
Affiliation(s)
- Carolina Fernández-Costa
- Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
| | | | - Daniel B. McClatchy
- Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Anthony J. Saviola
- Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Nam-Kyung Yu
- Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
| | - John R. Yates
- Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
|
78
|
Nakajima D, Kawashima Y, Shibata H, Yasumi T, Isa M, Izawa K, Nishikomori R, Heike T, Ohara O. Simple and Sensitive Analysis for Dried Blood Spot Proteins by Sodium Carbonate Precipitation for Clinical Proteomics. J Proteome Res 2020; 19:2821-2827. [PMID: 32343581 DOI: 10.1021/acs.jproteome.0c00271] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/01/2023]
Abstract
Dried blood spots (DBS) are widely used for screening biomolecular profiles, including enzymatic activities. However, detection of minor proteins in DBS by liquid chromatography-mass spectrometry (LC-MS/MS) without pre-enrichment remains challenging because of the coexistence of large quantities of hydrophilic proteins. In this study, we address this problem by developing a simple method using sodium carbonate precipitation (SCP). SCP enriches hydrophobic proteins from DBS, allowing substantial removal of soluble proteins. In combination with SCP, we used quantitative LC-MS/MS proteome analysis in a data-independent acquisition mode (DIA) to enhance the sensitivity and quantification limits of proteome analysis. As a result, identification of 1977 proteins in DBS is possible, including 585 disease-related proteins listed in the Online Mendelian Inheritance in Man.
Affiliation(s)
| | | | - Hirofumi Shibata
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto 606-8501, Japan
| | - Takahiro Yasumi
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto 606-8501, Japan
| | - Masahiko Isa
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto 606-8501, Japan
| | - Kazushi Izawa
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto 606-8501, Japan
| | - Ryuta Nishikomori
- Department of Pediatrics and Child Health, Kurume University School of Medicine, Kurume, Fukuoka 830-0111, Japan
| | - Toshio Heike
- Department of Pediatrics, Kyoto University Graduate School of Medicine, Kyoto 606-8501, Japan.,Hyogo Prefectural Amagasaki General Medical Center, Hyogo 660-8550, Japan
| | - Osamu Ohara
- Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
|
79
|
Pino LK, Just SC, MacCoss MJ, Searle BC. Acquiring and Analyzing Data Independent Acquisition Proteomics Experiments without Spectrum Libraries. Mol Cell Proteomics 2020; 19:1088-1103. [PMID: 32312845 PMCID: PMC7338082 DOI: 10.1074/mcp.p119.001913] [Citation(s) in RCA: 141] [Impact Index Per Article: 35.3] [Received: 12/17/2019] [Revised: 04/14/2020] [Indexed: 11/06/2022] Open
Abstract
Data independent acquisition (DIA) is an attractive alternative to standard shotgun proteomics methods for quantitative experiments. However, most DIA methods require collecting exhaustive, sample-specific spectrum libraries with data dependent acquisition (DDA) to detect and quantify peptides. Working with non-human samples, studying splice junctions or sequence variants, or simply working with small sample yields can make developing DDA-based spectrum libraries impractical. Here we illustrate how to acquire, queue, and validate DIA data without spectrum libraries, and provide a workflow to efficiently generate DIA-only chromatogram libraries using gas-phase fractionation (GPF). We present best-practice methods for collecting DIA data using Orbitrap-based instruments and develop an understanding of why DIA using an Orbitrap mass spectrometer should be approached differently than when using time-of-flight instruments. Finally, we discuss several methods for analyzing DIA data without libraries.
Affiliation(s)
- Lindsay K Pino
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Seth C Just
- Proteome Software, Inc. Portland, Oregon, USA
| | - Michael J MacCoss
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Brian C Searle
- Institute for Systems Biology, Seattle, Washington, USA.
|
80
|
Noor Z, Mohamedali A, Ranganathan S. iSwathX 2.0 for Processing DDA Spectral Libraries for DIA Data Analysis. Curr Protoc Bioinformatics 2020; 70:e101. [PMID: 32478466 DOI: 10.1002/cpbi.101] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
The iSwathX web application processes and normalizes mass spectrometry-based proteomics spectral libraries generated in the data-dependent acquisition (DDA) approach. These libraries are stored in various proteomics repositories such as PeptideAtlas and NIST, or are user-generated and provide reference data for data-independent acquisition (DIA) targeted data extraction and analysis. iSwathX 2.0 can efficiently normalize DDA data from different instruments, gathered at different instances, and make it compatible with specific DIA experiments. Novel functions for parallel processing of DDA libraries and DIA report files, along with various data visualizations, are available in iSwathX 2.0. The step-by-step protocols provided here describe how the libraries are uploaded, processed, visualized, and downloaded using various modules of the application. They also provide detailed guidelines on the use of DIA report files for data analysis and visualization. © 2020 Wiley Periodicals LLC.
- Basic Protocol 1: Processing, combining, and visualizing two DDA libraries
- Basic Protocol 2: Parallel processing and combination of multiple DDA libraries
- Basic Protocol 3: Downstream processing, comparison, and visualization of DIA report files
Affiliation(s)
- Zainab Noor
- Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
| | - Abidali Mohamedali
- Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
| | - Shoba Ranganathan
- Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
|
81
|
Zhang F, Ge W, Ruan G, Cai X, Guo T. Data-Independent Acquisition Mass Spectrometry-Based Proteomics and Software Tools: A Glimpse in 2020. Proteomics 2020; 20:e1900276. [DOI: 10.1002/pmic.201900276] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 11/26/2019] [Revised: 03/27/2020] [Indexed: 01/02/2023]
Affiliation(s)
- Fangfei Zhang
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
| | - Weigang Ge
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
| | - Guan Ruan
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
| | - Xue Cai
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
| | - Tiannan Guo
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
|
82
|
Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis. Int J Mol Sci 2020; 21:ijms21082873. [PMID: 32326049 PMCID: PMC7216093 DOI: 10.3390/ijms21082873] [Citation(s) in RCA: 122] [Impact Index Per Article: 30.5] [Received: 03/18/2020] [Revised: 04/16/2020] [Accepted: 04/18/2020] [Indexed: 01/15/2023] Open
Abstract
Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
|
83
|
Zhong CQ, Wu J, Qiu X, Chen X, Xie C, Han J. Generation of a murine SWATH-MS spectral library to quantify more than 11,000 proteins. Sci Data 2020; 7:104. [PMID: 32218446 PMCID: PMC7099061 DOI: 10.1038/s41597-020-0449-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/21/2019] [Accepted: 03/06/2020] [Indexed: 12/16/2022] Open
Abstract
Targeted SWATH-MS data analysis is critically dependent on the spectral library. Comprehensive spectral libraries for human and several other organisms have been published, but an extensive spectral library for mouse, a widely used model organism, is not available. Here, we present a large murine spectral library covering more than 11,000 proteins and 240,000 proteotypic peptides, which includes proteins derived from 9 murine tissue samples and one murine L929 cell line. This resource supports the quantification of 67% of all murine proteins annotated by UniProtKB/Swiss-Prot. Furthermore, we applied the spectral library to SWATH-MS data from murine tissue samples. Data are available via SWATHAtlas (PASS01441).
Measurement(s): Mouse Protein • mass spectrum • spectral library
Technology Type(s): mass spectrometry • combined ms-ms + spectral library search
Sample Characteristic - Organism: Mus musculus
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.11968230
Affiliation(s)
- Chuan-Qi Zhong
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cellular Signaling Network, School of Life Sciences, Xiamen University, Xiamen, China.
| | - Jianfeng Wu
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cellular Signaling Network, School of Life Sciences, Xiamen University, Xiamen, China
| | - Xingfeng Qiu
- Department of Gastrointestinal Surgery, Zhongshan Hospital of Xiamen University, Xiamen, China
| | - Xi Chen
- Medical Research Institute, Wuhan University, Wuhan, China.,SpecAlly Life Technology Co., Ltd, Wuhan, China
| | - Changchuan Xie
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cellular Signaling Network, School of Life Sciences, Xiamen University, Xiamen, China
| | - Jiahuai Han
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cellular Signaling Network, School of Life Sciences, Xiamen University, Xiamen, China.
|
84
|
Searle BC, Swearingen KE, Barnes CA, Schmidt T, Gessulat S, Küster B, Wilhelm M. Generating high quality libraries for DIA MS with empirically corrected peptide predictions. Nat Commun 2020; 11:1548. [PMID: 32214105 PMCID: PMC7096433 DOI: 10.1038/s41467-020-15346-1] [Citation(s) in RCA: 131] [Impact Index Per Article: 32.8] [Received: 08/21/2019] [Accepted: 02/28/2020] [Indexed: 11/09/2022] Open
Abstract
Data-independent acquisition approaches typically rely on experiment-specific spectrum libraries, requiring offline fractionation and tens to hundreds of injections. We demonstrate a library generation workflow that leverages fragmentation and retention time prediction to build libraries containing every peptide in a proteome, and then refines those libraries with empirical data. Our method specifically enables rapid, experiment-specific library generation for non-model organisms, which we demonstrate using the malaria parasite Plasmodium falciparum, and non-canonical databases, which we show by detecting missense variants in HeLa.
Affiliation(s)
- Brian C Searle
- Institute for Systems Biology, Seattle, WA, USA.
- Proteome Software, Inc., Portland, OR, USA.
| | | | | | | | - Siegfried Gessulat
- Technical University of Munich, Freising, Germany
- SAP SE, Potsdam, Germany
| | - Bernhard Küster
- Technical University of Munich, Freising, Germany
- Bavarian Center for Biomolecular Mass Spectrometry, Freising, Germany
|
85
|
Pino LK, Searle BC, Yang HY, Hoofnagle AN, Noble WS, MacCoss MJ. Matrix-Matched Calibration Curves for Assessing Analytical Figures of Merit in Quantitative Proteomics. J Proteome Res 2020; 19:1147-1153. [PMID: 32037841 DOI: 10.1021/acs.jproteome.9b00666] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 11/28/2022]
Abstract
Mass spectrometry is a powerful tool for quantifying protein abundance in complex samples. Advances in sample preparation and the development of data-independent acquisition (DIA) mass spectrometry approaches have increased the number of peptides and proteins measured per sample. Here, we present a series of experiments demonstrating how to assess whether a peptide measurement is quantitative by mass spectrometry. Our results demonstrate that increasing the number of detected peptides in a proteomics experiment does not necessarily result in increased numbers of peptides that can be measured quantitatively.
Affiliation(s)
- Lindsay K Pino
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Brian C Searle
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Han-Yin Yang
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Andrew N Hoofnagle
- Department of Laboratory Medicine, University of Washington, Seattle, Washington 98195, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States.,Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
| | - Michael J MacCoss
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
|
86
|
Lou R, Tang P, Ding K, Li S, Tian C, Li Y, Zhao S, Zhang Y, Shui W. Hybrid Spectral Library Combining DIA-MS Data and a Targeted Virtual Library Substantially Deepens the Proteome Coverage. iScience 2020; 23:100903. [PMID: 32109675 PMCID: PMC7044796 DOI: 10.1016/j.isci.2020.100903] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 12/09/2019] [Revised: 01/17/2020] [Accepted: 02/06/2020] [Indexed: 01/15/2023] Open
Abstract
Data-independent acquisition mass spectrometry (DIA-MS) is a powerful technique that enables relatively deep proteomic profiling with superior quantification reproducibility. DIA data mining predominantly relies on a spectral library of sufficient proteome coverage that, in most cases, is built on data-dependent acquisition-based analysis of the same sample. To expand the proteome coverage for a pre-determined protein family, we report herein on the construction of a hybrid spectral library that supplements a DIA experiment-derived library with a protein family-targeted virtual library predicted by deep learning. Leveraging this DIA hybrid library substantially deepens the coverage of three transmembrane protein families (G protein-coupled receptors, ion channels, and transporters) in mouse brain tissues with increases in protein identification of 37%–87% and peptide identification of 58%–161%. Moreover, of the 412 novel GPCR peptides exclusively identified with the DIA hybrid library strategy, 53.6% were validated as present in mouse brain tissues based on orthogonal experimental measurement.
- A virtual library is built for a selected protein family using deep learning models
- The hybrid library strategy vastly deepens the coverage for the targeted protein family
- About 53.6% of novel GPCR peptides identified with the DIA hybrid library are validated
- Extend the strategy to deep mapping of multiple transmembrane protein families
Affiliation(s)
- Ronghui Lou
- iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Pan Tang
- iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Kang Ding
- iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Shanshan Li
- iHuman Institute, ShanghaiTech University, Shanghai 201210, China
| | - Cuiping Tian
- iHuman Institute, ShanghaiTech University, Shanghai 201210, China
| | - Yunxia Li
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201210, China
| | - Suwen Zhao
- iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China.
| | - Yaoyang Zhang
- Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 201210, China.
| | - Wenqing Shui
- iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China.
|
87
|
Van Puyvelde B, Willems S, Gabriels R, Daled S, De Clerck L, Vande Casteele S, Staes A, Impens F, Deforce D, Martens L, Degroeve S, Dhaenens M. Removing the Hidden Data Dependency of DIA with Predicted Spectral Libraries. Proteomics 2020; 20:e1900306. [PMID: 31981311 DOI: 10.1002/pmic.201900306] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 09/19/2019] [Revised: 12/20/2019] [Indexed: 12/22/2022]
Abstract
Data-independent acquisition (DIA) generates comprehensive yet complex mass spectrometric data, which imposes the use of data-dependent acquisition (DDA) libraries for deep peptide-centric detection. Here, it is shown that DIA can be redeemed from this dependency by combining predicted fragment intensities and retention times with narrow window DIA. This eliminates variation in library building and omits stochastic sampling, finally making the DIA workflow fully deterministic. Especially for clinical proteomics, this has the potential to facilitate inter-laboratory comparison.
Affiliation(s)
- Bart Van Puyvelde
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
| | - Sander Willems
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
| | - Simon Daled
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
| | - Laura De Clerck
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
| | - Sofie Vande Casteele
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
| | - An Staes
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium; VIB Proteomics Core, 9000, Ghent, Belgium
| | - Francis Impens
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium; VIB Proteomics Core, 9000, Ghent, Belgium
| | - Dieter Deforce
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
| | - Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
| | - Maarten Dhaenens
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
| |
|
88
|
O’Donnell ST, Ross RP, Stanton C. The Progress of Multi-Omics Technologies: Determining Function in Lactic Acid Bacteria Using a Systems Level Approach. Front Microbiol 2020; 10:3084. [PMID: 32047482 PMCID: PMC6997344 DOI: 10.3389/fmicb.2019.03084] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 07/08/2019] [Accepted: 12/20/2019] [Indexed: 12/12/2022] Open
Abstract
Lactic Acid Bacteria (LAB) have long been recognized as having a significant impact ranging from commercial to health domains. A vast amount of research has been carried out on these microbes, deciphering many of the pathways and components responsible for these desirable effects. However, a large proportion of this functional information has been derived from a reductionist approach working with pure culture strains. This provides limited insight into understanding the impact of LAB within intricate systems such as the gut microbiome or multi strain starter cultures. Whole genome sequencing of strains and shotgun metagenomics of entire systems are powerful techniques that are currently widely used to decipher function in microbes, but they also have their limitations. An available genome or metagenome can provide an image of what a strain or microbiome, respectively, is potentially capable of and the functions that they may carry out. A top-down, multi-omics approach has the power to resolve the functional potential of an ecosystem into an image of what is being expressed, translated and produced. With this image, it is possible to see the real functions that members of a system are performing and allow more accurate and impactful predictions of the effects of these microorganisms. This review will discuss how technological advances have the potential to increase the yield of information from genomics, transcriptomics, proteomics and metabolomics. The potential for integrated omics to resolve the role of LAB in complex systems will also be assessed. Finally, the current software approaches for managing these omics data sets will be discussed.
Affiliation(s)
- Shane Thomas O’Donnell
- Teagasc Food Research Centre, Moorepark, Fermoy, Ireland
- Department of Microbiology, University College Cork – National University of Ireland, Cork, Ireland
- APC Microbiome Ireland, Cork, Ireland
| | - R. Paul Ross
- Teagasc Food Research Centre, Moorepark, Fermoy, Ireland
- Department of Microbiology, University College Cork – National University of Ireland, Cork, Ireland
- APC Microbiome Ireland, Cork, Ireland
| | - Catherine Stanton
- Teagasc Food Research Centre, Moorepark, Fermoy, Ireland
- APC Microbiome Ireland, Cork, Ireland
| |
|
89
|
Demichev V, Messner CB, Vernardis SI, Lilley KS, Ralser M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat Methods 2020; 17:41-44. [PMID: 31768060 PMCID: PMC6949130 DOI: 10.1038/s41592-019-0638-x] [Citation(s) in RCA: 850] [Impact Index Per Article: 212.5] [Received: 08/15/2018] [Accepted: 10/09/2019] [Indexed: 01/12/2023]
Abstract
We present an easy-to-use integrated software suite, DIA-NN, that exploits deep neural networks and new quantification and signal correction strategies for the processing of data-independent acquisition (DIA) proteomics experiments. DIA-NN improves the identification and quantification performance in conventional DIA proteomic applications, and is particularly beneficial for high-throughput applications, as it is fast and enables deep and confident proteome coverage when used in combination with fast chromatographic methods.
Affiliation(s)
- Vadim Demichev
- Department of Biochemistry and The Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- The Francis Crick Institute, Molecular Biology of Metabolism laboratory, London, UK
| | - Christoph B Messner
- The Francis Crick Institute, Molecular Biology of Metabolism laboratory, London, UK
| | - Spyros I Vernardis
- The Francis Crick Institute, Molecular Biology of Metabolism laboratory, London, UK
| | - Kathryn S Lilley
- Department of Biochemistry and The Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
| | - Markus Ralser
- The Francis Crick Institute, Molecular Biology of Metabolism laboratory, London, UK.
- Department of Biochemistry, Charité Universitätsmedizin Berlin, Berlin, Germany.
| |
|
90
|
Kawashima Y, Watanabe E, Umeyama T, Nakajima D, Hattori M, Honda K, Ohara O. Optimization of Data-Independent Acquisition Mass Spectrometry for Deep and Highly Sensitive Proteomic Analysis. Int J Mol Sci 2019; 20:ijms20235932. [PMID: 31779068 PMCID: PMC6928715 DOI: 10.3390/ijms20235932] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 10/29/2019] [Revised: 11/20/2019] [Accepted: 11/23/2019] [Indexed: 02/07/2023] Open
Abstract
Data-independent acquisition (DIA)-mass spectrometry (MS)-based proteomic analysis outperforms the existing data-dependent acquisition (DDA)-MS-based proteomic analysis, enabling deep proteome coverage and precise relative quantitative analysis in single-shot liquid chromatography (LC)-MS/MS. However, DIA-MS-based proteomic analysis has not yet been optimized in terms of system robustness and throughput, particularly for its practical applications. We established a single-shot LC-MS/MS system with an MS measurement time of 90 min for highly sensitive and deep proteomic analysis by optimizing the conditions of DIA and nanoLC. We identified 7020 and 4068 proteins from 200 ng and 10 ng, respectively, of a tryptic digest of floating human embryonic kidney 293 (HEK293F) cells using the constructed LC-MS method with a protein sequence database search. The numbers of proteins identified from 200 ng and 10 ng of tryptic HEK293F digest increased to 8509 and 5706, respectively, when searching the chromatogram library created by gas-phase fractionated DIA. Moreover, DIA protein quantification was highly reproducible, with median coefficients of variation of 4.3% in eight replicate analyses. We demonstrated the power of this system by applying the proteomic analysis to detect subtle changes in protein profiles between cerebrums of germ-free and specific pathogen-free mice, which showed that >40 proteins were differentially produced between the cerebrums in the presence or absence of bacteria.
Affiliation(s)
- Yusuke Kawashima
- Department of Applied Genomics, Kazusa DNA Research Institute, Chiba 292-0818, Japan; (Y.K.); (D.N.)
| | - Eiichiro Watanabe
- Laboratory for Gut Homeostasis, RIKEN Center for Integrative Medical Sciences, Kanagawa 230-0045, Japan; (E.W.); (K.H.)
- Department of Microbiology and Immunology, Keio University School of Medicine, Tokyo 160-8582, Japan
- Department of Pediatric Surgery, Faculty of Medicine, The University of Tokyo, Tokyo 113-8655, Japan
| | - Taichi Umeyama
- Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Kanagawa 230-0045, Japan; (T.U.); (M.H.)
| | - Daisuke Nakajima
- Department of Applied Genomics, Kazusa DNA Research Institute, Chiba 292-0818, Japan; (Y.K.); (D.N.)
| | - Masahira Hattori
- Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Kanagawa 230-0045, Japan; (T.U.); (M.H.)
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Kenya Honda
- Laboratory for Gut Homeostasis, RIKEN Center for Integrative Medical Sciences, Kanagawa 230-0045, Japan; (E.W.); (K.H.)
- Department of Microbiology and Immunology, Keio University School of Medicine, Tokyo 160-8582, Japan
| | - Osamu Ohara
- Department of Applied Genomics, Kazusa DNA Research Institute, Chiba 292-0818, Japan; (Y.K.); (D.N.)
- Laboratory for Integrative Genomics, RIKEN Center for Integrative Medical Sciences, Kanagawa 230-0045, Japan
- Correspondence: ; Tel.: +81-438-52-391; Fax: +81-438-52-3914
| |
|
91
|
Wang X, Shen S, Rasam SS, Qu J. MS1 ion current-based quantitative proteomics: A promising solution for reliable analysis of large biological cohorts. Mass Spectrom Rev 2019; 38:461-482. [PMID: 30920002 PMCID: PMC6849792 DOI: 10.1002/mas.21595] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 10/19/2018] [Accepted: 02/28/2019] [Indexed: 05/04/2023]
Abstract
The rapidly advancing field of pharmaceutical and clinical research calls for systematic, molecular-level characterization of complex biological systems. To this end, quantitative proteomics represents a powerful tool, but an optimal solution for reliable large-cohort proteomics analysis, as frequently involved in pharmaceutical/clinical investigations, is urgently needed. Large-cohort analysis remains challenging owing to deteriorating quantitative quality, snowballing missing data, and false-positive discovery of altered proteins as sample size increases. MS1 ion current-based methods, which have become an important class of label-free quantification techniques during the past decade, show considerable potential to achieve reproducible protein measurements in large cohorts with high quantitative accuracy/precision. Nonetheless, in order to fully unleash this potential, several critical prerequisites should be met. Here we provide an overview of the rationale of MS1-based strategies and then important considerations for experimental and data processing techniques, with emphasis on (i) efficient and reproducible sample preparation and LC separation; (ii) sensitive, selective and high-resolution MS detection; (iii) accurate chromatographic alignment; (iv) sensitive and selective generation of quantitative features; and (v) optimal post-feature-generation data quality control. Prominent technical developments in these aspects are discussed. Finally, we review applications of MS1-based strategies in disease mechanism studies, biomarker discovery, and pharmaceutical investigations.
Affiliation(s)
- Xue Wang
- Department of Cell Stress Biology, Roswell Park Cancer Institute, Buffalo, New York
| | - Shichen Shen
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, New York, New York
| | - Sailee Suryakant Rasam
- Department of Biochemistry, University at Buffalo, State University of New York, New York, New York
| | - Jun Qu
- Department of Cell Stress Biology, Roswell Park Cancer Institute, Buffalo, New York
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, New York, New York
- Department of Biochemistry, University at Buffalo, State University of New York, New York, New York
| |
|
92
|
Zhong CQ, Wu R, Chen X, Wu S, Shuai J, Han J. Systematic Assessment of the Effect of Internal Library in Targeted Analysis of SWATH-MS. J Proteome Res 2019; 19:477-492. [DOI: 10.1021/acs.jproteome.9b00669] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Affiliation(s)
- Chuan-Qi Zhong
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cellular Signaling Network, School of Life Sciences, Xiamen University, Xiamen 361102, China
| | - Rui Wu
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cellular Signaling Network, School of Life Sciences, Xiamen University, Xiamen 361102, China
| | - Xi Chen
- Medical Research Institute, Wuhan University, Wuhan 430072, China
- SpecAlly Life Technology Co., Ltd., Wuhan 430072, China
| | - Suqin Wu
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cellular Signaling Network, School of Life Sciences, Xiamen University, Xiamen 361102, China
| | - Jianwei Shuai
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cellular Signaling Network, School of Life Sciences, Xiamen University, Xiamen 361102, China
- Department of Physics, Xiamen University, Xiamen 361005, China
| | - Jiahuai Han
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cellular Signaling Network, School of Life Sciences, Xiamen University, Xiamen 361102, China
| |
|
93
|
Zhang X, Qi Y, Zhang Q, Liu W. Application of mass spectrometry-based MHC immunopeptidome profiling in neoantigen identification for tumor immunotherapy. Biomed Pharmacother 2019; 120:109542. [PMID: 31629254 DOI: 10.1016/j.biopha.2019.109542] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 09/18/2019] [Revised: 10/04/2019] [Accepted: 10/04/2019] [Indexed: 12/15/2022] Open
Abstract
One of the challenges for cancer vaccines and adoptive T-cell-based immunotherapy is to identify the major histocompatibility complex (MHC)-associated non-self neoantigens recognized by T cells. In silico T cell epitope prediction algorithms have been widely used for neoantigen prediction; nonetheless, this platform lacks experimental evidence from direct identification of the epitopes presented on the cell surface. Currently, mass spectrometry (MS)-based proteomics is an advanced analytical technology for large-scale peptide sequencing, which has become a powerful tool for directly profiling the immunopeptidome presented by MHC molecules. Integrated with next-generation sequencing, proteogenomic analysis provides the "gold standard" for neoantigen identification at the protein level. This method discovers tumor-specific neoantigens derived from somatic mutations, proteasome splicing, noncoding RNA, and post-translationally modified antigens. Herein, we review the basis of antigen processing and presentation, tumor antigen classification, existing approaches for neoantigen discovery, quantitative proteomics, epitope prediction programs, and the advantages and drawbacks of the proteomics workflow for MHC immunopeptidome profiling. Furthermore, we summarize 40 recently published reports addressing the fundamental theory, breakthroughs, and most advanced updates in mass spectrometry-based neoantigen discovery for cancer immunotherapy.
Affiliation(s)
- Xiaomei Zhang
- Guangdong Key Laboratory of Liver Disease Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China
| | - Yue Qi
- Thoracic & GI oncology branch, National Cancer Institute, CCR, NIH, Bethesda, MD 20814, USA
| | - Qi Zhang
- Guangdong Key Laboratory of Liver Disease Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China; Cell-Gene Therapy Translational Medicine Research Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou 510630, China
| | - Wei Liu
- Guangdong Key Laboratory of Liver Disease Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China; Thoracic & GI oncology branch, National Cancer Institute, CCR, NIH, Bethesda, MD 20814, USA.
| |
|
94
|
Ignjatovic V, Geyer PE, Palaniappan KK, Chaaban JE, Omenn GS, Baker MS, Deutsch EW, Schwenk JM. Mass Spectrometry-Based Plasma Proteomics: Considerations from Sample Collection to Achieving Translational Data. J Proteome Res 2019; 18:4085-4097. [PMID: 31573204 DOI: 10.1021/acs.jproteome.9b00503] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Indexed: 12/15/2022]
Abstract
The proteomic analysis of human blood and blood-derived products (e.g., plasma) offers an attractive avenue to translate research progress from the laboratory into the clinic. However, due to its unique protein composition, performing proteomics assays with plasma is challenging. Plasma proteomics has regained interest due to recent technological advances, but challenges imposed by both complications inherent to studying human biology (e.g., interindividual variability) and analysis of biospecimens (e.g., sample variability), as well as technological limitations, remain. As part of the Human Proteome Project (HPP), the Human Plasma Proteome Project (HPPP) brings together key aspects of the plasma proteomics pipeline. Here, we provide considerations and recommendations concerning study design, plasma collection, quality metrics, plasma processing workflows, mass spectrometry (MS) data acquisition, data processing, and bioinformatic analysis. With exciting opportunities in studying human health and disease through this plasma proteomics pipeline, a more informed analysis of human plasma will accelerate interest while enhancing possibilities for the incorporation of proteomics-scaled assays into clinical practice.
Affiliation(s)
- Vera Ignjatovic
- Haematology Research, Murdoch Children's Research Institute, Parkville, VIC 3052, Australia; Department of Paediatrics, The University of Melbourne, Parkville, VIC 3052, Australia
| | - Philipp E Geyer
- NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen, Denmark; Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
| | - Krishnan K Palaniappan
- Freenome, 259 East Grand Avenue, South San Francisco, California 94080, United States
| | - Jessica E Chaaban
- Haematology Research, Murdoch Children's Research Institute, Parkville, VIC 3052, Australia
| | - Gilbert S Omenn
- Departments of Computational Medicine & Bioinformatics, Human Genetics, and Internal Medicine and School of Public Health, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
| | - Mark S Baker
- Department of Biomedical Sciences, Faculty of Medicine & Health Sciences, Macquarie University, 75 Talavera Road, North Ryde, NSW 2109, Australia
| | - Eric W Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, Washington 98109, United States
| | - Jochen M Schwenk
- Affinity Proteomics, SciLifeLab, KTH Royal Institute of Technology, 171 65 Stockholm, Sweden
| |
|
95
|
Noor Z, Ranganathan S. Bioinformatics approaches for improving seminal plasma proteome analysis. Theriogenology 2019; 137:43-49. [PMID: 31186128 DOI: 10.1016/j.theriogenology.2019.05.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Reproduction efficiency of male animals is one of the key factors influencing the sustainability of livestock. Mass spectrometry (MS) based proteomics has become an important tool for studying seminal plasma proteomes. In this review, we summarize bioinformatics analysis strategies for current proteomics approaches, for identifying novel biomarkers of reproductive robustness.
Affiliation(s)
- Zainab Noor
- Department of Molecular Sciences, Macquarie University, Sydney, Australia
| | - Shoba Ranganathan
- Department of Molecular Sciences, Macquarie University, Sydney, Australia.
| |
|
96
|
Fert-Bober J, Murray CI, Parker SJ, Van Eyk JE. Precision Profiling of the Cardiovascular Post-Translationally Modified Proteome: Where There Is a Will, There Is a Way. Circ Res 2019; 122:1221-1237. [PMID: 29700069 DOI: 10.1161/circresaha.118.310966] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 12/17/2022]
Abstract
There is an exponential increase in biological complexity as initial gene transcripts are spliced, translated into amino acid sequence, and post-translationally modified. Each protein can exist as multiple chemical or sequence-specific proteoforms, and each has the potential to be a critical mediator of a physiological or pathophysiological signaling cascade. Here, we provide an overview of how different proteoforms come about in biological systems and how they are most commonly measured using mass spectrometry-based proteomics and bioinformatics. Our goal is to present this information at a level accessible to every scientist interested in mass spectrometry and its application to proteome profiling. We will specifically discuss recent data linking various protein post-translational modifications to cardiovascular disease and conclude with a discussion of the enablement and democratization of proteomics across the cardiovascular and scientific community. The aim is to inform and inspire the readership to explore a larger breadth of proteoforms, particularly post-translational modifications, related to their particular areas of expertise in cardiovascular physiology.
Affiliation(s)
- Justyna Fert-Bober
- From the Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
| | - Christopher I Murray
- From the Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
| | - Sarah J Parker
- From the Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA.
| | - Jennifer E Van Eyk
- From the Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
| |
|
97
|
Glyco-DIA: a method for quantitative O-glycoproteomics with in silico-boosted glycopeptide libraries. Nat Methods 2019; 16:902-910. [PMID: 31384044 DOI: 10.1038/s41592-019-0504-x] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Received: 09/13/2018] [Accepted: 06/26/2019] [Indexed: 12/21/2022]
Abstract
We report a liquid chromatography coupled to tandem mass spectrometry O-glycoproteomics strategy using data-independent acquisition (DIA) mode for direct analysis of O-glycoproteins. This approach enables characterization of glycopeptides and structures of O-glycans on a proteome-wide scale with quantification of stoichiometries (though it does not allow for direct unambiguous glycosite identification). The method relies on a spectral library of O-glycopeptides; the Glyco-DIA library contains sublibraries obtained from human cell lines and human serum, and it currently covers 2,076 O-glycoproteins (11,452 unique glycopeptide sequences) and the 5 most common core1 O-glycan structures. Applying the Glyco-DIA library to human serum without enrichment for glycopeptides enabled us to identify and quantify 269 distinct glycopeptide sequences bearing up to 5 different core1 O-glycans from 159 glycoproteins in a SingleShot analysis.
|
98
|
Saleh S, Staes A, Deborggraeve S, Gevaert K. Targeted Proteomics for Studying Pathogenic Bacteria. Proteomics 2019; 19:e1800435. [DOI: 10.1002/pmic.201800435] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/03/2019] [Revised: 06/04/2019] [Indexed: 02/04/2023]
Affiliation(s)
- Sara Saleh
- Department of Biomedical Sciences, Institute of Tropical Medicine, B‐2000 Antwerp, Belgium
- VIB Center for Medical Biotechnology, B‐9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, B‐9000 Ghent, Belgium
| | - An Staes
- VIB Center for Medical Biotechnology, B‐9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, B‐9000 Ghent, Belgium
| | - Stijn Deborggraeve
- Department of Biomedical Sciences, Institute of Tropical Medicine, B‐2000 Antwerp, Belgium
| | - Kris Gevaert
- VIB Center for Medical Biotechnology, B‐9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, B‐9000 Ghent, Belgium
| |
|
99
|
Genome Sequences of Three Cluster C Mycobacteriophages, Bipolarisk, Bread, and FudgeTart. Microbiol Resour Announc 2019; 8:8/28/e00290-19. [PMID: 31296672 PMCID: PMC6624755 DOI: 10.1128/mra.00290-19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Three mycobacteriophages, Bipolarisk, Bread, and FudgeTart, were isolated from enriched soil samples found in Crete, NE. All three phages are lytic, belong to subcluster C1, and infect Mycobacterium smegmatis mc2155. The structures of the three genomes are similar, with slight variations in gene number and content.
|
100
|
Mun DG, Nam D, Kim H, Pandey A, Lee SW. Accurate Precursor Mass Assignment Improves Peptide Identification in Data-Independent Acquisition Mass Spectrometry. Anal Chem 2019; 91:8453-8460. [DOI: 10.1021/acs.analchem.9b01474] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/09/2023]
Affiliation(s)
- Dong-Gi Mun
- Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
| | - Dowoon Nam
- Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
| | - Hokeun Kim
- Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
| | - Akhilesh Pandey
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota 55902, United States
- Manipal Academy of Higher Education (MAHE), Manipal, 576104 Karnataka, India
| | - Sang-Won Lee
- Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
| |
|