1
Lazear MR. Sage: An Open-Source Tool for Fast Proteomics Searching and Quantification at Scale. J Proteome Res 2023; 22:3652-3659. [PMID: 37819886 DOI: 10.1021/acs.jproteome.3c00486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
The growing complexity and volume of proteomics data necessitate the development of efficient software tools for peptide identification and quantification from mass spectra. Given their central role in proteomics, it is imperative that these tools are auditable and extensible, requirements that are best fulfilled by open-source and permissively licensed software. This work presents Sage, a high-performance, open-source, and freely available proteomics pipeline. Scalable and cloud-ready, Sage matches the performance of state-of-the-art software tools while running an order of magnitude faster.
Affiliation(s)
- Michael R Lazear
- Belharra Therapeutics, 3985 Sorrento Valley Boulevard Suite C, San Diego, California 92121, United States
2
MacCoss MJ, Alfaro JA, Faivre DA, Wu CC, Wanunu M, Slavov N. Sampling the proteome by emerging single-molecule and mass spectrometry methods. Nat Methods 2023; 20:339-346. [PMID: 36899164 PMCID: PMC10044470 DOI: 10.1038/s41592-023-01802-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Indexed: 03/12/2023]
Abstract
Mammalian cells have about 30,000-fold more protein molecules than mRNA molecules, which has major implications in the development of proteomics technologies. We review strategies that have been helpful for counting billions of protein molecules by liquid chromatography-tandem mass spectrometry (LC-MS/MS) and suggest that these strategies can benefit single-molecule methods, especially in mitigating the challenges of the wide dynamic range of the proteome.
Affiliation(s)
- Michael J MacCoss
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Javier Antonio Alfaro
- International Centre for Cancer Vaccine Science, University of Gdańsk, Gdańsk, Poland.
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada.
- School of Informatics, University of Edinburgh, Edinburgh, UK.
- Danielle A Faivre
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Christine C Wu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Meni Wanunu
- Department of Physics, Northeastern University, Boston, MA, USA
- Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA.
- Parallel Squared Technology Institute, Watertown, MA, USA.
3
Yeung D, Spicer V, Zahedi RP, Krokhin O. Exploring the variable space of shallow machine learning models for reversed-phase retention time prediction. Comput Struct Biotechnol J 2023; 21:2446-2453. [PMID: 37090433 PMCID: PMC10113922 DOI: 10.1016/j.csbj.2023.02.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 02/24/2023] [Accepted: 02/24/2023] [Indexed: 03/02/2023]
Abstract
Peptide retention time (RT) prediction algorithms are tools to study and identify the physicochemical properties that drive the peptide-sorbent interaction. Traditional RT algorithms use multiple linear regression with manually curated parameters to determine the degree of direct contribution for each parameter and improvements to RT prediction accuracies relied on superior feature engineering. Deep learning led to a significant increase in RT prediction accuracy and automated feature engineering via chaining multiple learning modules. However, the significance and the identity of these extracted variables are not well understood due to the inherent complexity when interpreting "relationships-of-relationships" found in deep learning variables. To achieve both accuracy and interpretability simultaneously, we isolated individual modules used in deep learning and the isolated modules are the shallow learners employed for RT prediction in this work. Using a shallow convolutional neural network (CNN) and gated recurrent unit (GRU), we find that the spatial features obtained via the CNN correlate with real-world physicochemical properties, namely cross-collisional sections (CCS) and variations of assessable surface area (ASA). Furthermore, we determined that the discovered parameters are "micro-coefficients" that contribute to the "macro-coefficient" - hydrophobicity. Manually embedding CCS and the variations of ASA to the GRU model yielded an R² = 0.981 using only 525 variables and can represent 88% of the ∼110,000 tryptic peptides used in our dataset. This work highlights the feature discovery process of our shallow learners can achieve beyond traditional RT models in performance and have better interpretability when compared with the deep learning RT algorithms found in the literature.
4
Lenčo J, Jadeja S, Naplekov DK, Krokhin OV, Khalikova MA, Chocholouš P, Urban J, Broeckhoven K, Nováková L, Švec F. Reversed-Phase Liquid Chromatography of Peptides for Bottom-Up Proteomics: A Tutorial. J Proteome Res 2022; 21:2846-2892. [PMID: 36355445 DOI: 10.1021/acs.jproteome.2c00407] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 11/12/2022]
Abstract
The performance of the current bottom-up liquid chromatography hyphenated with mass spectrometry (LC-MS) analyses has undoubtedly been fueled by spectacular progress in mass spectrometry. It is thus not surprising that the MS instrument attracts the most attention during LC-MS method development, whereas optimizing conditions for peptide separation using reversed-phase liquid chromatography (RPLC) remains somewhat in its shadow. Consequently, the wisdom of the fundaments of chromatography is slowly vanishing from some laboratories. However, the full potential of advanced MS instruments cannot be achieved without highly efficient RPLC. This is impossible to attain without understanding fundamental processes in the chromatographic system and the properties of peptides important for their chromatographic behavior. We wrote this tutorial intending to give practitioners an overview of critical aspects of peptide separation using RPLC to facilitate setting the LC parameters so that they can leverage the full capabilities of their MS instruments. After briefly introducing the gradient separation of peptides, we discuss their properties that affect the quality of LC-MS chromatograms the most. Next, we address the in-column and extra-column broadening. The last section is devoted to key parameters of LC-MS methods. We also extracted trends in practice from recent bottom-up proteomics studies and correlated them with the current knowledge on peptide RPLC separation.
Affiliation(s)
- Juraj Lenčo
- Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Siddharth Jadeja
- Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Denis K Naplekov
- Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Oleg V Krokhin
- Department of Internal Medicine, Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg R3E 3P4, Manitoba, Canada
- Maria A Khalikova
- Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Petr Chocholouš
- Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Jiří Urban
- Department of Chemistry, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Ken Broeckhoven
- Department of Chemical Engineering (CHIS), Faculty of Engineering, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium
- Lucie Nováková
- Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- František Švec
- Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
5
Enmark M, Häggström J, Samuelsson J, Fornstedt T. Building machine-learning-based model for retention time and resolution predictions in ion pair chromatography of oligonucleotides. J Chromatogr A 2022; 1671:462999. [DOI: 10.1016/j.chroma.2022.462999] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/08/2021] [Revised: 03/22/2022] [Accepted: 03/25/2022] [Indexed: 01/29/2023]
6
Rupprecht F, Enge S, Schmidt K, Gao W, Miller R. Automating LC–MS/MS mass chromatogram quantification: Wavelet transform based peak detection and automated estimation of peak boundaries and signal-to-noise ratio using signal processing methods. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103211] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
7
Hruska M, Holub D. Evaluation of an integrative Bayesian peptide detection approach on a combinatorial peptide library. Eur J Mass Spectrom (Chichester) 2021; 27:217-234. [PMID: 34989269 DOI: 10.1177/14690667211066725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Detection of peptides lies at the core of bottom-up proteomics analyses. We examined a Bayesian approach to peptide detection, integrating match-based models (fragments, retention time, isotopic distribution, and precursor mass) and peptide prior probability models under a unified probabilistic framework. To assess the relevance of these models and their various combinations, we employed a complete- and a tail-complete search of a low-precursor-mass synthetic peptide library based on oncogenic KRAS peptides. The fragment match was by far the most informative match-based model, while the retention time match was the only remaining such model with an appreciable impact, increasing correct detections by around 8%. A peptide prior probability model built from a reference proteome greatly improved the detection over a uniform prior, essentially transforming de novo sequencing into a reference-guided search. The knowledge of a correct sequence tag in advance to peptide-spectrum matching had only a moderate impact on peptide detection unless the tag was long and of high certainty. The approach also derived more precise error rates on the analyzed combinatorial peptide library than those estimated using PeptideProphet and Percolator, showing its potential applicability for the detection of homologous peptides. Although the approach requires further computational developments for routine data analysis, it illustrates the value of peptide prior probabilities and presents a Bayesian approach for their incorporation into peptide detection.
Affiliation(s)
- Miroslav Hruska
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
- Department of Computer Science, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Dusan Holub
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
8
Huang R, Zhu W, Xu Z, Chen J, Jiang B, Chen H, Chen W. Accurate Retention Time Prediction Based on Monolinked Peptide Information to Confidently Identify Cross-Linked Peptides. J Am Soc Mass Spectrom 2021; 32:2410-2416. [PMID: 34320809 DOI: 10.1021/jasms.1c00120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Cross-linking mass spectrometry methods have not been successfully applied to protein-protein interaction discovery at a proteome-wide level mainly due to the computation complexity (O(n²)) issue. In a previous report, we proposed a decision tree searching strategy (DTSS), which can reduce complexity by orders of magnitude. In this study, we further found that the monolinked peptides carry out the information on the retention time of the corresponding cross-linked pairs; therefore, the retention time of cross-linked peptide pairs can be predicted accurately. By utilizing the retention time as an extra filter, the false positive rate can be reduced by around 86% with a sensitivity loss of 10%. The method combined with DTSS (T-DTSS) not only benefits improving identification confidence but also leads to lower cutoff scores and facilitates substantially increasing inter-cross-link identification. T-DTSS was successfully applied to the identification of inter-cross-links obtained from Escherichia coli cell lysate cross-linked by a newly synthesized enrichable cross-linker, pDSBE. The approach can be applicable to both cleavable and noncleavable methods.
Affiliation(s)
- Rong Huang
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong, Shanghai 201210, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Shijingshan District, Beijing 100049, China
- Wei Zhu
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong, Shanghai 201210, China
- Zili Xu
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong, Shanghai 201210, China
- Jiakang Chen
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong, Shanghai 201210, China
- Biao Jiang
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong, Shanghai 201210, China
- Hongli Chen
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong, Shanghai 201210, China
- Wenzhang Chen
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Pudong, Shanghai 201210, China
9
Giese SH, Sinn LR, Wegner F, Rappsilber J. Retention time prediction using neural networks increases identifications in crosslinking mass spectrometry. Nat Commun 2021; 12:3237. [PMID: 34050149 PMCID: PMC8163845 DOI: 10.1038/s41467-021-23441-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/20/2020] [Accepted: 04/26/2021] [Indexed: 12/13/2022]
Abstract
Crosslinking mass spectrometry has developed into a robust technique that is increasingly used to investigate the interactomes of organelles and cells. However, the incomplete and noisy information in the mass spectra of crosslinked peptides limits the numbers of protein-protein interactions that can be confidently identified. Here, we leverage chromatographic retention time information to aid the identification of crosslinked peptides from mass spectra. Our Siamese machine learning model xiRT achieves highly accurate retention time predictions of crosslinked peptides in a multi-dimensional separation of crosslinked E. coli lysate. Importantly, supplementing the search engine score with retention time features leads to a substantial increase in protein-protein interactions without affecting confidence. This approach is not limited to cell lysates and multi-dimensional separation but also improves considerably the analysis of crosslinked multiprotein complexes with a single chromatographic dimension. Retention times are a powerful complement to mass spectrometric information to increase the sensitivity of crosslinking mass spectrometry analyses.
Affiliation(s)
- Sven H Giese
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, Potsdam, Germany
- Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Ludwig R Sinn
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
- Fritz Wegner
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
- Juri Rappsilber
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany.
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK.
10
Du Z, Shao W, Qin W. [Research progress and application of retention time prediction method based on deep learning]. Se Pu 2021; 39:211-218. [PMID: 34227303 PMCID: PMC9403805 DOI: 10.3724/sp.j.1123.2020.08015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Indexed: 11/25/2022]
Abstract
In "shotgun" proteomics strategy, the proteome is explained by analyzing tryptic digested peptides using liquid chromatography-mass spectrometry. In this strategy, the retention time of peptides in liquid chromatography separation can be predicted based on the peptide sequence. This is a useful feature for peptide identification. Therefore, the prediction of the retention time has attracted much research attention. Traditional methods calculate the physical and chemical properties of the peptides based on their amino acid sequence to obtain the retention time under certain chromatography conditions; however, these methods cannot be directly adopted for other chromatography conditions, nor can they be used across laboratories or instrument platforms. To solve this problem, in recent years, deep learning was introduced to proteomics research for retention time prediction. Deep learning is an advanced machine-learning method that has extraordinary capability to learn complex relationships from large-scale data. By stacking multiple hidden neural networks, deep learning can ingest raw data without manually designed features. Transfer learning is an important method in deep learning. It improves the learning process of a new task through the transfer of knowledge from an already-learned related task. Transfer learning allows models trained using large datasets to be utilized across conditions by fine-tuning on smaller datasets, instead of retraining the whole model. Many retention time prediction methods have been developed. In the process of training the model, the sequences of peptides are encoded to represent peptide information. Deep learning considers the relationship between the characteristics of the peptides and their corresponding retention times without the need for manual input of the physical and chemical properties of the peptides.
Compared with traditional methods, deep learning methods have higher accuracy and can be easily used under different chromatography conditions by transfer learning. If there are not enough datasets to train a new model, a trained model from other datasets can be used as a replacement after calibration with small datasets obtained from these chromatography conditions. While the retention times of modified peptides can also be predicted, the predictions are inadequate for complex modifications such as glycosylation, and this is one of the main problems to be solved. The predicted retention times were used to control the quality of peptide identification. With high accuracy, the predicted retention times can be considered as actual retention times. Therefore, the difference between predicted and observed retention times can serve as an effective and unbiased quantitative metric for evaluating the quality of peptide-spectrum matches (PSMs) reported using different peptide identification methods. Combined with fragment ion intensity prediction, retention time prediction is used to generate spectral libraries for data-independent acquisition (DIA)-based mass spectrometry analysis. Generally, DIA methods identify peptides using specific spectrum libraries obtained from data-dependent acquisition (DDA) experiments. As a result, only peptides detected in the DDA experiments can be present in the libraries and detected in DIA. Furthermore, it takes a lot of time and effort to build libraries from DDA experiments, and typically, they cannot be adopted across different laboratories or instrument platforms. In contrast, the pseudo spectral libraries generated by retention times and fragment ion intensity prediction can overcome these shortcomings. The pseudo spectral libraries generate theoretical spectra of all possible peptides without the need for DDA experiments. 
This paper reviews the research progress of deep learning methods in the prediction of retention time and in related applications in order to provide references for retention time prediction and protein identification. At the same time, the development direction and application trend of retention time prediction methods based on deep learning are discussed.
11
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen‐Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
- Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
12
Gussakovsky D, Anderson G, Spicer V, Krokhin OV. Peptide separation selectivity in proteomics LC-MS experiments: Comparison of formic and mixed formic/heptafluorobutyric acids ion-pairing modifiers. J Sep Sci 2020; 43:3830-3839. [PMID: 32818315 DOI: 10.1002/jssc.202000578] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Separation selectivity and detection sensitivity of reversed-phase high-performance liquid chromatography with tandem mass spectrometry analyses were compared for formic (0.1%) and formic/heptafluorobutyric (0.1%/0.005%) acid based eluents using a proteomic data set of ∼12 000 paired peptides. The addition of a small amount of hydrophobic heptafluorobutyric acid ion-pairing modifier increased peptide retention by up to 10% acetonitrile depending on peptide charge, size, and hydrophobicity. Retention increase was greatest for peptides that were short, highly charged, and hydrophilic. There was an ∼3.75-fold reduction in MS signal observed across the whole population of peptides following the addition of heptafluorobutyric acid. This resulted in ∼36% and ∼21% reduction of detected proteins and unique peptides for the whole cell lysate digests, respectively. We also confirmed that the separation selectivity of the formic/heptafluorobutyric acid system was very similar to the commonly used conditions of 0.1% trifluoroacetic acid, and developed a new version of the Sequence-Specific Retention calculator model for the formic/heptafluorobutyric acid system showing the same ∼0.98 R²-value accuracy as the Sequence-Specific Retention calculator formic acid model. In silico simulation of peptide distribution in separation space showed that the addition of 0.005% heptafluorobutyric acid to the 0.1% formic acid system increased potential proteome coverage by ∼11% of detectable species (tryptic peptides ≥ four amino acids).
Affiliation(s)
- Daniel Gussakovsky
- Department of Chemistry, University of Manitoba, Winnipeg, Manitoba, Canada
- Geoff Anderson
- Department of Chemistry, University of Manitoba, Winnipeg, Manitoba, Canada
- Vic Spicer
- Manitoba Centre for Proteomics and Systems Biology, Winnipeg, Manitoba, Canada
- Oleg V Krokhin
- Manitoba Centre for Proteomics and Systems Biology, Winnipeg, Manitoba, Canada
- Department of Internal Medicine, University of Manitoba, Winnipeg, Manitoba, Canada
13
Bouwmeester R, Martens L, Degroeve S. Generalized Calibration Across Liquid Chromatography Setups for Generic Prediction of Small-Molecule Retention Times. Anal Chem 2020; 92:6571-6578. [DOI: 10.1021/acs.analchem.0c00233] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/19/2022]
Affiliation(s)
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology VIB, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology VIB, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology VIB, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
14
Wen B, Li K, Zhang Y, Zhang B. Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis. Nat Commun 2020; 11:1759. [PMID: 32273506 PMCID: PMC7145864 DOI: 10.1038/s41467-020-15456-w] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 07/29/2019] [Accepted: 03/10/2020] [Indexed: 01/01/2023]
Abstract
Genomics-based neoantigen discovery can be enhanced by proteomic evidence, but there remains a lack of consensus on the performance of different quality control methods for variant peptide identification in proteogenomics. We propose to use the difference between accurately predicted and observed retention times for each peptide as a metric to evaluate different quality control methods. To this end, we develop AutoRT, a deep learning algorithm with high accuracy in retention time prediction. Analysis of three cancer data sets with a total of 287 tumor samples using different quality control strategies results in substantially different numbers of identified variant peptides and putative neoantigens. Our systematic evaluation, using the proposed retention time metric, provides insights and practical guidance on the selection of quality control strategies. We implement the recommended strategy in a computational workflow named NeoFlow to support proteogenomics-based neoantigen prioritization, enabling more sensitive discovery of putative neoantigens. Identifying mutation-derived neoantigens by proteogenomics requires robust strategies for quality control. Here, the authors propose peptide retention time as an evaluation metric for proteogenomics quality control methods, and develop a deep learning algorithm for accurate retention time prediction.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Kai Li
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Yun Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
15
Ivanov MV, Bubis JA, Gorshkov V, Tarasova IA, Levitsky LI, Lobas AA, Solovyeva EM, Pridatchenko ML, Kjeldsen F, Gorshkov MV. DirectMS1: MS/MS-Free Identification of 1000 Proteins of Cellular Proteomes in 5 Minutes. Anal Chem 2020; 92:4326-4333. [DOI: 10.1021/acs.analchem.9b05095] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 02/01/2023]
Affiliation(s)
- Mark V. Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Julia A. Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M DK-5230, Denmark
| | - Irina A. Tarasova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Lev I. Levitsky
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Anna A. Lobas
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Elizaveta M. Solovyeva
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Marina L. Pridatchenko
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M DK-5230, Denmark
| | - Mikhail V. Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Moscow Institute of Physics and Technology (State University), 141700 Dolgoprudny, Russia
| |
|
16
|
Abstract
In bottom-up proteomics, proteins are typically identified by enzymatic digestion into peptides, tandem mass spectrometry and comparison of the tandem mass spectra with those predicted from a sequence database for peptides within measurement uncertainty from the experimentally obtained mass. Although now decreasingly common, isolated proteins or simple protein mixtures can also be identified by measuring only the masses of the peptides resulting from the enzymatic digest, without any further fragmentation. Separation methods such as liquid chromatography and electrophoresis are often used to fractionate complex protein or peptide mixtures prior to analysis by mass spectrometry. Although the primary reason for this is to avoid ion suppression and improve data quality, these separations are based on physical and chemical properties of the peptides or proteins and therefore also provide information about them. Depending on the separation method, this could be protein molecular weight (SDS-PAGE), isoelectric point (IEF), charge at a known pH (ion exchange chromatography), or hydrophobicity (reversed phase chromatography). These separations produce approximate measurements on properties that to some extent can be predicted from amino acid sequences. In the case of molecular weight of proteins without posttranslational modifications this is straightforward: simply add the molecular weights of the amino acid residues in the protein. For IEF, charge and hydrophobicity, the order of the amino acids, and folding state of the peptide or protein also matter, but it is nevertheless possible to predict the behavior of peptides and proteins in these separation methods to a degree which renders such predictions useful. 
This chapter reviews the topic of using data from separation methods for identification and validation in proteomics, with special emphasis on predicting retention times of tryptic peptides in reversed-phase chromatography under acidic conditions, as this is one of the most commonly used separation methods in bottom-up proteomics.
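The molecular weight calculation mentioned above (summing residue masses) can be sketched as follows. This is a minimal illustration that assumes an unmodified linear chain and uses standard monoisotopic residue masses; one water is added for the free termini. The function name is hypothetical, and average masses could be substituted when a chemical molecular weight is wanted.

```python
# Standard monoisotopic residue masses in Da (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O, accounts for the N-terminal H and C-terminal OH

def monoisotopic_mass(sequence):
    """Sum the residue masses of a peptide or protein and add one water."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    except KeyError as exc:
        raise ValueError(f"unknown residue: {exc.args[0]}") from None

print(round(monoisotopic_mass("PEPTIDE"), 4))
```

Posttranslational modifications would add their own mass shifts, which is why the text restricts the simple sum to unmodified proteins.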
|
17
|
Samuelsson J, Eiriksson FF, Åsberg D, Thorsteinsdóttir M, Fornstedt T. Determining gradient conditions for peptide purification in RPLC with machine-learning-based retention time predictions. J Chromatogr A 2019; 1598:92-100. [DOI: 10.1016/j.chroma.2019.03.043] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/02/2019] [Revised: 03/20/2019] [Accepted: 03/21/2019] [Indexed: 01/22/2023]
|
18
|
Chen AT, Franks A, Slavov N. DART-ID increases single-cell proteome coverage. PLoS Comput Biol 2019; 15:e1007082. [PMID: 31260443 PMCID: PMC6625733 DOI: 10.1371/journal.pcbi.1007082] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 08/29/2018] [Revised: 07/12/2019] [Accepted: 05/06/2019] [Indexed: 01/09/2023] Open
Abstract
Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those comprised of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because reduced protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates towards improved confidence estimates of peptide-spectrum-matches. When applied to bulk or to single-cell samples, DART-ID increased the number of data points by 30-50% at 1% FDR, and thus decreased missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type specific proteins. The additional datapoints provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net.
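The idea of folding RT evidence into the confidence of a peptide-spectrum match can be sketched with Bayes' rule. The sketch below is not the DART-ID model: it assumes, purely for illustration, a normal RT density around the aligned expected RT for correct matches and a uniform density over the run for incorrect ones. All names and parameter values are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def update_pep(pep, observed_rt, expected_rt, sigma, run_length):
    """Posterior error probability of a PSM after observing its retention time.

    pep         prior posterior-error-probability from the spectrum match alone
    expected_rt aligned RT expected for the peptide in this run (assumed known)
    sigma       assumed RT spread for correct matches
    run_length  length of the run; incorrect matches are assumed uniform over it
    """
    like_correct = normal_pdf(observed_rt, expected_rt, sigma)
    like_incorrect = 1.0 / run_length
    numerator = pep * like_incorrect
    denominator = numerator + (1.0 - pep) * like_correct
    # Guard against 0/0 when pep is 0 and the normal density underflows.
    return pep if denominator == 0.0 else numerator / denominator

# A borderline PSM eluting where expected becomes more confident ...
print(update_pep(0.5, observed_rt=30.0, expected_rt=30.0, sigma=1.0, run_length=60.0))
# ... and one eluting far from its expected RT becomes less confident.
print(update_pep(0.5, observed_rt=50.0, expected_rt=30.0, sigma=1.0, run_length=60.0))
```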
Affiliation(s)
- Albert Tian Chen
- Department of Bioengineering, Northeastern University, Boston, Massachusetts, United States of America
- Barnett Institute, Northeastern University, Boston, Massachusetts, United States of America
| | - Alexander Franks
- Department of Statistics and Applied Probability, University of California Santa Barbara, California, United States of America
| | - Nikolai Slavov
- Department of Bioengineering, Northeastern University, Boston, Massachusetts, United States of America
- Barnett Institute, Northeastern University, Boston, Massachusetts, United States of America
- Department of Biology, Northeastern University, Boston, Massachusetts, United States of America
| |
|
19
|
Bouwmeester R, Martens L, Degroeve S. Comprehensive and Empirical Evaluation of Machine Learning Algorithms for Small Molecule LC Retention Time Prediction. Anal Chem 2019; 91:3694-3703. [DOI: 10.1021/acs.analchem.8b05820] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 01/05/2023]
Affiliation(s)
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
| | - Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
| |
|
20
|
Tarasova IA, Masselon CD, Gorshkov AV, Gorshkov MV. Predictive chromatography of peptides and proteins as a complementary tool for proteomics. Analyst 2016; 141:4816-4832. [PMID: 27419248 DOI: 10.1039/c6an00919k] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 01/25/2023]
Abstract
In the last couple of decades, considerable effort has been focused on developing methods for quantitative and qualitative proteome characterization. The method of choice in this characterization is mass spectrometry used in combination with sample separation. One of the most widely used separation techniques at the front end of a mass spectrometer is high performance liquid chromatography (HPLC). A unique feature of HPLC is its specificity to the amino acid sequence of separated peptides and proteins. This specificity may provide additional information about the peptides or proteins under study which is complementary to the mass spectrometry data. The value of this information for proteomics has been recognized in the past few decades, which has stimulated significant effort in the development and implementation of computational and theoretical models for the prediction of peptide retention time for a given sequence. Here we review the advances in this area and the utility of predicted retention times for proteomic applications.
Affiliation(s)
- Irina A Tarasova
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia.
| | - Christophe D Masselon
- CEA, iRTSV-BGE, Laboratoire d'Etude de la Dynamique des Protéomes, Grenoble, F-38000, France and INSERM, U1038-BGE, F-38000, Grenoble, France
| | - Alexander V Gorshkov
- N.N. Semenov Institute of Chemical Physics, Russian Academy of Sciences, Moscow 119991, Russia
| | - Mikhail V Gorshkov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia; Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow region 141700, Russia
| |
|
21
|
Ma C, Ren Y, Yang J, Ren Z, Yang H, Liu S. Improved Peptide Retention Time Prediction in Liquid Chromatography through Deep Learning. Anal Chem 2018; 90:10881-10888. [PMID: 30114359 DOI: 10.1021/acs.analchem.8b02386] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Indexed: 01/03/2023]
Abstract
The accuracy of peptide retention time (RT) prediction models in liquid chromatography (LC) is still not sufficient for wider implementation in proteomics practice. Herein, we propose deep learning as an ideal tool to considerably improve this prediction. A new peptide RT prediction tool, DeepRT, was designed using a capsule network model, and public data sets containing peptides separated by reverse-phase liquid chromatography were used to evaluate the DeepRT performance. Compared with other prevailing RT predictors, DeepRT attained overall improvement in the prediction of peptide RTs with an R² of ∼0.994. Moreover, DeepRT was able to accommodate peptides separated by different types of LC, such as strong cation exchange (SCX) and hydrophilic interaction liquid chromatography (HILIC), reaching R² values of ∼0.996 for SCX and ∼0.993 for HILIC. If a large peptide data set is available for one type of LC, DeepRT can be promoted to DeepRT(+) using transfer learning. Based on a large peptide data set gained from SWATH, DeepRT(+) further elevated the accuracy of RT prediction for peptides in a small data set and enabled satisfactory prediction with as few as several hundred peptides. Further, DeepRT automatically learns retention-related properties of amino acids under different separation mechanisms, which are consistent with the retention coefficients (Rc) of the amino acids. DeepRT was thus proven to be an improved RT predictor with high flexibility and efficiency. DeepRT is available at https://github.com/horsepurve/DeepRTplus.
Affiliation(s)
- Chunwei Ma
- BGI-Shenzhen, Beishan Industrial Zone 11th Building, Yantian District, Shenzhen, Guangdong 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
| | - Yan Ren
- BGI-Shenzhen, Beishan Industrial Zone 11th Building, Yantian District, Shenzhen, Guangdong 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
| | - Jiarui Yang
- BGI-Shenzhen, Beishan Industrial Zone 11th Building, Yantian District, Shenzhen, Guangdong 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
| | - Zhe Ren
- BGI-Shenzhen, Beishan Industrial Zone 11th Building, Yantian District, Shenzhen, Guangdong 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
| | - Huanming Yang
- BGI-Shenzhen, Beishan Industrial Zone 11th Building, Yantian District, Shenzhen, Guangdong 518083, China; James D. Watson Institute of Genome Sciences, Hangzhou 310008, China
| | - Siqi Liu
- BGI-Shenzhen, Beishan Industrial Zone 11th Building, Yantian District, Shenzhen, Guangdong 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
| |
|
22
|
Mohammed Y, Palmblad M. Visualization and application of amino acid retention coefficients obtained from modeling of peptide retention. J Sep Sci 2018; 41:3644-3653. [PMID: 30047222 PMCID: PMC6175132 DOI: 10.1002/jssc.201800488] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/05/2018] [Revised: 07/17/2018] [Accepted: 07/18/2018] [Indexed: 11/08/2022]
Abstract
We introduce a method for data inspection in liquid separations of peptides using amino acid retention coefficients and their relative change across experiments. Our method allows for the direct comparison between actual experimental conditions, regardless of sample content and without the use of internal standards. The modeling uses linear regression of peptide retention time as a function of amino acid composition. We demonstrate the pH dependency of the model in a control experiment where the pH of the mobile phase was changed in a controlled way. We introduce a score to identify the false discovery rate on peptide spectrum match level that corresponds to the set of most robust models, i.e. to maximize the shared agreement between experiments. We demonstrate the method's utility in reversed-phase liquid chromatography using 24 datasets with minimal peptide overlap. We apply our method to datasets obtained from a public repository representing various separation designs, including one-dimensional reversed-phase liquid chromatography followed by tandem mass spectrometry, and two-dimensional online strong cation exchange coupled to reversed-phase liquid chromatography followed by tandem mass spectrometry, and highlight new insights. Our method provides a simple yet powerful way to inspect data quality, in particular for multidimensional separations, improving comparability of data at no additional experimental cost.
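The regression of retention time on amino acid composition can be sketched as an ordinary least-squares fit. The example below is a minimal pure-Python illustration that solves the normal equations directly; the helper names, the two-letter alphabet, and the toy peptides are hypothetical, and a real analysis would use all 20 residues and many more peptides.

```python
def composition(peptide, alphabet):
    """Residue counts for each letter in alphabet, plus a constant term for the intercept."""
    return [float(peptide.count(aa)) for aa in alphabet] + [1.0]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: add peptides or reduce the alphabet")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_retention_coefficients(peptides, rts, alphabet):
    """Least-squares retention coefficient per residue, and the intercept."""
    rows = [composition(p, alphabet) for p in peptides]
    k = len(alphabet) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * rt for r, rt in zip(rows, rts)) for i in range(k)]
    coef = solve(xtx, xty)
    return dict(zip(alphabet, coef[:-1])), coef[-1]

# Toy data generated from A = 1, L = 5, intercept = 2 (arbitrary RT units).
coefs, intercept = fit_retention_coefficients(
    ["A", "L", "AL", "AAL", "ALL"], [3.0, 7.0, 8.0, 9.0, 13.0], "AL")
print(coefs, intercept)
```

Comparing the fitted coefficients between runs is then a matter of comparing the returned dictionaries, which is the kind of cross-experiment inspection the abstract describes.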
Affiliation(s)
- Yassene Mohammed
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands; University of Victoria-Genome British Columbia Proteomics Centre, University of Victoria, Victoria, Canada
| | - Magnus Palmblad
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
| |
|
23
|
Giese SH, Ishihama Y, Rappsilber J. Peptide Retention in Hydrophilic Strong Anion Exchange Chromatography Is Driven by Charged and Aromatic Residues. Anal Chem 2018. [PMID: 29528219 PMCID: PMC5937359 DOI: 10.1021/acs.analchem.7b05157] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Hydrophilic strong anion exchange chromatography (hSAX) is becoming a popular method for the prefractionation of proteomic samples. However, the use and further development of this approach is affected by the limited understanding of its retention mechanism and the absence of elution time prediction. Using a set of 59 297 confidently identified peptides, we performed an explorative analysis and built a predictive deep learning model. As expected, charged residues are the major contributors to the retention time through electrostatic interactions. Aspartic acid and glutamic acid have a strong retaining effect and lysine and arginine have a strong repulsion effect. In addition, we also find the involvement of aromatic amino acids. This suggests a substantial contribution of cation-π interactions to the retention mechanism. The deep learning approach was validated using 5-fold cross-validation (CV) yielding a mean prediction accuracy of 70% during CV and 68% on a hold-out validation set. The results of this study emphasize that not only electrostatic interactions but rather diverse types of interactions must be integrated to build a reliable hSAX retention time predictor.
Affiliation(s)
- Sven H Giese
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Yasushi Ishihama
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
| | - Juri Rappsilber
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany; Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan; Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
| |
|
24
|
Lobas AA, Levitsky LI, Fichtenbaum A, Surin AK, Pridatchenko ML, Mitulovic G, Gorshkov AV, Gorshkov MV. Predictive Liquid Chromatography of Peptides Based on Hydrophilic Interactions for Mass Spectrometry-Based Proteomics. Journal of Analytical Chemistry 2018. [DOI: 10.1134/s1061934817140076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
|
25
|
Maboudi Afkham H, Qiu X, The M, Käll L. Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics. Bioinformatics 2017; 33:508-513. [PMID: 27797755 DOI: 10.1093/bioinformatics/btw619] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/01/2016] [Accepted: 09/20/2016] [Indexed: 12/17/2022] Open
Abstract
Motivation: Liquid chromatography is frequently used as a means to reduce the complexity of peptide mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we apply Gaussian Process Regression to the feature representation of a previously described predictor, Elude. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction. Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with new means for assessment of the predictions. We demonstrate how a subset of the peptides can be selected with lower prediction error compared to the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies. Contact: lukas.kall@scilifelab.se. Availability and Implementation: Our software and the data used in our experiments are publicly available and can be downloaded from https://github.com/statisticalbiotechnology/GPTime.
Affiliation(s)
- Heydar Maboudi Afkham
- Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, 17121 Solna, Sweden
| | - Xuanbin Qiu
- Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, 17121 Solna, Sweden
| | - Matthew The
- Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, 17121 Solna, Sweden
| | - Lukas Käll
- Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, 17121 Solna, Sweden
| |
|
26
|
Moruz L, Käll L. Peptide retention time prediction. Mass Spectrometry Reviews 2017; 36:615-623. [PMID: 26799864 DOI: 10.1002/mas.21488] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 01/27/2015] [Accepted: 11/12/2015] [Indexed: 06/05/2023]
Abstract
Most methods for interpreting data from shotgun proteomics experiments are to a large degree dependent on being able to predict properties of peptide ions. Often such predicted properties are limited to molecular mass and fragment spectra, but here we put focus on a perhaps underutilized property, a peptide's chromatographic retention time. We review a couple of different principles of retention time prediction, and their applications within computational proteomics. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:615-623, 2017.
Affiliation(s)
- Luminita Moruz
- Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Stockholm, Sweden
| | - Lukas Käll
- Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Stockholm, Sweden
| |
|
27
|
Gorshkov AV, Goloborodko AA, Pridatchenko ML, Tarasova IA, Rozdina IG, Evreinov VV, Gorshkov MV. Applicability of the critical-chromatography concept to proteomics problems: Separation of peptides modeled by a heterogeneous rod. Polymer Science Series A 2017. [DOI: 10.1134/s0965545x17030063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|
28
|
Gorshkov AV, Pridatchenko ML, Perlova TY, Tarasova IA, Levitsky LI, Gorshkov MV, Evreinov VV. Applicability of the critical chromatography concept to proteomic problems. II. Effect of mobile phase on the separation of peptides and proteins taking into account the amino acid sequence. Journal of Analytical Chemistry 2017. [DOI: 10.1134/s106193481610004x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
29
|
Shortreed MR, Frey BL, Scalf M, Knoener RA, Cesnik AJ, Smith LM. Elucidating Proteoform Families from Proteoform Intact-Mass and Lysine-Count Measurements. J Proteome Res 2016; 15:1213-21. [PMID: 26941048 PMCID: PMC4917391 DOI: 10.1021/acs.jproteome.5b01090] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 01/21/2023]
Abstract
Proteomics is presently dominated by the “bottom-up” strategy, in which proteins are enzymatically digested into peptides for mass spectrometric identification. Although this approach is highly effective at identifying large numbers of proteins present in complex samples, the digestion into peptides renders it impossible to identify the proteoforms from which they were derived. We present here a powerful new strategy for the identification of proteoforms and the elucidation of proteoform families (groups of related proteoforms) from the experimental determination of the accurate proteoform mass and number of lysine residues contained. Accurate proteoform masses are determined by standard LC–MS analysis of undigested protein mixtures in an Orbitrap mass spectrometer, and the lysine count is determined using the NeuCode isotopic tagging method. We demonstrate the approach in analysis of the yeast proteome, revealing 8637 unique proteoforms and 1178 proteoform families. The elucidation of proteoforms and proteoform families afforded here provides an unprecedented new perspective upon proteome complexity and dynamics.
Affiliation(s)
- Michael R Shortreed
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
| | - Brian L Frey
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
| | - Mark Scalf
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
| | - Rachel A Knoener
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
| | - Anthony J Cesnik
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States; Genome Center of Wisconsin, University of Wisconsin, 425G Henry Mall, Room 3420, Madison, Wisconsin 53706, United States
| |
|
30
|
Holman SW, McLean L, Eyers CE. RePLiCal: A QconCAT Protein for Retention Time Standardization in Proteomics Studies. J Proteome Res 2016; 15:1090-102. [PMID: 26775667 DOI: 10.1021/acs.jproteome.5b00988] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Abstract
This study introduces a new reversed-phase liquid chromatography retention time (RT) standard, RePLiCal (Reversed-phase liquid chromatography calibrant), produced using QconCAT technology. The synthetic protein contains 27 lysine-terminating calibrant peptides, meaning that the same complement of standards can be generated using either Lys-C or trypsin-based digestion protocols. RePLiCal was designed such that each constituent peptide is unique with respect to all eukaryotic proteomes, thereby enabling integration into a wide range of proteomic analyses. RePLiCal has been benchmarked against three commercially available peptide RT standard kits and outperforms all in terms of LC gradient coverage. RePLiCal also provides a higher number of calibrant points for chromatographic retention time standardization and normalization. The standard provides stable RTs over long analysis times and can be readily transferred between different LC gradients and nUHPLC instruments. Moreover, RePLiCal can be used to predict RTs for other peptides in a timely manner. Furthermore, it is shown that RePLiCal can be used effectively to evaluate trapping column performance for nUHPLC instruments using trap-elute configurations, to optimize gradients to maximize peptide and protein identification rates, and to recalibrate the m/z scale of mass spectrometry data post-acquisition.
Affiliation(s)
- Stephen W Holman
- Centre for Proteome Research, Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
| | - Lynn McLean
- Centre for Proteome Research, Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
| | - Claire E Eyers
- Centre for Proteome Research, Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
| |
|
31
|
Parker SJ, Rost H, Rosenberger G, Collins BC, Malmström L, Amodei D, Venkatraman V, Raedschelders K, Van Eyk JE, Aebersold R. Identification of a Set of Conserved Eukaryotic Internal Retention Time Standards for Data-independent Acquisition Mass Spectrometry. Mol Cell Proteomics 2015. [PMID: 26199342 DOI: 10.1074/mcp.o114.042267] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 11/06/2022] Open
Abstract
Accurate knowledge of retention time (RT) in liquid chromatography-based mass spectrometry data facilitates peptide identification, quantification, and multiplexing in targeted and discovery-based workflows. Retention time prediction is particularly important for peptide analysis in emerging data-independent acquisition (DIA) experiments such as SWATH-MS. The indexed RT approach, iRT, uses synthetic spiked-in peptide standards (SiRT) to set RT to a unit-less scale, allowing for normalization of peptide RT between different samples and chromatographic set-ups. The obligatory use of SiRTs can be costly and complicates comparisons and data integration if standards are not included in every sample. Reliance on SiRTs also prevents the inclusion of archived mass spectrometry data for generation of the peptide assay libraries central to targeted DIA-MS data analysis. We have identified a set of peptide sequences that are conserved across most eukaryotic species, termed Common internal Retention Time standards (CiRT). In a series of tests to support the appropriateness of the CiRT-based method, we show: (1) the CiRT peptides normalized RT in human, yeast, and mouse cell lysate derived peptide assay libraries and enabled merging of archived libraries for expanded DIA-MS quantitative applications; (2) CiRTs predicted RT in SWATH-MS data within a 2-min margin of error for the majority of peptides; and (3) normalization of RT using the CiRT peptides enabled the accurate SWATH-MS-based quantification of 340 synthetic isotopically labeled peptides that were spiked into either human or yeast cell lysate. To automate and facilitate the use of these CiRT peptide lists or other custom user-defined internal RT reference peptides in DIA workflows, an algorithm was designed to automatically select a high-quality subset of datapoints for robust linear alignment of RT for use. Implementations of this algorithm are available for the OpenSWATH and Skyline platforms. 
Thus, CiRT peptides can be used alone or as a complement to SiRTs for RT normalization across peptide spectral libraries and in quantitative DIA-MS studies.
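The linear RT normalization described above can be sketched as a least-squares line through reference peptides. This is an illustrative sketch only, not the OpenSWATH or Skyline implementation: it fits all reference points without the high-quality subset selection the abstract mentions, and the function names and example values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def to_normalized_rt(measured_rt, slope, intercept):
    """Map a measured RT onto the unit-less reference scale."""
    return slope * measured_rt + intercept

# Hypothetical reference peptides: measured RT (min) and their reference-scale values.
measured = [10.0, 20.0, 30.0]
reference = [0.0, 50.0, 100.0]
slope, intercept = fit_line(measured, reference)
print(to_normalized_rt(25.0, slope, intercept))
```

Once fitted per run, the same line can be applied to every peptide in that run, which is what makes libraries acquired on different gradients comparable.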
Affiliation(s)
- Sarah J Parker
- Advanced Clinical Biosystems Research Institute, The Heart Institute, and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California
| | - Hannes Rost
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland
| | - George Rosenberger
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland
| | - Ben C Collins
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | | | | | - Vidya Venkatraman
- Advanced Clinical Biosystems Research Institute, The Heart Institute, and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California
| | - Koen Raedschelders
- Advanced Clinical Biosystems Research Institute, The Heart Institute, and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California
| | - Jennifer E Van Eyk
- Department of Medicine, Johns Hopkins University, Baltimore, Maryland; Advanced Clinical Biosystems Research Institute, The Heart Institute, and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California
| | - Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, Switzerland
|
32
|
Tarasova IA, Goloborodko AA, Perlova TY, Pridatchenko ML, Gorshkov AV, Evreinov VV, Ivanov AR, Gorshkov MV. Application of Statistical Thermodynamics To Predict the Adsorption Properties of Polypeptides in Reversed-Phase HPLC. Anal Chem 2015; 87:6562-9. [DOI: 10.1021/acs.analchem.5b00595] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Affiliation(s)
- Irina A. Tarasova
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Anton A. Goloborodko
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Tatyana Y. Perlova
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Marina L. Pridatchenko
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Alexander V. Gorshkov
- N. N. Semenov’s Institute of Chemical Physics, Russian Academy of Sciences, 119991 Moscow, Russia
- Victor V. Evreinov
- N. N. Semenov’s Institute of Chemical Physics, Russian Academy of Sciences, 119991 Moscow, Russia
- Alexander R. Ivanov
- Barnett Institute of Chemical and Biological Analysis, Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
- Mikhail V. Gorshkov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Moscow Institute of Physics and Technology (State University), 141707 Dolgoprudny, Moscow Region, Russia
|
33
|
Applications of Peptide Retention Time in Proteomic Data Analysis. Adv Exp Med Biol 2014; 845:67-75. [DOI: 10.1007/978-94-017-9523-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/18/2023]
|
34
|
Karch KR, Zee BM, Garcia BA. High resolution is not a strict requirement for characterization and quantification of histone post-translational modifications. J Proteome Res 2014; 13:6152-9. [PMID: 25325711 PMCID: PMC4261946 DOI: 10.1021/pr500902f] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 02/05/2023]
Abstract
Mass spectrometry (MS) is a powerful tool to accurately identify and quantify histone post-translational modifications (PTMs). High-resolution mass analyzers have been regarded as essential for these PTM analyses because the mass accuracy afforded is sufficient to differentiate trimethylation versus acetylation (42.0470 and 42.0106 Da, respectively), whereas lower-resolution mass analyzers cannot. Noting this limitation, we sought to determine whether lower-resolution detectors are nonetheless adequate for histone PTM analysis by comparing the low-resolution LTQ Velos Pro with the high-resolution LTQ-Orbitrap Velos Pro. We first determined that the optimal scan mode on the LTQ Velos Pro is the Enhanced scan mode with respect to apparent resolution, number of MS and MS/MS scans per run, and reproducibility of label-free quantifications. We next compared the performance of the LTQ Velos Pro to the LTQ-Orbitrap Velos Pro using the same criteria for comparison, and we found that the main difference is that the LTQ-Orbitrap Velos Pro is able to resolve the difference between acetylation and trimethylation while the LTQ Velos Pro cannot. However, using heavy isotope labeled synthetic peptide standards and retention time information enables confident assignment of these modifications and comparable quantification between the instruments. Therefore, lower-resolution instruments can confidently be utilized for histone PTM analysis.
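As a rough illustration of the mass difference quoted in this abstract, the following Python sketch computes the gap between the trimethylation and acetylation mass shifts and expresses it in parts per million of a peptide's neutral mass. The two mass shifts are taken from the abstract; the function names and the example peptide masses are illustrative assumptions, not part of the cited work.

```python
# Mass shifts quoted in the abstract (Da).
TRIMETHYLATION_DA = 42.0470
ACETYLATION_DA = 42.0106


def mass_gap_da() -> float:
    """Difference between the two modification mass shifts, in Da."""
    return TRIMETHYLATION_DA - ACETYLATION_DA


def mass_gap_ppm(peptide_mass_da: float) -> float:
    """Express the gap in ppm of a (hypothetical) neutral peptide mass."""
    if peptide_mass_da <= 0:
        raise ValueError("peptide mass must be positive")
    return mass_gap_da() / peptide_mass_da * 1e6


if __name__ == "__main__":
    print(f"gap = {mass_gap_da():.4f} Da")
    # Example peptide masses are arbitrary, chosen only for illustration.
    for mass in (1000.0, 2000.0):
        print(f"{mass:.0f} Da peptide: {mass_gap_ppm(mass):.1f} ppm")
```

The gap shrinks in relative terms as the peptide gets heavier, so separating the two assignments by mass alone would require a mass accuracy well below the computed ppm figure for the peptide in question.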
Affiliation(s)
- Kelly R Karch
- Epigenetics Program, Department of Biochemistry and Biophysics, Smilow Center for Translational Research, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Boulevard, Building 421, Philadelphia, Pennsylvania 19104-5157, United States
|
35
|
Ivanov MV, Levitsky LI, Lobas AA, Panic T, Laskay ÜA, Mitulovic G, Schmid R, Pridatchenko ML, Tsybin YO, Gorshkov MV. Empirical Multidimensional Space for Scoring Peptide Spectrum Matches in Shotgun Proteomics. J Proteome Res 2014; 13:1911-20. [DOI: 10.1021/pr401026y] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 01/08/2023]
Affiliation(s)
- Mark V. Ivanov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Institutskii per., 9, Dolgoprudny 141700, Moscow region, Russia
- Lev I. Levitsky
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Institutskii per., 9, Dolgoprudny 141700, Moscow region, Russia
- Anna A. Lobas
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Institutskii per., 9, Dolgoprudny 141700, Moscow region, Russia
- Tanja Panic
- Medical University of Vienna, Spitalgasse 23, Vienna 1090, Austria
- Ünige A. Laskay
- Biomolecular Mass Spectrometry Laboratory, Ecole Polytechnique Fédérale de Lausanne, 2 av. Forel, Lausanne 1015, Switzerland
- Goran Mitulovic
- Medical University of Vienna, Spitalgasse 23, Vienna 1090, Austria
- Rainer Schmid
- Medical University of Vienna, Spitalgasse 23, Vienna 1090, Austria
- Marina L. Pridatchenko
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Yury O. Tsybin
- Biomolecular Mass Spectrometry Laboratory, Ecole Polytechnique Fédérale de Lausanne, 2 av. Forel, Lausanne 1015, Switzerland
- Mikhail V. Gorshkov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Institutskii per., 9, Dolgoprudny 141700, Moscow region, Russia
|
36
|
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn solving a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever growing amounts, machine learning is fast becoming a very popular tool in the field. We here therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
- Department of Medical Protein Research, VIB, Ghent, Belgium; Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium; Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium
|
37
|
Grigoryan M, Shamshurin D, Spicer V, Krokhin OV. Unifying Expression Scale for Peptide Hydrophobicity in Proteomic Reversed Phase High-Pressure Liquid Chromatography Experiments. Anal Chem 2013; 85:10878-86. [DOI: 10.1021/ac402310t] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Affiliation(s)
- Marine Grigoryan
- Manitoba Centre for Proteomics and Systems Biology and Department of Internal Medicine, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg, R3E 3P4, Canada
- Dmitry Shamshurin
- Manitoba Centre for Proteomics and Systems Biology and Department of Internal Medicine, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg, R3E 3P4, Canada
- Victor Spicer
- Manitoba Centre for Proteomics and Systems Biology and Department of Internal Medicine, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg, R3E 3P4, Canada
- Oleg V. Krokhin
- Manitoba Centre for Proteomics and Systems Biology and Department of Internal Medicine, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg, R3E 3P4, Canada
|
38
|
A new peptide retention time prediction method for mass spectrometry based proteomic analysis by a serial and parallel support vector machine model. Se Pu 2013; 30:857-63. [DOI: 10.3724/sp.j.1123.2012.06021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
39
|
Okawa S, Fischer B, Krijgsveld J. Properties of isotope patterns and their utility for peptide identification in large-scale proteomic experiments. Rapid Commun Mass Spectrom 2013; 27:1067-1075. [PMID: 23592210 DOI: 10.1002/rcm.6551] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/02/2012] [Revised: 02/14/2013] [Accepted: 02/17/2013] [Indexed: 06/02/2023]
Abstract
RATIONALE In proteomic experiments, isotope patterns are routinely generated for all detected peptides. While this pattern is determined by peptide composition, it has not been evaluated as a parameter that can help in the process of peptide identification. METHODS First, we investigated how the relative isotope abundance (RIA) accuracy in proteomic data sets depends on the spectral intensity, resolution, and the number of mass spectrometry (MS) 1 scans, using an Orbitrap Velos mass spectrometer. Next, we explored the discriminatory power of isotope patterns in the context of proteome analyses of various complexities, either alone or in combination with a Mascot database search. Finally, we provide a theoretical framework for the required accuracies of both peptide mass and RIA for peptide identification. RESULTS We demonstrate that the RIA error obtained in routine proteome analyses is 4-5%, and that this is only modestly influenced by spectral intensity, resolution, and the number of MS1 scans. While RIA alone has no discriminatory power, in a Mascot search isotope patterns can distinguish top scoring hits from runner-up hits in 70-95% of cases. Our theoretical approach shows that RIA accuracy needs to be ~0.2% in order to uniquely identify peptides in full proteomes. CONCLUSIONS Our results demonstrate that isotope patterns can have discriminatory power when used in combination with a classical database search. Inclusion of this parameter in proteomic workflows may help to increase confidence in peptide identification, but in practical terms this will be limited to small proteomes.
Affiliation(s)
- Satoshi Okawa
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
|
40
|
Abstract
In bottom-up proteomics, proteins are typically identified by enzymatic digestion into peptides, tandem mass spectrometry and comparison of the tandem mass spectra with those predicted from a sequence database for peptides within measurement uncertainty from the experimentally obtained mass. Although now decreasingly common, isolated proteins or simple protein mixtures can also be identified by measuring only the masses of the peptides resulting from the enzymatic digest, without any further fragmentation. Separation methods such as liquid chromatography and electrophoresis are often used to fractionate complex protein or peptide mixtures prior to analysis by mass spectrometry. Although the primary reason for this is to avoid ion suppression and improve data quality, these separations are based on physical and chemical properties of the peptides or proteins and therefore also provide information about them. Depending on the separation method, this could be protein molecular weight (SDS-PAGE), isoelectric point (IEF), charge at a known pH (ion exchange chromatography), or hydrophobicity (reversed phase chromatography). These separations produce approximate measurements on properties that to some extent can be predicted from amino acid sequences. In the case of molecular weight of proteins without posttranslational modifications this is straightforward: simply add the molecular weights of the amino acid residues in the protein. For IEF, charge and hydrophobicity, the order of the amino acids, and folding state of the peptide or protein also matter, but it is nevertheless possible to predict the behavior of peptides and proteins in these separation methods to a degree which renders such predictions useful. 
This chapter reviews the topic of using data from separation methods for identification and validation in proteomics, with special emphasis on predicting retention times of tryptic peptides in reversed-phase chromatography under acidic conditions, as this is one of the most commonly used separation methods in proteomics.
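The abstract's remark that an unmodified protein's molecular weight follows from adding up its residue weights can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's method: the function name is hypothetical, the table holds rounded textbook average residue masses for the 20 standard amino acids, and one water is added for the free termini.

```python
# Average residue masses (Da) for the 20 standard amino acids
# (rounded textbook values; an assumption of this sketch).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVERAGE_MASS = 18.01528  # H2O added for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Sum residue masses plus one water; no modifications considered."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    total = WATER_AVERAGE_MASS
    for residue in seq:
        if residue not in AVERAGE_RESIDUE_MASS:
            raise ValueError(f"unknown residue: {residue}")
        total += AVERAGE_RESIDUE_MASS[residue]
    return total


if __name__ == "__main__":
    # Arbitrary example sequence, chosen only for illustration.
    print(f"{molecular_weight('PEPTIDE'):.2f} Da")
```

Because the sum ignores residue order, this calculation works for molecular weight but not for properties such as isoelectric point or hydrophobicity, where, as the abstract notes, sequence order and folding state also matter.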
Affiliation(s)
- Alex A Henneman
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, The Netherlands
|
41
|
Abstract
Selected reaction monitoring (SRM) has a long history of use in the area of quantitative MS. In recent years, the approach has seen increased application to quantitative proteomics, facilitating multiplexed relative and absolute quantification studies in a variety of organisms. This article discusses SRM, after introducing the context of quantitative proteomics (specifically primarily absolute quantification) where it finds most application, and considers topics such as the theory and advantages of SRM, the selection of peptide surrogates for protein quantification, the design of optimal SRM co-ordinates and the handling of SRM data. A number of published studies are also discussed to demonstrate the impact that SRM has had on the field of quantitative proteomics.
|
42
|
Wan C, Liu J, Fong V, Lugowski A, Stoilova S, Bethune-Waddell D, Borgeson B, Havugimana PC, Marcotte EM, Emili A. ComplexQuant: high-throughput computational pipeline for the global quantitative analysis of endogenous soluble protein complexes using high resolution protein HPLC and precision label-free LC/MS/MS. J Proteomics 2012; 81:102-11. [PMID: 23063720 DOI: 10.1016/j.jprot.2012.10.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/11/2012] [Revised: 10/01/2012] [Accepted: 10/04/2012] [Indexed: 12/29/2022]
Abstract
The experimental isolation and characterization of stable multi-protein complexes are essential to understanding the molecular systems biology of a cell. To this end, we have developed a high-throughput proteomic platform for the systematic identification of native protein complexes based on extensive fractionation of soluble protein extracts by multi-bed ion exchange high performance liquid chromatography (IEX-HPLC) combined with exhaustive label-free LC/MS/MS shotgun profiling. To support these studies, we have built a companion data analysis software pipeline, termed ComplexQuant. Proteins present in the hundreds of fractions typically collected per experiment are first identified by exhaustively interrogating MS/MS spectra using multiple database search engines within an integrative probabilistic framework, while accounting for possible post-translation modifications. Protein abundance is then measured across the fractions based on normalized total spectral counts and precursor ion intensities using a dedicated tool, PepQuant. This analysis allows co-complex membership to be inferred based on the similarity of extracted protein co-elution profiles. Each computational step has been optimized for processing large-scale biochemical fractionation datasets, and the reliability of the integrated pipeline has been benchmarked extensively. This article is part of a Special Issue entitled: From protein structures to clinical applications.
Affiliation(s)
- Cuihong Wan
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, Ontario, Canada M5S 3E1
|
43
|
Yang C, He Z, Yang C, Yu W. Peptide reranking with protein-peptide correspondence and precursor peak intensity information. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1212-1219. [PMID: 22350209 DOI: 10.1109/tcbb.2012.29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Searching tandem mass spectra against a protein database has been a mainstream method for peptide identification. Improving peptide identification results by ranking true Peptide-Spectrum Matches (PSMs) over their false counterparts has led to the development of various reranking algorithms. In peptide reranking, discriminative information is essential to distinguish true PSMs from false PSMs. Generally, most peptide reranking methods obtain discriminative information directly from database search scores or by training machine learning models. Information in the protein database and MS1 spectra (i.e., single stage MS spectra) is ignored. In this paper, we propose to use information in the protein database and MS1 spectra to rerank peptide identification results. To quantitatively analyze their effects on peptide reranking results, three peptide reranking methods are proposed: PPMRanker, PPIRanker, and MIRanker. PPMRanker only uses Protein-Peptide Map (PPM) information from the protein database, PPIRanker only uses Precursor Peak Intensity (PPI) information, and MIRanker employs both PPM information and PPI information. According to our experiments on a standard protein mixture data set, a human data set and a mouse data set, PPMRanker and MIRanker achieve better peptide reranking results than PeptideProphet, PeptideProphet+NSP (number of sibling peptides) and a score regularization method SRPI. The source code of PPMRanker, PPIRanker, and MIRanker, and all supplementary documents are available at our website: http://bioinformatics.ust.hk/pepreranking/. Alternatively, these documents can also be downloaded from: http://sourceforge.net/projects/pepreranking/.
Affiliation(s)
- Chao Yang
- The Hong Kong University of Science and Technology, RM B007D, University Apartment Tower B, Clear Water Bay, Kowloon, Hong Kong.
|
44
|
Krokhin O. Peptide retention prediction in reversed-phase chromatography: proteomic applications. Expert Rev Proteomics 2012; 9:1-4. [PMID: 22292816 DOI: 10.1586/epr.11.79] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
|
45
|
Moruz L, Staes A, Foster JM, Hatzou M, Timmerman E, Martens L, Käll L. Chromatographic retention time prediction for posttranslationally modified peptides. Proteomics 2012; 12:1151-9. [DOI: 10.1002/pmic.201100386] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 11/11/2022]
Affiliation(s)
- Luminita Moruz
- Science for Life Laboratory, Department of Biochemistry and Biophysics; Stockholm University; Solna Sweden
- Stockholm Bioinformatics Center; Stockholm University; Solna Sweden
- An Staes
- Department of Medical Protein Research; VIB; Ghent Belgium
- Department of Biochemistry; Ghent University; Ghent Belgium
- Joseph M. Foster
- EMBL Outstation, European Bioinformatics Institute; Wellcome Trust Genome Campus; Hinxton Cambridge UK
- Maria Hatzou
- Science for Life Laboratory, Department of Biochemistry and Biophysics; Stockholm University; Solna Sweden
- Evy Timmerman
- Department of Medical Protein Research; VIB; Ghent Belgium
- Department of Biochemistry; Ghent University; Ghent Belgium
- Lennart Martens
- Department of Medical Protein Research; VIB; Ghent Belgium
- Department of Biochemistry; Ghent University; Ghent Belgium
- Lukas Käll
- Stockholm Bioinformatics Center; Stockholm University; Solna Sweden
- Science for Life Laboratory, School of Biotechnology; Royal Institute of Technology (KTH); Solna Sweden
|
46
|
Gallien S, Peterman S, Kiyonami R, Souady J, Duriez E, Schoen A, Domon B. Highly multiplexed targeted proteomics using precise control of peptide retention time. Proteomics 2012; 12:1122-33. [DOI: 10.1002/pmic.201100533] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Indexed: 01/01/2023]
Affiliation(s)
- Sebastien Gallien
- Luxembourg Clinical Proteomics center (LCP); Centre de Recherche Public de la Santé; Strassen Luxembourg
- Jamal Souady
- Luxembourg Clinical Proteomics center (LCP); Centre de Recherche Public de la Santé; Strassen Luxembourg
- Elodie Duriez
- Luxembourg Clinical Proteomics center (LCP); Centre de Recherche Public de la Santé; Strassen Luxembourg
- Bruno Domon
- Luxembourg Clinical Proteomics center (LCP); Centre de Recherche Public de la Santé; Strassen Luxembourg
|
47
|
On the utility of predictive chromatography to complement mass spectrometry based intact protein identification. Anal Bioanal Chem 2011; 402:2521-9. [PMID: 21901462 DOI: 10.1007/s00216-011-5350-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/26/2011] [Revised: 07/22/2011] [Accepted: 08/19/2011] [Indexed: 10/17/2022]
Abstract
The amino acid sequence determines the individual protein three-dimensional structure and its functioning in an organism. Therefore, "reading" a protein sequence and determining its changes due to mutations or post-translational modifications is one of the objectives of proteomic experiments. The commonly utilized approach is gradient high-performance liquid chromatography (HPLC) in combination with tandem mass spectrometry. While serving as a way to simplify the protein mixture, the liquid chromatography may be an additional analytical tool providing complementary information about the protein structure. Previous attempts to develop "predictive" HPLC for large biomacromolecules were limited by empirically derived equations based purely on the adsorption mechanisms of the retention and applicable to relatively small polypeptide molecules. A mechanism of the large biomacromolecule retention in reversed-phase gradient HPLC was described recently in thermodynamics terms by the analytical model of liquid chromatography at critical conditions (BioLCCC). In this work, we applied the BioLCCC model to predict retention of the intact proteins as well as their large proteolytic peptides separated under different HPLC conditions. The specific aim of these proof-of-principle studies was to demonstrate the feasibility of using "predictive" HPLC as a complementary tool to support the analysis of identified intact proteins in top-down, middle-down, and/or targeted selected reaction monitoring (SRM)-based proteomic experiments.
|
48
|
Shamshurin D, Spicer V, Krokhin OV. Defining intrinsic hydrophobicity of amino acids’ side chains in random coil conformation. Reversed-phase liquid chromatography of designed synthetic peptides vs. random peptide data sets. J Chromatogr A 2011; 1218:6348-55. [DOI: 10.1016/j.chroma.2011.06.092] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 03/30/2011] [Revised: 06/21/2011] [Accepted: 06/27/2011] [Indexed: 11/25/2022]
|
49
|
Cao W, Ma D, Kapur A, Patankar MS, Ma Y, Li L. RT-SVR+q: a strategy for post-Mascot analysis using retention time and q value metric to improve peptide and protein identifications. J Proteomics 2011; 75:480-90. [PMID: 21888997 DOI: 10.1016/j.jprot.2011.08.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/11/2011] [Revised: 07/31/2011] [Accepted: 08/17/2011] [Indexed: 10/17/2022]
Abstract
Shotgun proteomics commonly utilizes database search like Mascot to identify proteins from tandem MS/MS spectra. False discovery rate (FDR) is often used to assess the confidence of peptide identifications. However, a widely accepted FDR of 1% sacrifices the sensitivity of peptide identification while improving the accuracy. This article details a machine learning approach combining retention time based support vector regressor (RT-SVR) with q value based statistical analysis to improve peptide and protein identifications with high sensitivity and accuracy. The use of confident peptide identifications as training examples and careful feature selection ensures high R values (>0.900) for all models. The application of RT-SVR model on Mascot results (p=0.10) increases the sensitivity of peptide identifications. q Value, as a function of deviation between predicted and experimental RTs (ΔRT), is used to assess the significance of peptide identifications. We demonstrate that the peptide and protein identifications increase by up to 89.4% and 83.5%, respectively, for a specified q value of 0.01 when applying the method to proteomic analysis of the natural killer leukemia cell line (NKL). This study establishes an effective methodology and provides a platform for profiling confident proteomes in more relevant species as well as a future investigation of accurate protein quantification.
Affiliation(s)
- Weifeng Cao
- Department of Chemistry, University of Wisconsin-Madison, 777 Highland Ave., Madison, WI 53705, USA.
|
50
|
Perez-Riverol Y, Sánchez A, Ramos Y, Schmidt A, Müller M, Betancourt L, González LJ, Vera R, Padron G, Besada V. In silico analysis of accurate proteomics, complemented by selective isolation of peptides. J Proteomics 2011; 74:2071-82. [PMID: 21658481 DOI: 10.1016/j.jprot.2011.05.034] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 03/10/2011] [Revised: 05/06/2011] [Accepted: 05/22/2011] [Indexed: 01/28/2023]
Abstract
Protein identification by mass spectrometry is mainly based on MS/MS spectra and the accuracy of molecular mass determination. However, the high complexity and dynamic range of proteomic samples from any species surpass the separation capacity and detection power of the most advanced multidimensional liquid chromatographs and mass spectrometers. Only a tiny portion of signals is selected for MS/MS experiments, and a still considerable number of them do not provide reliable peptide identification. In this article, an in silico analysis for a novel methodology of peptide and protein identification is described. The approach is based on mass accuracy, isoelectric point (pI), retention time (t(R)) and N-terminal amino acid determination as protein identification criteria regardless of high quality MS/MS spectra. When the methodology was combined with the selective isolation methods, the number of unique peptides and identified proteins increased. Finally, to demonstrate the feasibility of the methodology, an OFFGEL-LC-MS/MS experiment was also implemented. We compared the more reliable peptides identified with MS/MS information with peptides identified using three experimental features (pI, t(R), molecular mass). Also, two theoretical assumptions from MS/MS identification (selective isolation of peptides and N-terminal amino acid) were analyzed. Our results show that, using the information provided by these features and selective isolation methods, we could find 93% of the high-confidence proteins identified by MS/MS with a false-positive rate lower than 5%.
Affiliation(s)
- Yasset Perez-Riverol
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, Cubanacán, Playa, Ciudad de la Habana, Cuba.
|