1
|
Ivanov MV, Kopeykina AS, Gorshkov MV. Reanalysis of DIA Data Demonstrates the Capabilities of MS/MS-Free Proteomics to Reveal New Biological Insights in Disease-Related Samples. Journal of the American Society for Mass Spectrometry 2024. [PMID: 38938158 DOI: 10.1021/jasms.4c00134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
Data-independent acquisition (DIA) at the shortened data acquisition time is becoming a method of choice for quantitative proteomic applications requiring high throughput analysis of large cohorts of samples. With the advent of the combination of high resolution mass spectrometry with an asymmetric track lossless analyzer, these DIA capabilities were further extended with the recent demonstration of quantitative analyses at the speed of up to hundreds of samples per day. In particular, the proteomic data for the brain samples related to multiple system atrophy disease were acquired using 7 and 28 min chromatography gradients (Guzman et al., Nat. Biotech. 2024). In this work, we applied the recently introduced DirectMS1 method to reanalysis of these data using only MS1 spectra. Both DirectMS1 and DIA results were matched against long gradient DDA analysis from the earlier study of the same sample cohort. While the quantitation efficiency of DirectMS1 was comparable with DIA on the same data sets, we found an additional five proteins of biological significance relevant to the analyzed tissue samples. Among the findings, DirectMS1 was able to detect decreased caspase activity for Vimentin protein in the multiple system atrophy samples missed by the MS/MS-based quantitation methods. Our study suggests that DirectMS1 can be an efficient MS1-only addition to the analysis of DIA data in high-throughput quantitative proteomic studies.
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
| | - Anna S Kopeykina
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
| | - Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
2
|
Kuhnen G, Class LC, Badekow S, Hanisch KL, Rohn S, Kuballa J. Python workflow for the selection and identification of marker peptides-proof-of-principle study with heated milk. Anal Bioanal Chem 2024; 416:3349-3360. [PMID: 38607384 PMCID: PMC11106092 DOI: 10.1007/s00216-024-05286-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 03/26/2024] [Accepted: 04/02/2024] [Indexed: 04/13/2024]
Abstract
The analysis of almost holistic food profiles has developed considerably over the last years. This has also led to larger amounts of data and the ability to obtain more information about health-beneficial and adverse constituents in food than ever before. Especially in the field of proteomics, the software packages used for evaluation do not provide specific approaches for unique monitoring questions. An additional and more comprehensive evaluation can be carried out with the programming language Python. It offers broad possibilities through a large ecosystem for mass spectrometric data analysis, but needs to be tailored to specific sets of features and the research questions behind them. It also allows various machine-learning approaches to be applied. The aim of the present study was to develop an algorithm for selecting and identifying potential marker peptides from mass spectrometric data. The workflow is divided into three steps: (I) feature engineering, (II) chemometric data analysis, and (III) feature identification. The first step is the transformation of the mass spectrometric data into a structure that enables the application of existing data analysis packages in Python. The second step is the data analysis for selecting single features. These features are further processed in the third step, which is the feature identification. The data used as an example in this proof-of-principle approach came from a study on the influence of heat treatment on the milk proteome/peptidome.
Affiliation(s)
- Gesine Kuhnen
- GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany
- Department of Food Chemistry and Analysis, Institute of Food Technology and Food Chemistry, Technical University Berlin, Gustav Meyer Allee 25, 13355, Berlin, Germany
| | - Lisa-Carina Class
- GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146, Hamburg, Germany
| | - Svenja Badekow
- GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany
| | - Kim Lara Hanisch
- GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany
| | - Sascha Rohn
- Department of Food Chemistry and Analysis, Institute of Food Technology and Food Chemistry, Technical University Berlin, Gustav Meyer Allee 25, 13355, Berlin, Germany
| | - Jürgen Kuballa
- GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany.
3
|
Strauss MT, Bludau I, Zeng WF, Voytik E, Ammar C, Schessner JP, Ilango R, Gill M, Meier F, Willems S, Mann M. AlphaPept: a modern and open framework for MS-based proteomics. Nat Commun 2024; 15:2168. [PMID: 38461149 PMCID: PMC10924963 DOI: 10.1038/s41467-024-46485-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 02/20/2024] [Indexed: 03/11/2024] Open
Abstract
In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. A plethora of different computational tools can process the MS data to derive peptide and protein identification and quantification. However, during the last years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, we develop AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Numba for just-in-time compilation on CPU and GPU achieves hundred-fold speed improvements. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances. We provide an easy on-ramp for community contributions through the concept of literate programming, implemented in Jupyter Notebooks. Large datasets can rapidly be processed as shown by the analysis of hundreds of proteomes in minutes per file, many-fold faster than acquisition. AlphaPept can be used to build automated processing pipelines with web-serving functionality and compatibility with downstream analysis tools. It provides easy access via one-click installation, a modular Python library for advanced users, and via an open GitHub repository for developers.
Affiliation(s)
- Maximilian T Strauss
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
- NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark.
| | - Isabell Bludau
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Wen-Feng Zeng
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Eugenia Voytik
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Constantin Ammar
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Julia P Schessner
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | | | | | - Florian Meier
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Functional Proteomics, Jena University Hospital, Jena, Germany
| | - Sander Willems
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Matthias Mann
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
- NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark.
4
|
Gorshkov V, Kjeldsen F. Exploiting Charge State Distribution To Probe Intramolecular Interactions in Gas-Phase Phosphopeptides and Enhance Proteomics Analyses. Anal Chem 2024; 96:1167-1177. [PMID: 38183295 DOI: 10.1021/acs.analchem.3c04270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/08/2024]
Abstract
Charging of analytes is a prerequisite for performing mass spectrometry analysis. In proteomics, electrospray ionization is the dominant technique for this process. Although the observation of differences in the peptide charge state distribution (CSD) is well-known among experimentalists, its analytical value remains underexplored. To investigate the utility of this dimension, we analyzed several public data sets, comprising over 250,000 peptide CSD profiles from the human proteome. We found that the dimensions of the CSD demonstrate high reproducibility across multiple laboratories, mass analyzers, and extensive time intervals. The general observation was that the CSD enabled effective partitioning of the peptide property space, resulting in enhanced discrimination between sequence and constitutional peptide isomers. Next, by evaluating the CSD values of phosphorylated peptides, we were able to differentiate between phosphopeptides that indicate the formation of intramolecular structures in the gas phase and those that do not. The reproducibility of the CSD values (mean cosine similarity above 0.97 for most of the experiments) qualified the CSD data as suitable for training a deep-learning model capable of accurately predicting CSD values (mean cosine similarity of 0.98). When we applied the CSD dimension to MS1- and MS2-based proteomics experiments, we consistently observed around a 5% increase in protein and peptide identification rate. Even though the CSD dimension is not as effective a discriminator as the widely used retention time dimension, it still holds the potential for application in direct infusion proteomics.
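The abstract above reports reproducibility as a cosine similarity between CSD profiles. As a minimal sketch of that metric, assuming a CSD profile is represented as a vector of relative signal fractions per charge state (the profiles below are hypothetical illustration values, not data from the study):

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length profile vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm

# Hypothetical CSD profiles: fraction of signal observed at 1+, 2+, 3+, 4+.
lab_a = [0.05, 0.70, 0.20, 0.05]   # peptide X measured in one laboratory
lab_b = [0.04, 0.72, 0.19, 0.05]   # same peptide, another laboratory
other = [0.00, 0.20, 0.60, 0.20]   # a different peptide favoring higher charge

sim_same = cosine_similarity(lab_a, lab_b)
sim_diff = cosine_similarity(lab_a, other)
print(f"same peptide: {sim_same:.3f}, different peptide: {sim_diff:.3f}")
```

A value close to 1 means the two profiles have nearly the same shape regardless of absolute intensity, which is why the metric suits comparisons across instruments.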
Affiliation(s)
- Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
5
|
Ivanov MV, Garibova LA, Postoenko VI, Levitsky LI, Gorshkov MV. On the excessive use of coefficient of variation as a metric of quantitation quality in proteomics. Proteomics 2024; 24:e2300090. [PMID: 37496303 DOI: 10.1002/pmic.202300090] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Revised: 05/05/2023] [Accepted: 07/18/2023] [Indexed: 07/28/2023]
Abstract
The coefficient of variation (CV) is often used in proteomics as a proxy to characterize the performance of a quantitation method and/or the related software. In this note, we question the excessive reliance on this metric in quantitative proteomics that may result in erroneous conclusions. We support this note using a ground-truth Human-Yeast-E. coli dataset demonstrating in a number of cases that erroneous data processing methods may lead to a low CV which has nothing to do with these methods' performances in quantitation.
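The point that a low CV need not reflect good quantitation can be illustrated with a small sketch. The intensities and the constant-offset "processing error" below are hypothetical and chosen only for illustration; they are not taken from the Human-Yeast-E. coli dataset used in the note.

```python
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical replicate intensities for one protein with a true 2:1 ratio.
cond_a = [200.0, 230.0, 180.0]
cond_b = [100.0, 115.0, 90.0]

# A deliberately erroneous processing step (assumption for this sketch):
# a large constant background is added to every measurement.
offset = 10_000.0
bad_a = [x + offset for x in cond_a]
bad_b = [x + offset for x in cond_b]

cv_good, cv_bad = cv(cond_a), cv(bad_a)
ratio_good = mean(cond_a) / mean(cond_b)
ratio_bad = mean(bad_a) / mean(bad_b)

print(f"correct processing:   CV = {cv_good:.4f}, ratio = {ratio_good:.3f}")
print(f"erroneous processing: CV = {cv_bad:.4f}, ratio = {ratio_bad:.3f}")
```

The erroneous variant shows a much smaller CV while its estimated ratio collapses toward 1, so ranking methods by CV alone would favor the wrong one.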
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
| | - Leyla A Garibova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
| | - Valeriy I Postoenko
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
| | - Lev I Levitsky
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
| | - Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
6
|
Postoenko VI, Garibova LA, Levitsky LI, Bubis JA, Gorshkov MV, Ivanov MV. IQMMA: Efficient MS1 Intensity Extraction Pipeline Using Multiple Feature Detection Algorithms for DDA Proteomics. J Proteome Res 2023; 22:2827-2835. [PMID: 37579078 DOI: 10.1021/acs.jproteome.3c00075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/16/2023]
Abstract
One of the key steps in data dependent acquisition (DDA) proteomics is detection of peptide isotopic clusters, also called "features", in MS1 spectra and matching them to MS/MS-based peptide identifications. A number of peptide feature detection tools became available in recent years, each relying on its own matching algorithm. Here, we provide an integrated solution, the intensity-based Quantitative Mix and Match Approach (IQMMA), which integrates a number of untargeted peptide feature detection algorithms and returns the most probable intensity values for the MS/MS-based identifications. IQMMA was tested using available proteomic data acquired for both well-characterized (ground truth) and real-world biological samples, including a mix of Yeast and E. coli digests spiked at different concentrations into the Human K562 digest used as a background, and a set of glioblastoma cell lines. Three open-source feature detection algorithms were integrated: Dinosaur, biosaur2, and OpenMS FeatureFinder. None of them was found optimal when applied individually to all the data sets employed in this work; however, their combined use in IQMMA improved efficiency of subsequent protein quantitation. The software implementing IQMMA is freely available at https://github.com/PostoenkoVI/IQMMA under Apache 2.0 license.
Affiliation(s)
- Valeriy I Postoenko
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Moscow Institute of Physics and Technology, National Research University, G. Dolgoprudny, Institutsky Lane 9, Dolgoprudny 141701, Russia
| | - Leyla A Garibova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Moscow Institute of Physics and Technology, National Research University, G. Dolgoprudny, Institutsky Lane 9, Dolgoprudny 141701, Russia
| | - Lev I Levitsky
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
| | - Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
| | - Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
| | - Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
7
|
Penanes P, Gorshkov V, Ivanov MV, Gorshkov MV, Kjeldsen F. Potential of Negative-Ion-Mode Proteomics: An MS1-Only Approach. J Proteome Res 2023; 22:2734-2742. [PMID: 37395192 PMCID: PMC10407931 DOI: 10.1021/acs.jproteome.3c00307] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/22/2023] [Indexed: 07/04/2023]
Abstract
Current proteomics approaches rely almost exclusively on using the positive ionization mode, resulting in inefficient ionization of many acidic peptides. This study investigates protein identification efficiency in the negative ionization mode using the DirectMS1 method. DirectMS1 is an ultrafast data acquisition method based on accurate peptide mass measurements and predicted retention times. Our method achieves the highest rate of protein identification in the negative ion mode to date, identifying over 1000 proteins in a human cell line at a 1% false discovery rate. This is accomplished using a single-shot 10 min separation gradient, comparable to lengthy MS/MS-based analyses. Optimizing separation and experimental conditions was achieved by utilizing mobile buffers containing 2.5 mM imidazole and 3% isopropanol. The study emphasized the complementary nature of data obtained in positive and negative ion modes. Combining the results from all replicates in both polarities increased the number of identified proteins to 1774. Additionally, we analyzed the method's efficiency using different proteases for protein digestion. Among the four studied proteases (LysC, GluC, AspN, and trypsin), trypsin and LysC demonstrated the highest protein identification yield. This suggests that digestion procedures utilized in positive-mode proteomics can be effectively applied in the negative ion mode. Data are deposited to ProteomeXchange: PXD040583.
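The abstract describes DirectMS1 as relying on accurate peptide mass measurements and predicted retention times. The sketch below shows only the general idea of matching an MS1 feature to candidate peptides within a mass tolerance (in ppm) and a retention-time window. It is not the DirectMS1 implementation; the function names, tolerances, peptide labels, and m/z values are all hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def match_feature(feature, candidates, ppm_tol=5.0, rt_tol=0.5):
    """Return candidates whose m/z and predicted RT both agree with the feature.

    ppm_tol and rt_tol (minutes) are illustrative defaults, not published settings.
    """
    return [
        c["peptide"]
        for c in candidates
        if abs(ppm_error(feature["mz"], c["mz"])) <= ppm_tol
        and abs(feature["rt"] - c["predicted_rt"]) <= rt_tol
    ]

# Hypothetical MS1 feature and candidate list (placeholder labels and values).
feature = {"mz": 500.2510, "rt": 3.2}
candidates = [
    {"peptide": "PEPTIDEA", "mz": 500.2500, "predicted_rt": 3.4},  # mass and RT agree
    {"peptide": "PEPTIDEB", "mz": 500.2505, "predicted_rt": 7.9},  # mass agrees, RT does not
    {"peptide": "PEPTIDEC", "mz": 500.2700, "predicted_rt": 3.1},  # RT agrees, mass does not
]

matches = match_feature(feature, candidates)
print(matches)
```

Requiring both conditions is what lets retention-time prediction prune candidates that accurate mass alone cannot separate.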
Affiliation(s)
- Pelayo A. Penanes
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Mark V. Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
| | - Mikhail V. Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
| | - Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
8
|
Solovyeva EM, Bubis JA, Tarasova IA, Lobas AA, Ivanov MV, Nazarov AA, Shutkov IA, Gorshkov MV. On the Feasibility of Using an Ultra-Fast DirectMS1 Method of Proteome-Wide Analysis for Searching Drug Targets in Chemical Proteomics. Biochemistry. Biokhimiia 2022; 87:1342-1353. [PMID: 36509723 DOI: 10.1134/s000629792211013x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Protein quantitation in tissue cells or physiological fluids based on liquid chromatography/mass spectrometry is one of the key sources of information on the mechanisms of cell functioning during chemotherapeutic treatment. Information on significant changes in protein expression upon treatment can be obtained by chemical proteomics and requires analysis of the cellular proteomes, as well as development of experimental and bioinformatic methods for identification of the drug targets. Low throughput of whole proteome analysis based on liquid chromatography and tandem mass spectrometry is one of the main factors limiting the scale of these studies. The method of direct mass spectrometric identification of proteins, DirectMS1, is one of the approaches developed in recent years allowing ultrafast proteome-wide analyses employing minute-scale gradients for separation of proteolytic mixtures. The aim of this work was to evaluate both the possibilities and limitations of the method for identification of drug targets at the level of the whole proteome and for revealing cellular processes activated by the treatment. In particular, the available literature data on chemical proteomics obtained earlier for a large set of onco-pharmaceuticals using multiplex quantitative proteome profiling were analyzed. The results obtained were further compared with the proteome-wide data acquired by the DirectMS1 method using ultrashort separation gradients to evaluate efficiency of the method in identifying known drug targets. Using ovarian cancer cell line A2780 as an example, a whole-proteome comparison of two cell lysis techniques was performed, including the freeze-thaw lysis commonly employed in chemical proteomics and the one based on ultrasonication for cell disruption, which is widely accepted as a standard in proteomic studies. Also, the proteome-wide profiling was performed using the ultrafast DirectMS1 method for the A2780 cell line treated with lonidamine, followed by gene ontology analyses to evaluate capabilities of the method in revealing regulation of proteins in the cellular processes associated with drug treatment.
Affiliation(s)
- Elizaveta M Solovyeva
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
| | - Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
| | - Irina A Tarasova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
| | - Anna A Lobas
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
| | - Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
| | - Alexey A Nazarov
- Faculty of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Ilya A Shutkov
- Faculty of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia.
9
|
Ivanov MV, Bubis JA, Gorshkov V, Tarasova IA, Levitsky LI, Solovyeva EM, Lipatova AV, Kjeldsen F, Gorshkov MV. DirectMS1Quant: Ultrafast Quantitative Proteomics with MS/MS-Free Mass Spectrometry. Anal Chem 2022; 94:13068-13075. [PMID: 36094425 DOI: 10.1021/acs.analchem.2c02255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Recently, we presented the DirectMS1 method of ultrafast proteome-wide analysis based on minute-long LC gradients and MS1-only mass spectra acquisition. Currently, the method provides the depth of human cell proteome coverage of 2500 proteins at a 1% false discovery rate (FDR) when using 5 min LC gradients and 7.3 min runtime in total. While the standard MS/MS approaches provide 4000-5000 protein identifications within a couple of hours of instrumentation time, we advocate here that the higher number of identified proteins does not always translate into better quantitation quality of the proteome analysis. To further elaborate on this issue, we performed a one-on-one comparison of quantitation results obtained using DirectMS1 with three popular MS/MS-based quantitation methods: label-free quantitation (LFQ) and tandem mass tag (TMT) quantitation, both based on data-dependent acquisition (DDA), and data-independent acquisition (DIA). For comparison, we performed a series of proteome-wide analyses of well-characterized (ground truth) and biologically relevant samples, including a mix of UPS1 proteins spiked at different concentrations into an Escherichia coli digest used as a background and a set of glioblastoma cell lines. MS1-only data was analyzed using a novel quantitation workflow called DirectMS1Quant developed in this work. The results obtained in this study demonstrated comparable quantitation efficiency of 5 min DirectMS1 with both TMT and DIA methods, yet the latter two utilized a 10-20-fold longer instrumentation time.
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Irina A Tarasova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Lev I Levitsky
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Elizaveta M Solovyeva
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Anastasiya V Lipatova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
10
|
Samukhina YV, Matyushin DD, Grinevich OI, Buryak AK. A Deep Convolutional Neural Network for Prediction of Peptide Collision Cross Sections in Ion Mobility Spectrometry. Biomolecules 2021; 11:1904. [PMID: 34944547 PMCID: PMC8699202 DOI: 10.3390/biom11121904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 12/13/2021] [Accepted: 12/17/2021] [Indexed: 11/26/2022] Open
Abstract
Most frequently, the identification of peptides in mass spectrometry-based proteomics is carried out using high-resolution tandem mass spectrometry. In order to increase the accuracy of analysis, additional information on the peptides such as chromatographic retention time and collision cross section in ion mobility spectrometry can be used. An accurate prediction of the collision cross section values allows erroneous candidates to be rejected using a comparison of the observed values and the predictions based on the amino acids sequence. Recently, a massive high-quality data set of peptide collision cross sections was released. This opens up an opportunity to apply the most sophisticated deep learning techniques for this task. Previously, it was shown that a recurrent neural network allows for predicting these values accurately. In this work, we present a deep convolutional neural network that enables us to predict these values more accurately compared with previous studies. We use a neural network with complex architecture that contains both convolutional and fully connected layers and comprehensive methods of converting a peptide to multi-channel 1D spatial data and vector. The source code and pre-trained model are available online.
Affiliation(s)
| | - Dmitriy D. Matyushin
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia; (Y.V.S.); (O.I.G.); (A.K.B.)
11
|
Ivanov MV, Solovyeva EM, Bubis JA, Gorshkov MV. Improving the Protein Inference from Bottom-Up Proteomic Data Using Identifications from MS1 Spectra. Journal of the American Society for Mass Spectrometry 2021; 32:1258-1262. [PMID: 33900766 DOI: 10.1021/jasms.1c00061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Protein inference is one of the crucial steps in proteome characterization using a bottom-up approach. Multiple algorithms to solve the problem are focused on extensive analysis of shared peptides identified from fragmentation mass spectra (MS/MS). However, many protein homologues with a similar amino acid sequence typically have identical lists of identified peptides due to the problem of proteome undersampling in a bottom-up approach and, thus, cannot be distinguished by existing protein inference methods. Here, we propose the use of peptide feature information extracted from precursor mass spectra to assist in identification of proteins otherwise indistinguishable from MS/MS. The proposed method was integrated with a protein inference algorithm based on the parsimony principle and built into the postsearch utility Scavager. The results demonstrate increased accuracy and efficiency of homologous protein identifications for well-characterized data sets, including the one with known protein sequences from the iPRG-2016 study.
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Building 2, Moscow 119334, Russia
| | - Elizaveta M Solovyeva
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Building 2, Moscow 119334, Russia
| | - Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Building 2, Moscow 119334, Russia
| | - Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Building 2, Moscow 119334, Russia
12
|
Ivanov MV, Bubis JA, Gorshkov V, Abdrakhimov DA, Kjeldsen F, Gorshkov MV. Boosting MS1-only Proteomics with Machine Learning Allows 2000 Protein Identifications in Single-Shot Human Proteome Analysis Using 5 min HPLC Gradient. J Proteome Res 2021; 20:1864-1873. [PMID: 33720732 DOI: 10.1021/acs.jproteome.0c00863] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 01/04/2023]
Abstract
Proteome-wide analyses rely on tandem mass spectrometry and the extensive separation of proteolytic mixtures. This imposes considerable instrumental time consumption, which is one of the main obstacles in the broader acceptance of proteomics in biomedical and clinical research. Recently, we presented a fast proteomic method termed DirectMS1 based on ultrashort LC gradients as well as MS1-only mass spectra acquisition and data processing. The method allows significant reduction of the proteome-wide analysis time to a few minutes at the depth of quantitative proteome coverage of 1000 proteins at 1% false discovery rate (FDR). In this work, to further increase the capabilities of the DirectMS1 method, we explored the opportunities presented by the recent progress in the machine-learning area and applied the LightGBM decision tree boosting algorithm to the scoring of peptide feature matches when processing MS1 spectra. Furthermore, we integrated the peptide feature identification algorithm of DirectMS1 with the recently introduced peptide retention time prediction utility, DeepLC. Additional approaches to improve the performance of the DirectMS1 method are discussed and demonstrated, such as using FAIMS for gas-phase ion separation. As a result of all improvements to DirectMS1, we succeeded in identifying more than 2000 proteins at 1% FDR from the HeLa cell line in a 5 min gradient LC-FAIMS/MS1 analysis. The data sets generated and analyzed during the current study have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD023977.
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
| | - Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
| | - Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Daniil A Abdrakhimov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Moscow Institute of Physics and Technology, Institutsky lane 9, Dolgoprudny, Moscow Region 141700, Russia
| | - Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
| | - Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia