1
Ivanov MV, Kopeykina AS, Kazakova EM, Tarasova IA, Sun Z, Postoenko VI, Yang J, Gorshkov MV. Modified Decision Tree with Custom Splitting Logic Improves Generalization across Multiple Brains' Proteomic Data Sets of Alzheimer's Disease. J Proteome Res 2025. [PMID: 39984290 DOI: 10.1021/acs.jproteome.4c00677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2025]
Abstract
Many factors negatively affect the generalization of findings in discovery proteomics, including differences between patient cohorts and the variety of experimental conditions. We present a machine-learning-based workflow for proteomics data analysis aimed at improving generalizability across multiple data sets. In particular, we customized the decision tree model by introducing a new parameter, min_groups_leaf, which regulates the presence of samples from each data set inside the model's leaves. Further, we analyzed the trend of the feature importance curve as a function of the new parameter to select a list of proteins with significantly improved generalization. The developed workflow was tested using five proteomic data sets obtained for post-mortem human brain samples of Alzheimer's disease. The data sets consisted of 535 LC-MS/MS acquisition files. The results were obtained for two different data processing pipelines: (1) MS1-only processing based on the DirectMS1 search engine and (2) a standard MS/MS-based one. Using the developed workflow, we found seven proteins with expression patterns that were unique to asymptomatic Alzheimer's patients. Two of them, serotransferrin (TRFE) and DNA repair nuclease APEX1, may be important for explaining the lack of dementia in patients with neuritic plaques and neurofibrillary tangles.
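The abstract does not spell out how min_groups_leaf enters the splitting logic, so the following is only a minimal sketch under one assumed reading: a candidate split is admissible only if each resulting child contains samples from at least min_groups_leaf distinct data sets. The function names `gini` and `best_split`, the use of Gini impurity, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels, groups, min_groups_leaf):
    """Best threshold on one feature, or None if no split is admissible.

    ASSUMPTION: a split is admissible only when both children contain
    samples from at least `min_groups_leaf` distinct data sets (groups).
    Returns (threshold, weighted Gini impurity of the two children).
    """
    n = len(values)
    best = None
    for t in sorted(set(values))[:-1]:
        left = [i for i in range(n) if values[i] <= t]
        right = [i for i in range(n) if values[i] > t]
        groups_left = {groups[i] for i in left}
        groups_right = {groups[i] for i in right}
        if min(len(groups_left), len(groups_right)) < min_groups_leaf:
            continue  # a child would be dominated by too few data sets
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Toy feature whose value tracks the data set rather than the biology:
values = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]
confounded = ["A", "A", "A", "B", "B", "B"]  # label coincides with data set
mixed = ["A", "B", "A", "B", "A", "B"]       # both data sets on each side
```

With the confounded grouping, `best_split(values, labels, confounded, 2)` rejects every threshold, whereas the mixed grouping still admits the split at 3. Under this reading, the parameter discourages splits that merely separate one data set from another.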
Affiliation(s)
- Mark V Ivanov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Anna S Kopeykina
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Elizaveta M Kazakova
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Irina A Tarasova
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Zhao Sun
  - Clinical Systems Biology Key Laboratory, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
  - Department of Gastrointestinal Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
  - Institute of Infection and Immunity, Henan Academy of Innovations in Medical Science, Zhengzhou 450052, China
- Valeriy I Postoenko
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Jinghua Yang
  - Clinical Systems Biology Key Laboratory, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
  - Institute of Infection and Immunity, Henan Academy of Innovations in Medical Science, Zhengzhou 450052, China
- Mikhail V Gorshkov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
2
Nawaz MA, Pamirsky IE, Golokhvast KS. Bioinformatics in Russia: history and present-day landscape. Brief Bioinform 2024; 25:bbae513. [PMID: 39402695 PMCID: PMC11473191 DOI: 10.1093/bib/bbae513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Revised: 08/12/2024] [Accepted: 10/01/2024] [Indexed: 10/19/2024] Open
Abstract
Bioinformatics has become an interdisciplinary subject due to its universal role in molecular biology research. The current status of bioinformatics research in Russia is not known. Here, we review the history of bioinformatics in Russia, present the current landscape, and highlight future directions and challenges. Bioinformatics research in Russia is driven by four major industries: information technology, pharmaceuticals, biotechnology, and agriculture. Over the past three decades, despite a delayed start, the field has gained momentum, especially in protein and nucleic acid research. Dedicated and shared centers for genomics, proteomics, and bioinformatics are active in different regions of Russia. Present-day bioinformatics in Russia is characterized by research issues related to genetics, metagenomics, OMICs, medical informatics, computational biology, environmental informatics, and structural bioinformatics. Notable developments are in the fields of software (tools, algorithms, and pipelines), use of high computation power (e.g. by the Siberian Supercomputer Center), and large-scale sequencing projects (the sequencing of 100 000 human genomes). Government funding is increasing, policies are being changed, and a National Genomic Information Database is being established. An increased focus on eukaryotic genome sequencing, the development of a common place for developers and researchers to share tools and data, and the use of biological modeling, machine learning, and biostatistics are key areas for future focus. Universities and research institutes have started to implement bioinformatics modules. A critical mass of bioinformaticians is essential to catch up with the global pace in the discipline.
Affiliation(s)
- Muhammad A Nawaz
  - Advanced Engineering School (Agrobiotek), National Research Tomsk State University, Lenin Ave, 36, Tomsk Oblast, Tomsk 634050, Russia
  - Centre for Research in the Field of Materials and Technologies, National Research Tomsk State University, Lenin Ave, 36, Tomsk Oblast, Tomsk 634050, Russia
- Igor E Pamirsky
  - Advanced Engineering School (Agrobiotek), National Research Tomsk State University, Lenin Ave, 36, Tomsk Oblast, Tomsk 634050, Russia
  - Siberian Federal Scientific Centre of Agrobiotechnology, Centralnaya st., 2b, Presidium, Krasnoobsk, 633501, Novosibirsk Oblast, Russia
- Kirill S Golokhvast
  - Advanced Engineering School (Agrobiotek), National Research Tomsk State University, Lenin Ave, 36, Tomsk Oblast, Tomsk 634050, Russia
  - Siberian Federal Scientific Centre of Agrobiotechnology, Centralnaya st., 2b, Presidium, Krasnoobsk, 633501, Novosibirsk Oblast, Russia
3
Ivanov MV, Kopeykina AS, Gorshkov MV. Reanalysis of DIA Data Demonstrates the Capabilities of MS/MS-Free Proteomics to Reveal New Biological Insights in Disease-Related Samples. J Am Soc Mass Spectrom 2024; 35:1775-1785. [PMID: 38938158 DOI: 10.1021/jasms.4c00134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
Data-independent acquisition (DIA) with shortened acquisition times is becoming a method of choice for quantitative proteomic applications requiring high-throughput analysis of large cohorts of samples. With the advent of the combination of high-resolution mass spectrometry with an asymmetric track lossless analyzer, these DIA capabilities were further extended with the recent demonstration of quantitative analyses at speeds of up to hundreds of samples per day. In particular, the proteomic data for the brain samples related to multiple system atrophy disease were acquired using 7 and 28 min chromatography gradients (Guzman et al., Nat. Biotech. 2024). In this work, we applied the recently introduced DirectMS1 method to reanalyze these data using only MS1 spectra. Both DirectMS1 and DIA results were matched against the long-gradient DDA analysis from the earlier study of the same sample cohort. While the quantitation efficiency of DirectMS1 was comparable with DIA on the same data sets, we found an additional five proteins of biological significance relevant to the analyzed tissue samples. Among the findings, DirectMS1 was able to detect decreased caspase activity for Vimentin protein in the multiple system atrophy samples, which was missed by the MS/MS-based quantitation methods. Our study suggests that DirectMS1 can be an efficient MS1-only addition to the analysis of DIA data in high-throughput quantitative proteomic studies.
Affiliation(s)
- Mark V Ivanov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Anna S Kopeykina
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Mikhail V Gorshkov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
4
Fedorov II, Protasov SA, Tarasova IA, Gorshkov MV. Ultrafast Proteomics. Biochemistry (Mosc) 2024; 89:1349-1361. [PMID: 39245450 DOI: 10.1134/s0006297924080017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 09/10/2024]
Abstract
The current stage of proteomic research in biology, medicine, drug development, population screening, and personalized approaches to therapy dictates the need to analyze large sets of samples within a reasonable experimental time. Until recently, mass spectrometry measurements in proteomics were characterized as unique in identifying and quantifying cellular protein composition, but low in throughput, requiring many hours to analyze a single sample. This was in conflict with the dynamics of changes in biological systems at the whole cellular proteome level under the influence of external and internal factors. Thus, the low speed of whole proteome analysis has become the main factor limiting developments in functional proteomics, where it is necessary to annotate intracellular processes not only in a wide range of conditions, but also over a long period of time. The enormous heterogeneity of tissue cells or tumors, even of the same type, dictates the need to analyze biological systems at the level of individual cells. These studies involve obtaining molecular characteristics for tens, if not hundreds of thousands of individual cells, including their whole proteome profiles. The development of mass spectrometry technologies providing high resolution and mass measurement accuracy, predictive chromatography, new methods for peptide separation by ion mobility, and processing of proteomic data based on artificial intelligence algorithms has opened a way for a significant, if not radical, increase in the throughput of whole proteome analysis and led to the implementation of the novel concept of ultrafast proteomics. Work done in just the last few years has demonstrated a proteome-wide analysis throughput of several hundred samples per day at a depth of several thousand proteins, levels unimaginable three or four years ago. The review examines the background of these developments, as well as modern methods and approaches that implement ultrafast analysis of the entire proteome.
Affiliation(s)
- Ivan I Fedorov
  - Moscow Institute of Physics and Technology (National University), Dolgoprudny, Moscow Region, 141700, Russia
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Sergey A Protasov
  - Moscow Institute of Physics and Technology (National University), Dolgoprudny, Moscow Region, 141700, Russia
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Irina A Tarasova
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Mikhail V Gorshkov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
5
Kuhnen G, Class LC, Badekow S, Hanisch KL, Rohn S, Kuballa J. Python workflow for the selection and identification of marker peptides-proof-of-principle study with heated milk. Anal Bioanal Chem 2024; 416:3349-3360. [PMID: 38607384 PMCID: PMC11106092 DOI: 10.1007/s00216-024-05286-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 03/26/2024] [Accepted: 04/02/2024] [Indexed: 04/13/2024]
Abstract
The analysis of almost holistic food profiles has developed considerably over the last years. This has also led to larger amounts of data and the ability to obtain more information about health-beneficial and adverse constituents in food than ever before. Especially in the field of proteomics, the software used for evaluation does not provide specific approaches for unique monitoring questions. An additional and more comprehensive evaluation can be done with the programming language Python. It offers broad possibilities through a large ecosystem for mass spectrometric data analysis, but needs to be tailored to specific sets of features and the research questions behind them. It also allows various machine-learning approaches to be applied. The aim of the present study was to develop an algorithm for selecting and identifying potential marker peptides from mass spectrometric data. The workflow is divided into three steps: (I) feature engineering, (II) chemometric data analysis, and (III) feature identification. The first step is the transformation of the mass spectrometric data into a structure that enables the application of existing data analysis packages in Python. The second step is the data analysis for selecting single features. These features are further processed in the third step, the feature identification. The data used as an example in this proof-of-principle approach came from a study on the influence of heat treatment on the milk proteome/peptidome.
Affiliation(s)
- Gesine Kuhnen
  - GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany
  - Department of Food Chemistry and Analysis, Institute of Food Technology and Food Chemistry, Technical University Berlin, Gustav Meyer Allee 25, 13355, Berlin, Germany
- Lisa-Carina Class
  - GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany
  - Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146, Hamburg, Germany
- Svenja Badekow
  - GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany
- Kim Lara Hanisch
  - GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany
- Sascha Rohn
  - Department of Food Chemistry and Analysis, Institute of Food Technology and Food Chemistry, Technical University Berlin, Gustav Meyer Allee 25, 13355, Berlin, Germany
- Jürgen Kuballa
  - GALAB Laboratories GmbH, Am Schleusengraben 7, 21029, Hamburg, Germany
6
Strauss MT, Bludau I, Zeng WF, Voytik E, Ammar C, Schessner JP, Ilango R, Gill M, Meier F, Willems S, Mann M. AlphaPept: a modern and open framework for MS-based proteomics. Nat Commun 2024; 15:2168. [PMID: 38461149 PMCID: PMC10924963 DOI: 10.1038/s41467-024-46485-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 02/20/2024] [Indexed: 03/11/2024] Open
Abstract
In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. A plethora of different computational tools can process the MS data to derive peptide and protein identification and quantification. However, during the last years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, we develop AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Numba for just-in-time compilation on CPU and GPU achieves hundred-fold speed improvements. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances. We provide an easy on-ramp for community contributions through the concept of literate programming, implemented in Jupyter Notebooks. Large datasets can rapidly be processed as shown by the analysis of hundreds of proteomes in minutes per file, many-fold faster than acquisition. AlphaPept can be used to build automated processing pipelines with web-serving functionality and compatibility with downstream analysis tools. It provides easy access via one-click installation, a modular Python library for advanced users, and via an open GitHub repository for developers.
Affiliation(s)
- Maximilian T Strauss
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Isabell Bludau
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Wen-Feng Zeng
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Eugenia Voytik
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Constantin Ammar
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Julia P Schessner
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Florian Meier
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - Functional Proteomics, Jena University Hospital, Jena, Germany
- Sander Willems
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Matthias Mann
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
7
Gorshkov V, Kjeldsen F. Exploiting Charge State Distribution To Probe Intramolecular Interactions in Gas-Phase Phosphopeptides and Enhance Proteomics Analyses. Anal Chem 2024; 96:1167-1177. [PMID: 38183295 DOI: 10.1021/acs.analchem.3c04270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/08/2024]
Abstract
Charging of analytes is a prerequisite for performing mass spectrometry analysis. In proteomics, electrospray ionization is the dominant technique for this process. Although the observation of differences in the peptide charge state distribution (CSD) is well-known among experimentalists, its analytical value remains underexplored. To investigate the utility of this dimension, we analyzed several public data sets, comprising over 250,000 peptide CSD profiles from the human proteome. We found that the dimensions of the CSD demonstrate high reproducibility across multiple laboratories, mass analyzers, and extensive time intervals. The general observation was that the CSD enabled effective partitioning of the peptide property space, resulting in enhanced discrimination between sequence and constitutional peptide isomers. Next, by evaluating the CSD values of phosphorylated peptides, we were able to differentiate between phosphopeptides that indicate the formation of intramolecular structures in the gas phase and those that do not. The reproducibility of the CSD values (mean cosine similarity above 0.97 for most of the experiments) made the CSD data suitable for training a deep-learning model capable of accurately predicting CSD values (mean cosine similarity of 0.98). When we applied the CSD dimension to MS1- and MS2-based proteomics experiments, we consistently observed around a 5% increase in protein and peptide identification rate. Even though the CSD dimension is not as effective a discriminator as the widely used retention time dimension, it still holds the potential for application in direct infusion proteomics.
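The cosine similarity quoted in this abstract can be made concrete with a short sketch. It assumes, purely for illustration, that a CSD profile is stored as a vector of relative intensities over consecutive charge states. The function name and the example vectors are hypothetical and do not come from the paper's data.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length intensity vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same charge states")
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0.0:
        raise ValueError("a profile with all-zero intensities has no direction")
    return dot / norm


# Hypothetical CSD profiles of one peptide measured in two runs
# (relative intensities of the 1+, 2+, 3+ and 4+ charge states).
run_a = [0.05, 0.60, 0.30, 0.05]
run_b = [0.04, 0.62, 0.29, 0.05]
similarity = cosine_similarity(run_a, run_b)
```

Because cosine similarity depends only on the direction of the vectors, it is insensitive to overall signal intensity, which makes it a natural choice for comparing distributions measured at different abundances.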
Affiliation(s)
- Vladimir Gorshkov
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
- Frank Kjeldsen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
8
Ivanov MV, Garibova LA, Postoenko VI, Levitsky LI, Gorshkov MV. On the excessive use of coefficient of variation as a metric of quantitation quality in proteomics. Proteomics 2024; 24:e2300090. [PMID: 37496303 DOI: 10.1002/pmic.202300090] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Revised: 05/05/2023] [Accepted: 07/18/2023] [Indexed: 07/28/2023]
Abstract
The coefficient of variation (CV) is often used in proteomics as a proxy to characterize the performance of a quantitation method and/or the related software. In this note, we question the excessive reliance on this metric in quantitative proteomics that may result in erroneous conclusions. We support this note using a ground-truth Human-Yeast-E. coli dataset demonstrating in a number of cases that erroneous data processing methods may lead to a low CV which has nothing to do with these methods' performances in quantitation.
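The point of this note can be illustrated with a toy calculation. The sketch below is not the authors' analysis: the intensities and the "faulty" processing step (an unsubtracted constant background added to every value) are invented for illustration. It shows how such a step lowers the CV while distorting the measured fold change.

```python
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def fold_change(a, b):
    """Ratio of mean intensities, condition b over condition a."""
    return statistics.mean(b) / statistics.mean(a)


# Hypothetical replicate intensities of one protein; the true ratio is 2.
cond_a = [100.0, 110.0, 90.0]
cond_b = [200.0, 220.0, 180.0]

# ASSUMED faulty processing: a constant background of 10000 is left in.
background = 10000.0
bad_a = [x + background for x in cond_a]
bad_b = [x + background for x in cond_b]

cv_good, cv_bad = cv(cond_a), cv(bad_a)
fc_good, fc_bad = fold_change(cond_a, cond_b), fold_change(bad_a, bad_b)
```

Here the CV of the faulty data is roughly a hundred times lower than that of the correct data, yet the twofold change has collapsed to about 1.01. A low CV alone therefore says little about quantitation accuracy.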
Affiliation(s)
- Mark V Ivanov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
- Leyla A Garibova
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
- Valeriy I Postoenko
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
- Lev I Levitsky
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
- Mikhail V Gorshkov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
9
Postoenko VI, Garibova LA, Levitsky LI, Bubis JA, Gorshkov MV, Ivanov MV. IQMMA: Efficient MS1 Intensity Extraction Pipeline Using Multiple Feature Detection Algorithms for DDA Proteomics. J Proteome Res 2023; 22:2827-2835. [PMID: 37579078 DOI: 10.1021/acs.jproteome.3c00075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/16/2023]
Abstract
One of the key steps in data dependent acquisition (DDA) proteomics is detection of peptide isotopic clusters, also called "features", in MS1 spectra and matching them to MS/MS-based peptide identifications. A number of peptide feature detection tools became available in recent years, each relying on its own matching algorithm. Here, we provide an integrated solution, the intensity-based Quantitative Mix and Match Approach (IQMMA), which integrates a number of untargeted peptide feature detection algorithms and returns the most probable intensity values for the MS/MS-based identifications. IQMMA was tested using available proteomic data acquired for both well-characterized (ground truth) and real-world biological samples, including a mix of Yeast and E. coli digests spiked at different concentrations into the Human K562 digest used as a background, and a set of glioblastoma cell lines. Three open-source feature detection algorithms were integrated: Dinosaur, biosaur2, and OpenMS FeatureFinder. None of them was found optimal when applied individually to all the data sets employed in this work; however, their combined use in IQMMA improved efficiency of subsequent protein quantitation. The software implementing IQMMA is freely available at https://github.com/PostoenkoVI/IQMMA under Apache 2.0 license.
Affiliation(s)
- Valeriy I Postoenko
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
  - Moscow Institute of Physics and Technology, National Research University, G. Dolgoprudny, Institutsky Lane 9, Dolgoprudny 141701, Russia
- Leyla A Garibova
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
  - Moscow Institute of Physics and Technology, National Research University, G. Dolgoprudny, Institutsky Lane 9, Dolgoprudny 141701, Russia
- Lev I Levitsky
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Julia A Bubis
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Mikhail V Gorshkov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Mark V Ivanov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
10
Penanes P, Gorshkov V, Ivanov MV, Gorshkov MV, Kjeldsen F. Potential of Negative-Ion-Mode Proteomics: An MS1-Only Approach. J Proteome Res 2023; 22:2734-2742. [PMID: 37395192 PMCID: PMC10407931 DOI: 10.1021/acs.jproteome.3c00307] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/22/2023] [Indexed: 07/04/2023]
Abstract
Current proteomics approaches rely almost exclusively on using the positive ionization mode, resulting in inefficient ionization of many acidic peptides. This study investigates protein identification efficiency in the negative ionization mode using the DirectMS1 method. DirectMS1 is an ultrafast data acquisition method based on accurate peptide mass measurements and predicted retention times. Our method achieves the highest rate of protein identification in the negative ion mode to date, identifying over 1000 proteins in a human cell line at a 1% false discovery rate. This is accomplished using a single-shot 10 min separation gradient, comparable to lengthy MS/MS-based analyses. Optimizing separation and experimental conditions was achieved by utilizing mobile buffers containing 2.5 mM imidazole and 3% isopropanol. The study emphasized the complementary nature of data obtained in positive and negative ion modes. Combining the results from all replicates in both polarities increased the number of identified proteins to 1774. Additionally, we analyzed the method's efficiency using different proteases for protein digestion. Among the four studied proteases (LysC, GluC, AspN, and trypsin), trypsin and LysC demonstrated the highest protein identification yield. This suggests that digestion procedures utilized in positive-mode proteomics can be effectively applied in the negative ion mode. Data are deposited to ProteomeXchange: PXD040583.
Affiliation(s)
- Pelayo A. Penanes
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
- Vladimir Gorshkov
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
- Mark V. Ivanov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Mikhail V. Gorshkov
  - V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Frank Kjeldsen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
11
Solovyeva EM, Bubis JA, Tarasova IA, Lobas AA, Ivanov MV, Nazarov AA, Shutkov IA, Gorshkov MV. On the Feasibility of Using an Ultra-Fast DirectMS1 Method of Proteome-Wide Analysis for Searching Drug Targets in Chemical Proteomics. Biochemistry (Mosc) 2022; 87:1342-1353. [PMID: 36509723 DOI: 10.1134/s000629792211013x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Protein quantitation in tissue cells or physiological fluids based on liquid chromatography/mass spectrometry is one of the key sources of information on the mechanisms of cell functioning during chemotherapeutic treatment. Information on significant changes in protein expression upon treatment can be obtained by chemical proteomics and requires analysis of the cellular proteomes, as well as development of experimental and bioinformatic methods for identification of the drug targets. The low throughput of whole proteome analysis based on liquid chromatography and tandem mass spectrometry is one of the main factors limiting the scale of these studies. The method of direct mass spectrometric identification of proteins, DirectMS1, is one of the approaches developed in recent years that allows ultrafast proteome-wide analyses employing minute-scale gradients for separation of proteolytic mixtures. The aim of this work was to evaluate both the possibilities and the limitations of the method for identifying drug targets at the level of the whole proteome and for revealing cellular processes activated by the treatment. In particular, the available literature data on chemical proteomics obtained earlier for a large set of onco-pharmaceuticals using multiplex quantitative proteome profiling were analyzed. The results obtained were further compared with the proteome-wide data acquired by the DirectMS1 method using ultrashort separation gradients to evaluate the efficiency of the method in identifying known drug targets. Using the ovarian cancer cell line A2780 as an example, a whole-proteome comparison of two cell lysis techniques was performed: the freeze-thaw lysis commonly employed in chemical proteomics and lysis based on ultrasonication for cell disruption, which is widely accepted as a standard in proteomic studies. Also, proteome-wide profiling was performed using the ultrafast DirectMS1 method for the A2780 cell line treated with lonidamine, followed by gene ontology analyses to evaluate the capabilities of the method in revealing regulation of proteins in the cellular processes associated with drug treatment.
Affiliation(s)
- Elizaveta M Solovyeva
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Irina A Tarasova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Anna A Lobas
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
- Alexey A Nazarov
- Faculty of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
- Ilya A Shutkov
- Faculty of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
- Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow, 119334, Russia
12
Ivanov MV, Bubis JA, Gorshkov V, Tarasova IA, Levitsky LI, Solovyeva EM, Lipatova AV, Kjeldsen F, Gorshkov MV. DirectMS1Quant: Ultrafast Quantitative Proteomics with MS/MS-Free Mass Spectrometry. Anal Chem 2022; 94:13068-13075. [PMID: 36094425 DOI: 10.1021/acs.analchem.2c02255] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
Recently, we presented the DirectMS1 method of ultrafast proteome-wide analysis based on minute-long LC gradients and MS1-only mass spectra acquisition. Currently, the method provides a depth of human cell proteome coverage of 2500 proteins at a 1% false discovery rate (FDR) when using 5 min LC gradients and 7.3 min runtime in total. While the standard MS/MS approaches provide 4000-5000 protein identifications within a couple of hours of instrumentation time, we advocate here that a higher number of identified proteins does not always translate into better quantitation quality of the proteome analysis. To further elaborate on this issue, we performed a one-on-one comparison of quantitation results obtained using DirectMS1 with three popular MS/MS-based quantitation methods: label-free quantitation (LFQ) and tandem mass tag (TMT) quantitation, both based on data-dependent acquisition (DDA), and data-independent acquisition (DIA). For comparison, we performed a series of proteome-wide analyses of well-characterized (ground truth) and biologically relevant samples, including a mix of UPS1 proteins spiked at different concentrations into an Escherichia coli digest used as a background and a set of glioblastoma cell lines. MS1-only data were analyzed using a novel quantitation workflow called DirectMS1Quant developed in this work. The results obtained in this study demonstrated comparable quantitation efficiency of 5 min DirectMS1 with both TMT and DIA methods, yet the latter two utilized a 10-20-fold longer instrumentation time.
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
- Irina A Tarasova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Lev I Levitsky
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Elizaveta M Solovyeva
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
- Anastasiya V Lipatova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
- Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
13
Samukhina YV, Matyushin DD, Grinevich OI, Buryak AK. A Deep Convolutional Neural Network for Prediction of Peptide Collision Cross Sections in Ion Mobility Spectrometry. Biomolecules 2021; 11:1904. [PMID: 34944547 PMCID: PMC8699202 DOI: 10.3390/biom11121904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 12/13/2021] [Accepted: 12/17/2021] [Indexed: 11/26/2022] Open
Abstract
Most frequently, the identification of peptides in mass spectrometry-based proteomics is carried out using high-resolution tandem mass spectrometry. To increase the accuracy of the analysis, additional information on the peptides, such as chromatographic retention time and collision cross section in ion mobility spectrometry, can be used. An accurate prediction of the collision cross section values allows erroneous candidates to be rejected by comparing the observed values with predictions based on the amino acid sequence. Recently, a massive high-quality data set of peptide collision cross sections was released. This opens up an opportunity to apply the most sophisticated deep learning techniques to this task. Previously, it was shown that a recurrent neural network allows for predicting these values accurately. In this work, we present a deep convolutional neural network that enables us to predict these values more accurately compared with previous studies. We use a neural network with a complex architecture that contains both convolutional and fully connected layers, together with comprehensive methods of converting a peptide to multi-channel 1D spatial data and a vector. The source code and pre-trained model are available online.
Affiliation(s)
- Dmitriy D. Matyushin
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia; (Y.V.S.); (O.I.G.); (A.K.B.)
14
Ivanov MV, Solovyeva EM, Bubis JA, Gorshkov MV. Improving the Protein Inference from Bottom-Up Proteomic Data Using Identifications from MS1 Spectra. J Am Soc Mass Spectrom 2021; 32:1258-1262. [PMID: 33900766 DOI: 10.1021/jasms.1c00061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Protein inference is one of the crucial steps in proteome characterization using a bottom-up approach. Multiple algorithms to solve the problem are focused on extensive analysis of shared peptides identified from fragmentation mass spectra (MS/MS). However, many protein homologues with similar amino acid sequences typically have identical lists of identified peptides due to the problem of proteome undersampling in a bottom-up approach and, thus, cannot be distinguished by existing protein inference methods. Here, we propose the use of peptide feature information extracted from precursor mass spectra to assist in the identification of proteins otherwise indistinguishable from MS/MS. The proposed method was integrated with a protein inference algorithm based on the parsimony principle and built into the postsearch utility Scavager. The results demonstrate increased accuracy and efficiency of homologous protein identifications for well-characterized data sets, including the one with known protein sequences from the iPRG-2016 study.
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Building 2, Moscow 119334, Russia
- Elizaveta M Solovyeva
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Building 2, Moscow 119334, Russia
- Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Building 2, Moscow 119334, Russia
- Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Building 2, Moscow 119334, Russia
15
Ivanov MV, Bubis JA, Gorshkov V, Abdrakhimov DA, Kjeldsen F, Gorshkov MV. Boosting MS1-only Proteomics with Machine Learning Allows 2000 Protein Identifications in Single-Shot Human Proteome Analysis Using 5 min HPLC Gradient. J Proteome Res 2021; 20:1864-1873. [PMID: 33720732 DOI: 10.1021/acs.jproteome.0c00863] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/04/2023]
Abstract
Proteome-wide analyses rely on tandem mass spectrometry and the extensive separation of proteolytic mixtures. This imposes considerable instrumental time consumption, which is one of the main obstacles in the broader acceptance of proteomics in biomedical and clinical research. Recently, we presented a fast proteomic method termed DirectMS1 based on ultrashort LC gradients as well as MS1-only mass spectra acquisition and data processing. The method allows significant reduction of the proteome-wide analysis time to a few minutes at the depth of quantitative proteome coverage of 1000 proteins at 1% false discovery rate (FDR). In this work, to further increase the capabilities of the DirectMS1 method, we explored the opportunities presented by the recent progress in the machine-learning area and applied the LightGBM decision tree boosting algorithm to the scoring of peptide feature matches when processing MS1 spectra. Furthermore, we integrated the peptide feature identification algorithm of DirectMS1 with the recently introduced peptide retention time prediction utility, DeepLC. Additional approaches to improve the performance of the DirectMS1 method are discussed and demonstrated, such as using FAIMS for gas-phase ion separation. As a result of all improvements to DirectMS1, we succeeded in identifying more than 2000 proteins at 1% FDR from the HeLa cell line in a 5 min gradient LC-FAIMS/MS1 analysis. The data sets generated and analyzed during the current study have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD023977.
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Julia A Bubis
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
- Daniil A Abdrakhimov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia; Moscow Institute of Physics and Technology, Institutsky lane 9, Dolgoprudny, Moscow Region 141700, Russia
- Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
- Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia