1
Lebedev VV, Yarykin DI, Buryak AK. Automated Identification of Ions Observed in Mass Spectra of Inorganic Compounds Using Isotopic Distribution Brute Force: Methodology and Performance Measurements. Journal of the American Society for Mass Spectrometry 2024; 35:1806-1817. [PMID: 39041793 DOI: 10.1021/jasms.4c00153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/24/2024]
Abstract
This Article describes the method of isotopic distribution brute force, which can be used to identify ions registered in mass spectra of inorganic compounds in an automated manner when a library search cannot be conducted. A detailed description of the isotopic distribution brute force methodology is presented, including a discussion of computation-related difficulties. The ability of the proposed algorithm to identify various inorganic ions is tested on a small set of real-life low-resolution mass spectra of lead halides and copper halides. An evaluation of the isotopic distribution brute force performance is conducted using the low-resolution experimental mass spectra of natural rhenium sulfide and lead(II) chloride. Based on identification results and obtained performance measurements, we formulate the empirical restrictions on the input data, ensuring that the application of isotopic distribution brute force is feasible from the standpoints of search space reduction and identification time.
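As a rough illustration of the kind of isotopic-distribution matching this abstract describes, the sketch below enumerates candidate compositions, computes a nominal-mass isotope pattern for each by repeated convolution, and ranks the candidates against an observed pattern. It is not the authors' algorithm. The function names (`isotope_pattern`, `cosine_score`, `brute_force`), the two-element isotope table with rounded natural abundances, and the cosine scoring are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from math import sqrt

# Rounded natural isotopic abundances (nominal mass -> fraction).
# Only Cl and Cu are listed; this small table is an illustrative assumption.
ISOTOPES = {
    "Cl": {35: 0.7576, 37: 0.2424},
    "Cu": {63: 0.6915, 65: 0.3085},
}


def convolve(a, b):
    """Combine two mass -> probability distributions by summing masses."""
    out = defaultdict(float)
    for (mass_a, p_a), (mass_b, p_b) in product(a.items(), b.items()):
        out[mass_a + mass_b] += p_a * p_b
    return dict(out)


def isotope_pattern(formula):
    """Nominal-mass isotope pattern for a formula such as {"Cu": 1, "Cl": 2}."""
    pattern = {0: 1.0}
    for element, count in formula.items():
        for _ in range(count):
            pattern = convolve(pattern, ISOTOPES[element])
    return pattern


def cosine_score(theoretical, observed):
    """Cosine similarity between two mass -> intensity patterns (hypothetical metric)."""
    masses = set(theoretical) | set(observed)
    dot = sum(theoretical.get(m, 0.0) * observed.get(m, 0.0) for m in masses)
    norm_t = sqrt(sum(v * v for v in theoretical.values()))
    norm_o = sqrt(sum(v * v for v in observed.values()))
    return dot / (norm_t * norm_o) if norm_t and norm_o else 0.0


def brute_force(observed, elements, max_atoms):
    """Score every composition with 1..max_atoms atoms, best match first."""
    ranked = []
    for counts in product(range(max_atoms + 1), repeat=len(elements)):
        if not 0 < sum(counts) <= max_atoms:
            continue
        formula = {e: n for e, n in zip(elements, counts) if n}
        score = cosine_score(isotope_pattern(formula), observed)
        ranked.append((score, formula))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Usage: treat the theoretical CuCl2 pattern as the "observed" spectrum.
observed = isotope_pattern({"Cu": 1, "Cl": 2})
best_score, best_formula = brute_force(observed, ["Cu", "Cl"], max_atoms=4)[0]
print(best_formula, round(best_score, 6))
```

The number of candidate compositions grows quickly with the number of elements and the atom limit, which is consistent with the abstract's emphasis on restricting the input data to keep the search space and identification time manageable.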
Affiliation(s)
- Viacheslav V Lebedev
  - A. N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Leninsky Prospect, 31 Building 4, Moscow 119071, Russian Federation
- Daniil I Yarykin
  - A. N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Leninsky Prospect, 31 Building 4, Moscow 119071, Russian Federation
- Aleksey K Buryak
  - A. N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Leninsky Prospect, 31 Building 4, Moscow 119071, Russian Federation
2
Zhong J, Song X, Wang S. FREE: Enhanced Feature Representation for Isotopic Envelope Evaluation in Top-Down Mass Spectra Deconvolution. Anal Chem 2024; 96:12602-12615. [PMID: 39037184 DOI: 10.1021/acs.analchem.4c00152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2024]
Abstract
The aim of deconvolution of top-down mass spectra is to recognize monoisotopic peaks from the experimental envelopes in raw mass spectra, so accurate assessment of the similarity between theoretical and experimental envelopes is a critical step. Existing evaluation methods rely primarily on intensity differences and m/z similarity and may therefore miss other informative properties of an envelope. To allow a more comprehensive and refined assessment of this similarity, more effective features need to be explored systematically. We present enhanced feature representation for isotopic envelope evaluation (FREE), which derives diverse feature representations that capture fundamental physical attributes of envelopes, including peak intensity and envelope shape. We trained FREE and evaluated its performance on both the ovarian tumor (OT) data set (human OT cells) and the zebrafish (ZF) data set (brain of mature female ZF). Compared with the state-of-the-art method, FREE achieves higher performance on multiple evaluation metrics across both the OT and ZF data sets, particularly precision, and it correctly predicts a greater number of positive envelopes among the top-ranked envelopes. Moreover, within a cross-species data set of ZF, FREE identified more proteoform-spectrum matches (PrSMs) than EnvCNN, increasing the count from 50,795 to 52,927. The combination of FREE with TopFD also identified 117,883 fragment ions, surpassing the 97,554 fragment ions identified by EnvCNN in conjunction with TopFD. To further validate the performance of FREE, we tested 10 cross-species top-down proteomes containing 36 sub-data sets from ProteomeXchange. The results show that, after deconvolution with TopFD + FREE, TopPIC identifies more PrSMs across these 10 data sets in both the first and second rounds of experiments. These findings underscore the robustness and generalization capabilities of the FREE approach in diverse proteomes.
Affiliation(s)
- Jiancheng Zhong
  - College of Information Science and Engineering, Hunan Normal University, ChangSha 410081, China
- Xingran Song
  - College of Information Science and Engineering, Hunan Normal University, ChangSha 410081, China
- Shaokai Wang
  - David R. Cheriton School of Computer Science, University of Waterloo, Waterloo N2L 3G1, Canada
3
Beck A, Muhoberac M, Randolph CE, Beveridge CH, Wijewardhane PR, Kenttämaa HI, Chopra G. Recent Developments in Machine Learning for Mass Spectrometry. ACS Measurement Science Au 2024; 4:233-246. [PMID: 38910862 PMCID: PMC11191731 DOI: 10.1021/acsmeasuresciau.3c00060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 12/27/2023] [Accepted: 01/22/2024] [Indexed: 06/25/2024]
Abstract
Statistical analysis and modeling of mass spectrometry (MS) data have a long and rich history, with several modern MS-based applications using statistical and chemometric methods. Recently, machine learning (ML) has experienced a renaissance due to advances in computational hardware and the development of new algorithms for artificial neural networks (ANN) and deep learning architectures. Moreover, recent successes of new ANN and deep learning architectures in several areas of science, engineering, and society have further strengthened the ML field. Importantly, modern ML methods and architectures have enabled new approaches for tasks related to MS that are now widely adopted in several popular MS-based subdisciplines, such as mass spectrometry imaging and proteomics. Herein, we aim to provide an introductory summary of the practical aspects of ML methodology relevant to MS. Additionally, we seek to provide an up-to-date review of the most recent developments in ML integration with MS-based techniques while also providing critical insights into the future direction of the field.
Affiliation(s)
- Armen G. Beck
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Matthew Muhoberac
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Caitlin E. Randolph
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Connor H. Beveridge
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Prageeth R. Wijewardhane
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Hilkka I. Kenttämaa
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Gaurav Chopra
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
  - Department of Computer Science (by courtesy), Purdue University, West Lafayette, Indiana 47907, United States
  - Purdue Institute for Drug Discovery, Purdue Institute for Cancer Research, Regenstrief Center for Healthcare Engineering, Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue Institute for Integrative Neuroscience, West Lafayette, Indiana 47907, United States
4
Potemkin AA, Proskurnin MA, Volkov DS. Noise Filtering Algorithm Using Gaussian Mixture Models for High-Resolution Mass Spectra of Natural Organic Matter. Anal Chem 2024; 96:5455-5461. [PMID: 38530650 DOI: 10.1021/acs.analchem.3c05453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2024]
Abstract
High-resolution mass spectra of natural organic matter (NOM) contain a large number of noise signals. These signals interfere with the correct molecular composition estimation during nontargeted analysis because formula-assignment programs find empirical formulas for such peaks as well. Previously proposed noise filtering methods that utilize the profile of the intensity distribution of mass spectrum peaks rely on a histogram to calculate the intensity threshold value. However, the histogram profile can vary depending on the user settings. In addition, these algorithms are not automated, so they are handled manually. To overcome the mentioned drawbacks, we propose a new algorithm for noise filtering in mass spectra. This filter is based on Gaussian Mixture Models (GMMs), a machine learning method to find the intensity threshold value. The algorithm is completely data-driven and eliminates the need to work with a histogram. It has no customizable parameters and automatically determines the noise level for each individual mass spectrum. The algorithm performance was tested on mass spectra of natural organic matter obtained by averaging a different number of microscans (transients), and the results were compared with other noise filters proposed in the literature. Finally, the effect of this noise filtering approach on the fraction of peaks with assigned formulas was investigated. It was shown that there is always an increase in the identification rate, but the magnitude of the effect changes with the number of microscans averaged. The increase can be as high as 15%.
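To make the idea of a mixture-model intensity threshold concrete, here is a minimal sketch, not the published algorithm. It assumes that log10 peak intensities can be described by two Gaussian components (noise and signal), fits them with a plain expectation-maximization loop, and places the threshold where the two weighted densities are equal. The two-component choice, the log transform, the function names, and the synthetic data are all assumptions; in practice a library implementation such as scikit-learn's `GaussianMixture` would normally replace the hand-written fit.

```python
import math
import random


def log_weighted_density(x, weight, mean, var):
    """Log of weight * Normal(x; mean, var)."""
    return (math.log(weight) - 0.5 * math.log(2 * math.pi * var)
            - (x - mean) ** 2 / (2 * var))


def fit_two_gaussians(values, iterations=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Initialise the means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    overall_mean = sum(values) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in values) / n, 1e-6)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibilities, computed in the log domain for stability.
        resp = []
        for x in values:
            logs = [log_weighted_density(x, w, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            exps = [math.exp(value - top) for value in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: update weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = max(nk / n, 1e-12)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk,
                1e-6,
            )
    return weights, means, variances


def noise_threshold(log_intensities):
    """Log-intensity at which the signal component overtakes the noise component."""
    weights, means, variances = fit_two_gaussians(log_intensities)
    noise, signal = (0, 1) if means[0] <= means[1] else (1, 0)

    def diff(x):
        return (log_weighted_density(x, weights[signal], means[signal], variances[signal])
                - log_weighted_density(x, weights[noise], means[noise], variances[noise]))

    lo, hi = means[noise], means[signal]
    if diff(lo) >= 0 or diff(hi) <= 0:
        raise ValueError("components overlap too much to place a threshold")
    for _ in range(60):  # bisection between the two component means
        mid = (lo + hi) / 2
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def demo_log_intensities(seed=0):
    """Synthetic log10 intensities: 300 noise peaks and 100 signal peaks (made-up data)."""
    rng = random.Random(seed)
    noise = [rng.gauss(3.0, 0.2) for _ in range(300)]
    signal = [rng.gauss(5.0, 0.3) for _ in range(100)]
    return noise + signal


data = demo_log_intensities()
threshold = noise_threshold(data)
kept = [x for x in data if x > threshold]
print(f"threshold (log10 intensity): {threshold:.2f}, peaks kept: {len(kept)}")
```

Because the threshold comes from the fitted components rather than from a histogram, it does not depend on a bin width, which is the property the abstract highlights.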
Affiliation(s)
- Alexander A Potemkin
  - Chemistry Department of M.V. Lomonosov Moscow State University, Leninskie Gory, 1-3, GSP-1, Moscow 119991, Russia
- Mikhail A Proskurnin
  - Chemistry Department of M.V. Lomonosov Moscow State University, Leninskie Gory, 1-3, GSP-1, Moscow 119991, Russia
- Dmitry S Volkov
  - Chemistry Department of M.V. Lomonosov Moscow State University, Leninskie Gory, 1-3, GSP-1, Moscow 119991, Russia
5
Larson EJ, Pergande MR, Moss ME, Rossler KJ, Wenger RK, Krichel B, Josyer H, Melby JA, Roberts DS, Pike K, Shi Z, Chan HJ, Knight B, Rogers HT, Brown KA, Ong IM, Jeong K, Marty MT, McIlwain SJ, Ge Y. MASH Native: a unified solution for native top-down proteomics data processing. Bioinformatics 2023; 39:btad359. [PMID: 37294807 PMCID: PMC10283151 DOI: 10.1093/bioinformatics/btad359] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 01/12/2023] [Revised: 04/13/2023] [Accepted: 06/07/2023] [Indexed: 06/11/2023]
Abstract
MOTIVATION: Native top-down proteomics (nTDP) integrates native mass spectrometry (nMS) with top-down proteomics (TDP) to provide comprehensive analysis of protein complexes together with proteoform identification and characterization. Despite significant advances in nMS and TDP software developments, a unified and user-friendly software package for analysis of nTDP data remains lacking. RESULTS: We have developed MASH Native to provide a unified solution for nTDP to process complex datasets with database searching capabilities in a user-friendly interface. MASH Native supports various data formats and incorporates multiple options for deconvolution, database searching, and spectral summing to provide a "one-stop shop" for characterizing both native protein complexes and proteoforms. AVAILABILITY AND IMPLEMENTATION: The MASH Native app, video tutorials, written tutorials, and additional documentation are freely available for download at https://labs.wisc.edu/gelab/MASH_Explorer/MASHSoftware.php. All data files shown in user tutorials are included with the MASH Native software in the download .zip file.
Affiliation(s)
- Eli J Larson
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Melissa R Pergande
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Michelle E Moss
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kalina J Rossler
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- R Kent Wenger
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
  - Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, WI 53705, United States
- Boris Krichel
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Harini Josyer
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Jake A Melby
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- David S Roberts
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyndalanne Pike
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Zhuoxin Shi
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Hsin-Ju Chan
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Bridget Knight
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Holden T Rogers
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyle A Brown
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Irene M Ong
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53705, United States
  - University of Wisconsin Carbone Cancer Center, University of Wisconsin–Madison, Madison, WI 53705, United States
  - Department of Obstetrics and Gynecology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyowon Jeong
  - Department of Applied Bioinformatics, University of Tübingen, Tübingen 72704, Germany
- Michael T Marty
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85719, United States
- Sean J McIlwain
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53705, United States
  - University of Wisconsin Carbone Cancer Center, University of Wisconsin–Madison, Madison, WI 53705, United States
- Ying Ge
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
  - Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, WI 53705, United States
6
Basharat AR, Zang Y, Sun L, Liu X. TopFD: A Proteoform Feature Detection Tool for Top-Down Proteomics. Anal Chem 2023; 95:8189-8196. [PMID: 37196155 DOI: 10.1021/acs.analchem.2c05244] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Indexed: 05/19/2023]
Abstract
Top-down liquid chromatography-mass spectrometry (LC-MS) analyzes intact proteoforms and generates mass spectra containing peaks of proteoforms with various isotopic compositions, charge states, and retention times. An essential step in top-down MS data analysis is proteoform feature detection, which aims to group these peaks into peak sets (features), each containing all peaks of a proteoform. Accurate protein feature detection enhances the accuracy in MS-based proteoform identification and quantification. Here, we present TopFD, a software tool for top-down MS feature detection that integrates algorithms for proteoform feature detection, feature boundary refinement, and machine learning models for proteoform feature evaluation. We performed extensive benchmarking of TopFD, ProMex, FlashDeconv, and Xtract using seven top-down MS data sets and demonstrated that TopFD outperforms other tools in feature accuracy, reproducibility, and feature abundance reproducibility.
Affiliation(s)
- Abdul Rehman Basharat
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Yong Zang
  - Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Liangliang Sun
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Xiaowen Liu
  - Deming Department of Medicine, Tulane University School of Medicine, New Orleans, Louisiana 70112, United States
7
Pauwels J, Fijałkowska D, Eyckerman S, Gevaert K. Mass spectrometry and the cellular surfaceome. Mass Spectrometry Reviews 2022; 41:804-841. [PMID: 33655572 DOI: 10.1002/mas.21690] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/22/2020] [Revised: 02/05/2021] [Accepted: 02/09/2021] [Indexed: 06/12/2023]
Abstract
The collection of exposed plasma membrane proteins, collectively termed the surfaceome, is involved in multiple vital cellular processes, such as the communication of cells with their surroundings and the regulation of transport across the lipid bilayer. The surfaceome also plays key roles in the immune system by recognizing and presenting antigens, with its possible malfunctioning linked to disease. Surface proteins have long been explored as potential cell markers, disease biomarkers, and therapeutic drug targets. Despite its importance, a detailed study of the surfaceome continues to pose major challenges for mass spectrometry-driven proteomics due to the inherent biophysical characteristics of surface proteins. Their inefficient extraction from hydrophobic membranes to an aqueous medium and their lower abundance compared to intracellular proteins hamper the analysis of surface proteins, which are therefore usually underrepresented in proteomic datasets. To tackle such problems, several innovative analytical methodologies have been developed. This review aims at providing an extensive overview of the different methods for surfaceome analysis, with respective considerations for downstream mass spectrometry-based proteomics.
Affiliation(s)
- Jarne Pauwels
  - VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Sven Eyckerman
  - VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Kris Gevaert
  - VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
8
Abstract
Native mass spectrometry (MS) involves the analysis and characterization of macromolecules, predominantly intact proteins and protein complexes, whereby as much as possible the native structural features of the analytes are retained. As such, native MS enables the study of secondary, tertiary, and even quaternary structure of proteins and other biomolecules. Native MS represents a relatively recent addition to the analytical toolbox of mass spectrometry and has over the past decade experienced immense growth, especially in enhancing sensitivity and resolving power but also in ease of use. With the advent of dedicated mass analyzers, sample preparation and separation approaches, targeted fragmentation techniques, and software solutions, the number of practitioners and novel applications has risen in both academia and industry. This review focuses on recent developments, particularly in high-resolution native MS, describing applications in the structural analysis of protein assemblies, proteoform profiling of, among others, biopharmaceuticals and plasma proteins, and quantitative and qualitative analysis of protein-ligand interactions, with the latter covering lipid, drug, and carbohydrate molecules, to name a few.
Affiliation(s)
- Sem Tamara
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands
  - Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Maurits A. den Boer
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands
  - Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Albert J. R. Heck
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands
  - Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, The Netherlands
9
Choi IK, Liu X. Top-Down Mass Spectrometry Data Analysis Using TopPIC Suite. Methods Mol Biol 2022; 2500:83-103. [PMID: 35657589 DOI: 10.1007/978-1-0716-2325-1_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
With the advances of mass spectrometry (MS) techniques, top-down MS-based proteomics has gained increasing attention because of its advantages over bottom-up MS in studying complex proteoforms. TopPIC Suite is a widely used software package for top-down MS-based proteoform identification and quantification. Here, we present the methods for top-down MS data analysis using TopPIC Suite.
Affiliation(s)
- In Kwon Choi
  - Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA
- Xiaowen Liu
  - Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA