1
Downs M, Zaia J, Sethi MK. Mass spectrometry methods for analysis of extracellular matrix components in neurological diseases. Mass Spectrometry Reviews 2023; 42:1848-1875. [PMID: 35719114 PMCID: PMC9763553 DOI: 10.1002/mas.21792] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/08/2022] [Revised: 04/12/2022] [Accepted: 05/24/2022] [Indexed: 06/15/2023]
Abstract
The brain extracellular matrix (ECM) is a highly glycosylated environment and plays important roles in many processes, including cell communication, growth factor binding, and scaffolding. The formation of structures such as perineuronal nets (PNNs) is critical in neuroprotection and neural plasticity, and the formation of molecular networks is dependent in part on glycans. The ECM is also implicated in the neuropathophysiology of disorders such as Alzheimer's disease (AD), Parkinson's disease (PD), and schizophrenia (SZ). As such, it is of interest to understand both the proteomic and glycomic makeup of healthy and diseased brain ECM. Further, there is a growing need for site-specific glycoproteomic information. Over the past decade, sample preparation, mass spectrometry, and bioinformatic methods have been developed and refined to provide comprehensive information about the glycoproteome. Core ECM molecules including versican, hyaluronan and proteoglycan link proteins, and tenascin are dysregulated in AD, PD, and SZ. Glycomic changes such as differential sialylation, sulfation, and branching are also associated with neurodegeneration. A more thorough understanding of the ECM and its proteomic, glycomic, and glycoproteomic changes in brain diseases may provide pathways to new therapeutic options.
Affiliation(s)
- Margaret Downs
  - Department of Biochemistry, Center for Biomedical Mass Spectrometry, Boston University, Boston, Massachusetts, USA
- Joseph Zaia
  - Department of Biochemistry, Center for Biomedical Mass Spectrometry, Boston University, Boston, Massachusetts, USA
  - Bioinformatics Program, Boston University, Boston, Massachusetts, USA
- Manveen K Sethi
  - Department of Biochemistry, Center for Biomedical Mass Spectrometry, Boston University, Boston, Massachusetts, USA
2
Harvey DJ. Analysis of carbohydrates and glycoconjugates by matrix-assisted laser desorption/ionization mass spectrometry: An update for 2019-2020. Mass Spectrometry Reviews 2022:e21806. [PMID: 36468275 DOI: 10.1002/mas.21806] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/17/2023]
Abstract
This review is the tenth update of the original article published in 1999 on the application of matrix-assisted laser desorption/ionization (MALDI) mass spectrometry to the analysis of carbohydrates and glycoconjugates and brings coverage of the literature to the end of 2020. Also included are papers that describe methods appropriate to analysis by MALDI, such as sample preparation techniques, even though the ionization method is not MALDI. The review is divided into three sections: (1) general aspects, such as the theory of the MALDI process, matrices, derivatization, MALDI imaging, fragmentation, quantification, and the use of arrays; (2) applications to various structural types, such as oligo- and polysaccharides, glycoproteins, glycolipids, glycosides, and biopharmaceuticals; and (3) other areas, such as medicine, industrial processes, and glycan synthesis, where MALDI is extensively used. Much of the material relating to applications is presented in tabular form. The reported work shows increasing incorporation of new techniques such as ion mobility and the enormous impact that MALDI imaging is having. MALDI, although invented nearly 40 years ago, is still an ideal technique for carbohydrate analysis, and advancements in the technique and range of applications show little sign of diminishing.
Affiliation(s)
- David J Harvey
  - Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, UK
  - Department of Chemistry, University of Oxford, Oxford, Oxfordshire, United Kingdom
3
Patabandige MW, Pfeifer LD, Nguyen HT, Desaire H. Quantitative clinical glycomics strategies: A guide for selecting the best analysis approach. Mass Spectrometry Reviews 2022; 41:901-921. [PMID: 33565652 PMCID: PMC8601598 DOI: 10.1002/mas.21688] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 09/09/2020] [Revised: 12/13/2020] [Accepted: 01/24/2021] [Indexed: 05/05/2023]
Abstract
Glycans introduce complexity to the proteins to which they are attached. These modifications vary during the progression of many diseases; thus, they serve as potential biomarkers for disease diagnosis and prognosis. The immense structural diversity of glycans makes glycosylation analysis and quantitation difficult. Fortunately, recent advances in analytical techniques provide the opportunity to quantify even low-abundance glycopeptides and glycans derived from complex biological mixtures, allowing for the identification of glycosylation differences between healthy samples and those derived from disease states. Understanding the strengths and weaknesses of different quantitative glycomics analysis methods is important for selecting the best strategy to analyze glycosylation changes in any given set of clinical samples. To provide guidance towards selecting the proper approach, we discuss four widely used quantitative glycomics analysis platforms: fluorescence-based analysis of released N-linked glycans and three varieties of MS-based analysis, namely liquid chromatography (LC)-mass spectrometry (MS) analysis of glycopeptides, matrix-assisted laser desorption ionization-time of flight MS, and LC-ESI-MS analysis of released N-linked glycans. These methods' strengths and weaknesses are compared, particularly with respect to the figures of merit that are important for clinical biomarker studies: the initial sample requirements, the methods' throughput, sample preparation time, the number of species identified, the methods' utility for isomer separation and structural characterization, method-related challenges associated with quantitation, repeatability, the expertise required, and the cost for each analysis. This review, therefore, provides unique guidance to researchers who endeavor to undertake a clinical glycomics analysis by offering insights on the available analysis technologies.
Affiliation(s)
- Milani Wijeweera Patabandige
  - Ralph N. Adams Institute for Bioanalytical Chemistry, Department of Chemistry, University of Kansas, Lawrence, KS 66047, United States
- Leah D. Pfeifer
  - Ralph N. Adams Institute for Bioanalytical Chemistry, Department of Chemistry, University of Kansas, Lawrence, KS 66047, United States
- Hanna T. Nguyen
  - Ralph N. Adams Institute for Bioanalytical Chemistry, Department of Chemistry, University of Kansas, Lawrence, KS 66047, United States
- Heather Desaire
  - Ralph N. Adams Institute for Bioanalytical Chemistry, Department of Chemistry, University of Kansas, Lawrence, KS 66047, United States
4
Desaire H, Stepler KE, Robinson RAS. Exposing the Brain Proteomic Signatures of Alzheimer's Disease in Diverse Racial Groups: Leveraging Multiple Data Sets and Machine Learning. J Proteome Res 2022; 21:1095-1104. [PMID: 35276041 PMCID: PMC9097891 DOI: 10.1021/acs.jproteome.1c00966] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
Abstract
Recent studies have highlighted that the proteome can be used to identify potential biomarker candidates for Alzheimer's disease (AD) in diverse cohorts. Furthermore, the racial and ethnic background of participants is an important factor to consider to ensure the effectiveness of potential biomarkers for representative populations. A promising approach to survey potential biomarker candidates for diagnosing AD in diverse cohorts is the application of machine learning to proteomics data sets. Herein, we leveraged six existing bottom-up proteomics data sets, which included non-Hispanic White, African American/Black, and Hispanic participants, to study protein changes in AD and cognitively unimpaired participants. Machine learning models were applied to these data sets and resulted in the identification of amyloid-β precursor protein (APP) and heat shock protein β-1 (HSPB1) as two proteins that have high ability to distinguish AD; however, each protein's performance varied based upon the racial and ethnic background of the participants. HSPB1 particularly was helpful for generating high areas under the curve (AUCs) for African American/Black participants. Overall, HSPB1 improved the performance of the machine learning models when combined with APP and/or participant age and is a potential candidate that should be further explored in AD biomarker discovery efforts.
Affiliation(s)
- Heather Desaire
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
- Kaitlyn E Stepler
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Renã A S Robinson
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
  - Vanderbilt Memory and Alzheimer's Center, Vanderbilt University Medical Center, Nashville, Tennessee 37212, United States
  - Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
  - Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee 37232, United States
  - Department of Neurology, Vanderbilt University Medical Center, Nashville, Tennessee 37232, United States
5
Hua D, Desaire H. Improved Discrimination of Disease States Using Proteomics Data with the Updated Aristotle Classifier. J Proteome Res 2021; 20:2823-2829. [PMID: 33909976 DOI: 10.1021/acs.jproteome.1c00066] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Abstract
Mass spectrometry data sets from omics studies are an optimal information source for discriminating patients with disease and identifying biomarkers. Thousands of proteins or endogenous metabolites can be queried in each analysis, spanning several orders of magnitude in abundance. Machine learning tools that effectively leverage these data to accurately identify disease states are in high demand. While mass spectrometry data sets are rich with potentially useful information, using the data effectively can be challenging because of missing entries in the data sets and because the number of samples is typically much smaller than the number of features, two challenges that make machine learning difficult. To address this problem, we have modified a new supervised classification tool, the Aristotle Classifier, so that omics data sets can be better leveraged for identifying disease states. The optimized classifier, AC.2021, is benchmarked on multiple data sets against its predecessor and two leading supervised classification tools, Support Vector Machine (SVM) and XGBoost. The new classifier, AC.2021, outperformed existing tools on multiple tests using proteomics data. The underlying code for the classifier, provided herein, would be useful for researchers who desire improved classification accuracy when using their omics data sets to identify disease states.
Affiliation(s)
- David Hua
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
- Heather Desaire
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
6
He Q, Sun C, Liu J, Pan Y. MALDI-MSI analysis of cancer drugs: Significance, advances, and applications. Trends Analyt Chem 2021. [DOI: 10.1016/j.trac.2021.116183] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
7
Desaire H, Patabandige MW, Hua D. The local-balanced model for improved machine learning outcomes on mass spectrometry data sets and other instrumental data. Anal Bioanal Chem 2021; 413:1583-1593. [PMID: 33580828 PMCID: PMC8516084 DOI: 10.1007/s00216-020-03117-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/12/2020] [Revised: 11/17/2020] [Accepted: 12/08/2020] [Indexed: 11/25/2022]
Abstract
One unifying challenge when classifying biological samples with mass spectrometry data is overcoming the obstacle of sample-to-sample variability so that differences between groups, such as between a healthy set and a disease set, can be identified. Similarly, when the same sample is re-analyzed under identical conditions, instrument signals can fluctuate by more than 10%. This signal inconsistency imposes difficulties in identifying subtle differences across a set of samples, and it weakens the mass spectrometrist’s ability to effectively leverage data in domains as diverse as proteomics, metabolomics, glycomics, and imaging. We selected challenging data sets in the fields of glycomics, mass spectrometry imaging, and bacterial typing to study the problem of within-group signal variability and adapted a 30-year-old statistical approach to address the problem. The solution, the “local-balanced model,” relies on using balanced subsets of training data to classify test samples. This analysis strategy was assessed on ESI-MS data of IgG-based glycopeptides, MALDI-MS imaging data of endogenous lipids, and MALDI-MS data of bacterial proteins. Two preliminary examples on non-mass spectrometry data sets are also included to show the potential generality of the method outside the field of MS analysis. We demonstrate that this approach is superior to simple normalization methods, generalizable to multiple mass spectrometry domains, and potentially appropriate in fields as diverse as physics and satellite imaging. In some cases, improvements in classification can be dramatic, with accuracy rising from 60% with normalization alone to over 90% with the additional development described herein.
Affiliation(s)
- Heather Desaire
  - Department of Chemistry, University of Kansas, Lawrence, KS, 66045, USA
- David Hua
  - Department of Chemistry, University of Kansas, Lawrence, KS, 66045, USA
8
Machine Learning Based Analysis of Human Serum N-glycome Alterations to Follow up Lung Tumor Surgery. Cancers (Basel) 2020; 12:cancers12123700. [PMID: 33317143 PMCID: PMC7764602 DOI: 10.3390/cancers12123700] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/16/2020] [Revised: 12/04/2020] [Accepted: 12/07/2020] [Indexed: 12/24/2022] Open
Simple Summary
Globally, there were around 2.1 million lung cancer cases and 1.8 million deaths in 2018. Hungary, where this study was carried out, had the highest rate of lung cancer in the same year. We developed a new analytical method that can be readily used to follow up tumor surgery by investigating the glycan (sugar) structures of proteins. As the results of such investigations are very complex, computer-assisted machine learning methods were utilized for data interpretation.
Abstract
The human serum N-glycome is a valuable source of biomarkers for malignant diseases, already utilized in multiple studies. In this paper, the N-glycosylation changes in human serum proteins were analyzed after surgical lung tumor resection. Seventeen lung cancer patients were involved in this study, and the N-glycosylation pattern of their serum samples was analyzed before and after the surgery using capillary electrophoresis separation with laser-induced fluorescent detection. The relative peak areas of 21 N-glycans were evaluated from the acquired electropherograms using machine learning-based data analysis. Individual glycans as well as their subclasses were taken into account during the course of evaluation. For the data analysis, both discrete (e.g., smoker or not) and continuous (e.g., age of the patient) clinical parameters were compared against the alterations in these 21 N-linked carbohydrate structures. The classification tree analysis resulted in a panel of N-glycans, which could be used to follow up on the effects of lung tumor surgical resection.
9
Hua D, Liu X, Go EP, Wang Y, Hummon AB, Desaire H. How to Apply Supervised Machine Learning Tools to MS Imaging Files: Case Study with Cancer Spheroids Undergoing Treatment with the Monoclonal Antibody Cetuximab. Journal of the American Society for Mass Spectrometry 2020; 31:1350-1357. [PMID: 32469221 PMCID: PMC7685566 DOI: 10.1021/jasms.0c00010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 05/09/2023]
Abstract
As the field of mass spectrometry imaging continues to grow, so too do its needs for optimal methods of data analysis. One general need in image analysis is the ability to classify the underlying regions within an image, as healthy or diseased, for example. Classification, as a general problem, is often best accomplished by supervised machine learning strategies; unfortunately, conducting supervised machine learning on MS imaging files is not typically done by mass spectrometrists because a high degree of specialized knowledge is needed. To address this problem, we developed a fully open-source approach that facilitates supervised machine learning on MS imaging files, and we demonstrated its implementation on sets of cancer spheroids that either had or had not undergone chemotherapy treatment. These supervised machine learning studies demonstrated that metabolic changes induced by the monoclonal antibody, Cetuximab, are detectable but modest at 24 h, and by 72 h, the drug induces a larger and more diverse metabolic response.
Affiliation(s)
- David Hua
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
- Xin Liu
  - Department of Chemistry and Biochemistry and the Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio 43210, United States
- Eden P. Go
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
- Yijia Wang
  - Department of Chemistry and Biochemistry and the Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio 43210, United States
- Amanda B. Hummon
  - Department of Chemistry and Biochemistry and the Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio 43210, United States
- Heather Desaire
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
10
Shipman JT, Nguyen HT, Desaire H. So You Discovered a Potential Glycan-Based Biomarker; Now What? We Developed a High-Throughput Method for Quantitative Clinical Glycan Biomarker Validation. ACS Omega 2020; 5:6270-6276. [PMID: 32258861 PMCID: PMC7114137 DOI: 10.1021/acsomega.9b03334] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/08/2019] [Accepted: 02/25/2020] [Indexed: 05/04/2023]
Abstract
Glycomic-based approaches to discover potential biomarkers have shown great promise in their ability to distinguish between healthy and diseased individuals; these methods can identify when aberrant glycosylation is significant, but they cannot practically be adapted into widely implemented diagnostic assays because they are too complex, expensive, and low-throughput. We have developed a new strategy that addresses challenges associated with sample preparation, sample throughput, instrumentation needs, and data analysis to transfer the valuable knowledge provided by protein glycosylation into a clinical environment. Notably, the detection limits of the assay are in the single-digit picomole range. Proof of principle is demonstrated by quantifying the changes in the sialic acid content in fetuin. As the sialic acid content in proteins varies in a number of disease states, this example demonstrates the utility of the method for biomarker analysis. Furthermore, the developed method can be adapted to other biologically important saccharides, affording a broad array of quantitative glycomic analyses that are accessible in a high-throughput, plate-reader format. These studies enable glycomic-based biomarker discovery efforts to transition through the difficult landscape of developing a potential biomarker into a clinical assay.
Affiliation(s)
- Joshua T Shipman
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
- Hanna T Nguyen
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
- Heather Desaire
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
11
O'Shea K, Misra BB. Software tools, databases and resources in metabolomics: updates from 2018 to 2019. Metabolomics 2020; 16:36. [PMID: 32146531 DOI: 10.1007/s11306-020-01657-3] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/03/2019] [Accepted: 03/01/2020] [Indexed: 12/24/2022]
Abstract
Metabolomics has evolved as a discipline from a discovery and functional genomics tool, and is now a cornerstone in the era of big data-driven precision medicine. Sample preparation strategies and analytical technologies have seen enormous growth, and keeping pace with data analytics is challenging, to say the least. This review introduces and briefly presents around 100 metabolomics software resources, tools, databases, and other utilities that have surfaced or have improved in 2019. Table 1 provides the computational dependencies of the tools, categorizes the resources based on utility and ease of use, and provides hyperlinks to webpages where the tools can be downloaded or used. This review intends to keep the community of metabolomics researchers up to date with all the software tools, resources, and databases developed in 2019, in one place.
Affiliation(s)
- Keiron O'Shea
  - Institute of Biological, Environmental, and Rural Studies, Aberystwyth University, Ceredigion, Wales, SY23 3DA, UK
- Biswapriya B Misra
  - Center for Precision Medicine, Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, Medical Center Boulevard, Winston-Salem, NC, 27157, USA
12
Abrahams JL, Taherzadeh G, Jarvas G, Guttman A, Zhou Y, Campbell MP. Recent advances in glycoinformatic platforms for glycomics and glycoproteomics. Curr Opin Struct Biol 2019; 62:56-69. [PMID: 31874386 DOI: 10.1016/j.sbi.2019.11.009] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 09/22/2019] [Revised: 11/05/2019] [Accepted: 11/15/2019] [Indexed: 12/16/2022]
Abstract
Protein glycosylation is the most complex and prevalent post-translational modification in terms of the number of proteins modified and the diversity generated. To understand the functional roles of glycoproteins, it is important to gain insight into the repertoire of oligosaccharides present. The comparison and relative quantitation of glycoforms, combined with site-specific identification and occupancy, are necessary steps in this direction. Computational platforms have continued to mature, assisting researchers with the interpretation of such glycomics and glycoproteomics data sets, but they frequently support dedicated workflows, and users rely on the manual interpretation of data to gain insights into the glycoproteome. The growth of site-specific knowledge has also led to the implementation of machine-learning algorithms to predict glycosylation, which is now being integrated into glycoproteomics pipelines. This short review describes commercial and open-access databases and software with an emphasis on those that are actively maintained and designed to support current analytical workflows.
Affiliation(s)
- Jodie L Abrahams
  - Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia
- Ghazaleh Taherzadeh
  - School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Gabor Jarvas
  - Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary
  - Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Andras Guttman
  - Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary
  - Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
  - SCIEX, Brea, CA, USA
- Yaoqi Zhou
  - School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Matthew P Campbell
  - Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia
13
Desaire H, Hua D. Adaption of the Aristotle Classifier for Accurately Identifying Highly Similar Bacteria Analyzed by MALDI-TOF MS. Anal Chem 2019; 92:1050-1057. [PMID: 31769656 DOI: 10.1021/acs.analchem.9b04049] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/17/2022]
Abstract
MALDI-TOF MS has shown great utility for rapidly identifying microbial species. It can be used to successfully type bacteria and fungi from a variety of sources more rapidly and cost-effectively than traditional methods. One area where improvements are necessary is in the typing of highly similar samples, such as those samples from the same genus but different species or samples from within a single species but from different strains. One promising way to address this current limitation is by using advanced machine learning techniques. In this work, we adapt a newly developed machine learning tool, the Aristotle Classifier, to bacterial classification of MALDI-TOF MS data. This tool was originally developed for classifying glycomics and glycoproteomics data, so we modified it to be well-suited for assigning mass spectral data from bacterial proteins. The classifier exceeds existing benchmarks in classifying bacteria, and it shows particularly strong performance when the samples to be identified are highly similar. The combination of mass spectrometry data and tools like the Aristotle Classifier could ameliorate the ambiguities associated with challenging bacterial classification problems.
Affiliation(s)
- Heather Desaire
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
- David Hua
  - Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States