1. Ji W, Wallace WE. Comprehensive Data Evaluation Methods Used in Developing the SWGDRUG Mass Spectral Reference Library for Seized Drug Identification. Anal Chem 2024; 96:17004-17012. [PMID: 39378263 DOI: 10.1021/acs.analchem.4c04425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2024]
Abstract
The mass spectral library of the Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) is the most comprehensive free reference database of its kind in the world. It provides reliable mass spectra for identification of seized drugs, their metabolites, and related forensic compounds when using gas chromatography/mass spectrometry (GC/MS). The SWGDRUG library (version 3.13) contains spectra for 3598 compounds. All spectra are evaluated by the Mass Spectrometry Data Center (MSDC) at the National Institute of Standards and Technology (NIST). Over the past few years, new evaluation methods aided by improved software have been developed. First, all chemical information, such as chemical structure and name, is confirmed. Second, the product ions in each spectrum are verified to match the compound structure using the NIST MS Interpreter software tool. Subsequently, the mass spectra are compared to the same or similar compounds across six different mass spectral reference libraries using three distinct library search methods. Additionally, the NIST Artificial Intelligence Retention Indices (AIRI) software is used to help confirm the corresponding compounds of spectra, especially for those without molecular ions. Low-quality and incorrect spectra are rejected for inclusion in the library.
Affiliation(s)
- Weihua Ji
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- William E Wallace
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
2. Wang X, Strobel M, Aron AT, Phelan VV, Acharya DD, Brown CJ, Clevenger K, Hu J, Kretsch A, Mahood EH, Menegatti C, Xiong Q, Wang M. Network Topology Evaluation and Transitive Alignments for Molecular Networking. J Am Soc Mass Spectrom 2024; 35:2165-2175. [PMID: 39133821 PMCID: PMC11516331 DOI: 10.1021/jasms.4c00208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Untargeted tandem mass spectrometry (MS/MS) is an essential technique in modern analytical chemistry, providing a comprehensive snapshot of chemical entities in complex samples and identifying unknowns through their fragmentation patterns. This high-throughput approach generates large data sets that can be challenging to interpret. Molecular Networks (MNs) have been developed as a computational tool to aid in the organization and visualization of complex chemical space in untargeted mass spectrometry data, thereby supporting comprehensive data analysis and interpretation. MNs group related compounds with potentially similar structures from MS/MS data by calculating all pairwise MS/MS similarities and filtering these connections to produce a MN. Such networks are instrumental in metabolomics for identifying novel metabolites, elucidating metabolic pathways, and even discovering biomarkers for disease. While MS/MS similarity metrics have been explored in the literature, the influence of network topology approaches on MN construction remains unexplored. This manuscript introduces metrics for evaluating MN construction, benchmarks state-of-the-art approaches, and proposes the Transitive Alignments approach to improve MN construction. The Transitive Alignment technique leverages the MN topology to realign MS/MS spectra of related compounds that differ by multiple structural modifications. Combining this Transitive Alignments approach with pseudoclique finding, a method for identifying highly connected groups of nodes in a network, resulted in more complete and higher-quality molecular families. Finally, we also introduce a targeted network construction technique called induced transitive alignments where we demonstrate effectiveness on a real world natural product discovery application. We release this transitive alignment technique as a high-throughput workflow that can be used by the wider research community.
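As an illustration of the construction step described in the abstract above (compute all pairwise MS/MS similarities, then filter the connections), the following Python sketch may help. It assumes a plain binned cosine score and a single score threshold. The names `binned`, `cosine`, and `build_network` and the parameters `bin_width` and `min_score` are hypothetical and are not taken from the paper, which benchmarks more elaborate topology filters and proposes Transitive Alignments on top of them.

```python
import math
from itertools import combinations


def binned(peaks, bin_width=0.01):
    """Sum peak intensities into fixed-width m/z bins (assumed peak matching rule)."""
    bins = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine(a, b):
    """Cosine similarity between two binned spectra stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(spectra, min_score=0.7):
    """Score every pair of spectra and keep edges at or above min_score.

    spectra maps a spectrum name to a list of (m/z, intensity) peaks.
    Returns a list of (name_u, name_v, score) edges.
    """
    vectors = {name: binned(peaks) for name, peaks in spectra.items()}
    edges = []
    for u, v in combinations(sorted(vectors), 2):
        score = cosine(vectors[u], vectors[v])
        if score >= min_score:
            edges.append((u, v, score))
    return edges
```

Connected components of the returned edge list would then play the role of molecular families. The threshold value here is arbitrary and only serves the illustration.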
Affiliation(s)
- Xianghu Wang
  - Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, California 92521, United States
- Michael Strobel
  - Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, California 92521, United States
- Allegra T Aron
  - Department of Chemistry and Biochemistry, University of Denver, 2101 East Wesley Ave, Denver, Colorado 80210, United States
- Vanessa V Phelan
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, Anschutz Medical Campus, 12850 E Montview Blvd, Aurora, Colorado 80045, United States
- Deepa D Acharya
  - Biologicals Research and Development, Corteva Agriscience, 9330 Zionsville Rd, Indianapolis, Indiana 46268, United States
- Christopher J Brown
  - Regulatory Science, Corteva Agriscience, 9330 Zionsville Rd, Indianapolis, Indiana 46268, United States
- Ken Clevenger
  - Biologicals Research and Development, Corteva Agriscience, 9330 Zionsville Rd, Indianapolis, Indiana 46268, United States
- Jie Hu
  - Data Science, Corteva Agriscience, 9330 Zionsville Rd, Indianapolis, Indiana 46268, United States
- Ashley Kretsch
  - Biologicals Research and Development, Corteva Agriscience, 9330 Zionsville Rd, Indianapolis, Indiana 46268, United States
- Elizabeth H Mahood
  - Data Science, Corteva Agriscience, 9330 Zionsville Rd, Indianapolis, Indiana 46268, United States
- Carla Menegatti
  - Biologicals Research and Development, Corteva Agriscience, 9330 Zionsville Rd, Indianapolis, Indiana 46268, United States
- Quanbo Xiong
  - Biologicals Research and Development, Corteva Agriscience, 9330 Zionsville Rd, Indianapolis, Indiana 46268, United States
- Mingxun Wang
  - Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, California 92521, United States
3. Tariq U, Saeed F. Predicting peptide properties from mass spectrometry data using deep attention-based multitask network and uncertainty quantification. bioRxiv 2024:2024.08.21.609035. [PMID: 39229185 PMCID: PMC11370541 DOI: 10.1101/2024.08.21.609035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Database search algorithms reduce the number of potential candidate peptides against which scoring needs to be performed using a single (i.e. mass) property for filtering. While useful, filtering based on one property may lead to exclusion of non-abundant spectra and uncharacterized peptides - potentially exacerbating the streetlight effect. Here we present ProteoRift, a novel attention and multitask deep-network, which can predict multiple peptide properties (length, missed cleavages, and modification status) directly from spectra. We demonstrate that ProteoRift can predict these properties with up to 97% accuracy resulting in search-space reduction by more than 90%. As a result, our end-to-end pipeline is shown to exhibit 8x to 12x speedups with peptide deduction accuracy comparable to algorithmic techniques. We also formulate two uncertainty estimation metrics, which can distinguish between in-distribution and out-of-distribution data (ROC-AUC 0.99) and predict high-scoring mass spectra against correct peptide (ROC-AUC 0.94). These models and metrics are integrated in an end-to-end ML pipeline available at https://github.com/pcdslab/ProteoRift.
Affiliation(s)
- Usman Tariq
  - Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
- Fahad Saeed
  - Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
  - Biomolecular Sciences Institute (BSI), Florida International University, Miami, FL, USA
  - Department of Human and Molecular Genetics, Herbert Wertheim School of Medicine, Florida International University, Miami, FL, USA
4. Wang G, Zhang Z, Liu Y, Burke MC, Sheetlin SL, Stein SE. An XIC-Centric Strategy for Improved Identification and Quantification in Proteomic Data Analyses. J Proteome Res 2024; 23:1571-1582. [PMID: 38594959 DOI: 10.1021/acs.jproteome.3c00633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/11/2024]
Abstract
Reproducibility is a "proteomic dream" yet to be fully realized. A typical data analysis workflow utilizing extracted ion chromatograms (XICs) often treats the information path from identification to quantification as a one-way street. Here, we propose an XIC-centric approach in which the data flow is bidirectional: identifications are used to derive XICs whose information is in turn applied to validate the identifications. In this study, we employed liquid chromatography-mass spectrometry data from glycoprotein and human hair samples to illustrate the XIC-centric concept. At the core of this approach was XIC-based monoisotope repicking. Taking advantage of the intensity information for all detected isotopes across the whole range of an XIC peak significantly improved the accuracy and uncovered misidentifications originating from monoisotope assignment mistakes. It could also rescue non-top-ranked glycopeptide hits. Identification of glycopeptides is particularly susceptible to precursor mass errors for their low abundances, large masses, and glycans differing by 1 or 2 Da easily confused as isotopes. In addition, the XIC-centric strategy significantly reduced the problem of one XIC peak associated with multiple unique identifications, a source of quantitative irreproducibility. Taken together, the proposed approach can lead to improved identification and quantification accuracy and, ultimately, enhanced reproducibility in proteomic data analyses.
Affiliation(s)
- Guanghui Wang
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Zheng Zhang
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Yi Liu
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Meghan C Burke
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Sergey L Sheetlin
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Stephen E Stein
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
5. Orsburn BC. Analyzing Posttranslational Modifications in Single Cells. Methods Mol Biol 2024; 2817:145-156. [PMID: 38907153 DOI: 10.1007/978-1-0716-3934-4_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]
Abstract
With the rapid expansion of capabilities in the analysis of proteins in single cells, we can now identify multiple classes of protein posttranslational modifications on some of these proteins. Each new technology that has increased the number of proteins measured per cell has likewise increased our ability to identify and quantify modified peptides. In this chapter, I will discuss our current capabilities, concerns, and challenges specific to this emerging field of study and the inevitable demand for services, providing a general review of concepts that should be considered.
Affiliation(s)
- Benjamin C Orsburn
  - The Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, MD, USA.
6. Na S, Paek E. Demystifying PTM Identification Using MODplus: Best Practices and Pitfalls. Methods Mol Biol 2024; 2836:37-55. [PMID: 38995534 DOI: 10.1007/978-1-0716-4007-4_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
Tandem mass spectrometry (MS/MS) facilitates the rapid identification of posttranslational modifications (PTMs), which play a pivotal role in regulating numerous biological processes. This chapter explores recent advancements that expand the types of detectable PTMs and enhance the speed of the PTM searches. We also delve into computational challenges associated with searching for a multitude of PTMs simultaneously. The latter section introduces an automated procedure to identify an extensive range of PTMs using MODplus, a free PTM analysis software tool. We guide the reader through the preparation of the modification search, the determination of optional search parameters, the execution of the search, and the analysis of results, exemplified by a case study using a specific MS/MS dataset.
Affiliation(s)
- Seungjin Na
  - Digital Omics Research Center, Korea Basic Science Institute, Cheongju, South Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seoul, South Korea.
  - Department of Artificial Intelligence, Hanyang University, Seoul, South Korea.
  - Institute for Artificial Intelligence Research, Hanyang University, Seoul, South Korea.
7. Bittremieux W, Avalon NE, Thomas SP, Kakhkhorov SA, Aksenov AA, Gomes PWP, Aceves CM, Caraballo-Rodríguez AM, Gauglitz JM, Gerwick WH, Huan T, Jarmusch AK, Kaddurah-Daouk RF, Kang KB, Kim HW, Kondić T, Mannochio-Russo H, Meehan MJ, Melnik AV, Nothias LF, O'Donovan C, Panitchpakdi M, Petras D, Schmid R, Schymanski EL, van der Hooft JJJ, Weldon KC, Yang H, Xing S, Zemlin J, Wang M, Dorrestein PC. Open access repository-scale propagated nearest neighbor suspect spectral library for untargeted metabolomics. Nat Commun 2023; 14:8488. [PMID: 38123557 PMCID: PMC10733301 DOI: 10.1038/s41467-023-44035-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/05/2023] [Accepted: 11/28/2023] [Indexed: 12/23/2023] [Open access]
Abstract
Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of MS/MS spectra originating from published untargeted metabolomics experiments. Entries in this library, or "suspects," were derived from unannotated spectra that could be linked in a molecular network to an annotated spectrum. Annotations were propagated to unknowns based on structural relationships to reference molecules using MS/MS-based spectrum alignment. We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer's brain phenotype. The nearest neighbor suspect spectral library is openly available for download or for data analysis through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.
Affiliation(s)
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, 2020, Antwerpen, Belgium.
- Nicole E Avalon
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, 92093, USA
- Sydney P Thomas
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Sarvar A Kakhkhorov
  - Laboratory of Physical and Chemical Methods of Research, Center for Advanced Technologies, Tashkent, 100174, Uzbekistan
  - Department of Food Science, Faculty of Science, University of Copenhagen, Rolighedsvej 26, 1958, Frederiksberg C, Denmark
- Alexander A Aksenov
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Chemistry, University of Connecticut, Storrs, CT, 06269, USA
  - Arome Science inc., Farmington, CT, 06032, USA
- Paulo Wender P Gomes
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Christine M Aceves
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Andrés Mauricio Caraballo-Rodríguez
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Julia M Gauglitz
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- William H Gerwick
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, 92093, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Tao Huan
  - Department of Chemistry, University of British Columbia, Vancouver, BC, V6T 1Z1, Canada
- Alan K Jarmusch
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Immunity, Inflammation, and Disease Laboratory, Division of Intramural Research, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA
- Rima F Kaddurah-Daouk
  - Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC, 27701, USA
  - Department of Medicine, Duke University, Durham, NC, 27710, USA
  - Duke Institute of Brain Sciences, Duke University, Durham, NC, 27710, USA
- Kyo Bin Kang
  - College of Pharmacy and Research Institute of Pharmaceutical Sciences, Sookmyung Women's University, Seoul, 04310, Korea
- Hyun Woo Kim
  - College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University, Goyang, 10326, Korea
- Todor Kondić
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4367, Belvaux, Luxembourg
- Helena Mannochio-Russo
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Biochemistry and Organic Chemistry, Institute of Chemistry, São Paulo State University, Araraquara, 14800-901, Brazil
- Michael J Meehan
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Alexey V Melnik
  - Department of Chemistry, University of Connecticut, Storrs, CT, 06269, USA
  - Arome Science inc., Farmington, CT, 06032, USA
- Louis-Felix Nothias
  - Université Côte d'Azur, CNRS, ICN, Nice, France
  - Interdisciplinary Institute for Artificial Intelligence (3iA) Côte d'Azur, Nice, France
- Claire O'Donovan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Morgan Panitchpakdi
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Daniel Petras
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Interfaculty Institute of Microbiology and Infection Medicine, University of Tuebingen, 72076, Tuebingen, Germany
  - Department of Biochemistry, University of California Riverside, Riverside, CA, 92507, USA
- Robin Schmid
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Emma L Schymanski
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4367, Belvaux, Luxembourg
- Justin J J van der Hooft
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
- Kelly C Weldon
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Heejung Yang
  - Laboratory of Natural Products Chemistry, College of Pharmacy, Kangwon National University, Chuncheon, 24341, Korea
- Shipei Xing
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Chemistry, University of British Columbia, Vancouver, BC, V6T 1Z1, Canada
- Jasmine Zemlin
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Mingxun Wang
  - Department of Computer Science and Engineering, University of California Riverside, Riverside, CA, 92507, USA
- Pieter C Dorrestein
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA.
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA.
8. Prunier G, Cherkaoui M, Lysiak A, Langella O, Blein-Nicolas M, Lollier V, Benoist E, Jean G, Fertin G, Rogniaux H, Tessier D. Fast alignment of mass spectra in large proteomics datasets, capturing dissimilarities arising from multiple complex modifications of peptides. BMC Bioinformatics 2023; 24:421. [PMID: 37940845 PMCID: PMC10631047 DOI: 10.1186/s12859-023-05555-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/13/2023] [Accepted: 10/30/2023] [Indexed: 11/10/2023] [Open access]
Abstract
BACKGROUND: In proteomics, the interpretation of mass spectra representing peptides carrying multiple complex modifications remains challenging, as it is difficult to strike a balance between reasonable execution time, a limited number of false positives, and a huge search space allowing any number of modifications without a priori. The scientific community needs new developments in this area to aid in the discovery of novel post-translational modifications that may play important roles in disease.
RESULTS: To make progress on this issue, we implemented SpecGlobX (SpecGlob eXTended to eXperimental spectra), a standalone Java application that quickly determines the best spectral alignments of a (possibly very large) list of Peptide-to-Spectrum Matches (PSMs) provided by any open modification search method, or generated by the user. As input, SpecGlobX reads a file containing spectra in MGF or mzML format and a semicolon-delimited spreadsheet describing the PSMs. SpecGlobX returns the best alignment for each PSM as output, splitting the mass difference between the spectrum and the peptide into one or more shifts while considering the possibility of non-aligned masses (a phenomenon resulting from many situations including neutral losses). SpecGlobX is fast, able to align one million PSMs in about 1.5 min on a standard desktop. Firstly, we remind the foundations of the algorithm and detail how we adapted SpecGlob (the method we previously developed following the same aim, but limited to the interpretation of perfect simulated spectra) to the interpretation of imperfect experimental spectra. Then, we highlight the interest of SpecGlobX as a complementary tool downstream to three open modification search methods on a large simulated spectra dataset. Finally, we ran SpecGlobX on a proteome-wide dataset downloaded from PRIDE to demonstrate that SpecGlobX functions just as well on simulated and experimental spectra. We then carefully analyzed a limited set of interpretations.
CONCLUSIONS: SpecGlobX is helpful as a decision support tool, providing keys to interpret peptides carrying complex modifications still poorly considered by current open modification search software. Better alignment of PSMs enhances confidence in the identification of spectra provided by open modification search methods and should improve the interpretation rate of spectra.
Affiliation(s)
- Grégoire Prunier
  - INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
  - INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France
- Mehdi Cherkaoui
  - INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
  - INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France
- Albane Lysiak
  - INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
  - Nantes Université, CNRS, LS2N, UMR 6004, 44000, Nantes, France
- Olivier Langella
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, PAPPSO, 91190, Gif-Sur-Yvette, France
- Mélisande Blein-Nicolas
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, PAPPSO, 91190, Gif-Sur-Yvette, France
- Virginie Lollier
  - INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
  - INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France
- Emile Benoist
  - Nantes Université, CNRS, LS2N, UMR 6004, 44000, Nantes, France
- Géraldine Jean
  - Nantes Université, CNRS, LS2N, UMR 6004, 44000, Nantes, France
- Hélène Rogniaux
  - INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
  - INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France
- Dominique Tessier
  - INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France.
  - INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France.
9. Li Y, Fiehn O. Flash entropy search to query all mass spectral libraries in real time. Nat Methods 2023; 20:1475-1478. [PMID: 37735567 PMCID: PMC11511675 DOI: 10.1038/s41592-023-02012-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/14/2023] [Accepted: 08/15/2023] [Indexed: 09/23/2023]
Abstract
Public repositories of metabolomics mass spectra encompass more than 1 billion entries. With open search, dot product or entropy similarity, comparisons of a single tandem mass spectrometry spectrum take more than 8 h. Flash entropy search speeds up calculations more than 10,000 times to query 1 billion spectra in less than 2 s, without loss in accuracy. It benefits from using multiple threads and GPU calculations. This algorithm can fully exploit large spectral libraries with little memory overhead for any mass spectrometry laboratory.
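For readers unfamiliar with the entropy similarity mentioned in the abstract above, the following Python sketch shows one common unweighted formulation: normalize each spectrum to unit total intensity, mix the two in equal parts, and compare the Shannon entropy of the mixture with the entropies of the inputs. Matching peaks by rounding m/z to two decimals is an assumption made for brevity, and the function names are hypothetical. The sketch does not reproduce the indexing that makes Flash entropy search fast, nor any intensity weighting or noise filtering a production implementation may apply.

```python
import math


def normalize(peaks, decimals=2):
    """Merge peaks sharing a rounded m/z and scale intensities to sum to 1."""
    merged = {}
    for mz, intensity in peaks:
        key = round(mz, decimals)
        merged[key] = merged.get(key, 0.0) + intensity
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    return {key: value / total for key, value in merged.items()}


def entropy(distribution):
    """Shannon entropy (natural log) of a normalized spectrum."""
    return -sum(p * math.log(p) for p in distribution.values() if p > 0)


def entropy_similarity(peaks_a, peaks_b):
    """Unweighted entropy similarity: 1 for identical spectra, 0 for disjoint ones."""
    pa = normalize(peaks_a)
    pb = normalize(peaks_b)
    mixed = {
        key: 0.5 * (pa.get(key, 0.0) + pb.get(key, 0.0))
        for key in set(pa) | set(pb)
    }
    return 1.0 - (2.0 * entropy(mixed) - entropy(pa) - entropy(pb)) / math.log(4)
```

The division by ln 4 rescales the entropy increase of the mixture so that two spectra with no peaks in common score exactly 0.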
Affiliation(s)
- Yuanyue Li
  - West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, CA, USA
- Oliver Fiehn
  - West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, CA, USA.
10. Moorthy A, Kearsley A, Mallard W, Wallace W, Stein S. Inferring the Nominal Molecular Mass of an Analyte from Its Electron Ionization Mass Spectrum. Anal Chem 2023; 95:13132-13139. [PMID: 37610141 PMCID: PMC10560098 DOI: 10.1021/acs.analchem.3c01815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
The performance of three algorithms for predicting nominal molecular mass from an analyte's electron ionization mass spectrum is presented. The Peak Interpretation Method (PIM) attempts to quantify the likelihood that a molecular ion peak is contained in the mass spectrum, whereas the Simple Search Hitlist Method (SS-HM) and iterative Hybrid Search Hitlist Method (iHS-HM) leverage results from mass spectral library searching. These predictions can be employed in combination (recommended) or independently. The methods were tested on two sets of query mass spectra searched against libraries that did not contain the reference mass spectra of the same compounds: 19,074 spectra of various organic molecules searched against the NIST17 mass spectral library and 162 spectra of small molecule drugs searched against SWGDRUG version 3.3. Individually, each molecular mass prediction method had computed precisions (the fraction of positive predictions that were correct) of 91, 89, and 74%, respectively. The methods become more valuable when predictions are taken together. When all three predictions were identical, which occurred in 33% of the test cases, the predicted molecular mass was almost always correct (>99%).
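The precision definition and the agreement rule in the abstract above can be made concrete with a short Python sketch. It assumes each method returns either a nominal mass or `None` when it declines to predict; the function names and the example values in the usage note are hypothetical and are not data from the study.

```python
def precision(predictions, truths):
    """Fraction of positive (non-None) predictions that match the true mass.

    Returns None when no positive prediction was made.
    """
    made = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    if not made:
        return None
    return sum(1 for p, t in made if p == t) / len(made)


def consensus(pim, ss_hm, ihs_hm):
    """Return the predicted mass only when all three methods agree, else None."""
    if pim is not None and pim == ss_hm == ihs_hm:
        return pim
    return None
```

For instance, `consensus(180, 180, 180)` returns `180`, while `consensus(180, 180, 181)` returns `None`, so the combined predictor answers less often but, per the abstract, is almost always correct when it does.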
Affiliation(s)
- A.S. Moorthy
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - A.J. Kearsley
- Mathematical Analysis and Modeling Group, Applied and Computational Mathematics Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - W.G. Mallard
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - W.E. Wallace
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - S.E. Stein
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| |
|
11
|
Wallace WE, Moorthy AS. NIST Mass Spectrometry Data Center standard reference libraries and software tools: Application to seized drug analysis. J Forensic Sci 2023; 68:1484-1493. [PMID: 37203286 PMCID: PMC10517720 DOI: 10.1111/1556-4029.15284] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/17/2023] [Revised: 04/27/2023] [Accepted: 05/04/2023] [Indexed: 05/20/2023]
Abstract
The standard reference libraries and associated custom software provided by the National Institute of Standards and Technology's Mass Spectrometry Data Center (NIST MSDC) are described with a focus on assisting the seized drug analyst with the identification of fentanyl-related substances (FRS). These tools are particularly useful when encountering novel substances for which no certified sample is available. The MSDC provides three standard reference mass spectral libraries, as well as six software packages for mass spectral analysis, reference library searching, data interpretation, and measurement uncertainty estimation. Each of these libraries and software packages is described, with references to the original publications provided. Examples of fentanyl identification by gas chromatography-mass spectrometry (GC-MS) and by direct analysis in real-time (DART) mass spectrometry are given. A link to online tutorials is provided.
Affiliation(s)
- William E Wallace
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Arun S Moorthy
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| |
|
12
|
Neely BA, Ellisor DL, Davis WC. Proteomics as a Metrological Tool to Evaluate Genome Annotation Accuracy Following De Novo Genome Assembly: A Case Study Using the Atlantic Bottlenose Dolphin (Tursiops truncatus). Genes (Basel) 2023; 14:1696. [PMID: 37761836 PMCID: PMC10531373 DOI: 10.3390/genes14091696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 08/22/2023] [Accepted: 08/23/2023] [Indexed: 09/29/2023]
Abstract
The last decade has witnessed dramatic improvements in whole-genome sequencing capabilities coupled to drastically decreased costs, leading to an inundation of high-quality de novo genomes. For this reason, the continued development of genome quality metrics is imperative. Using the 2016 Atlantic bottlenose dolphin NCBI RefSeq annotation and mass spectrometry-based proteomic analysis of six tissues, we confirmed 10,402 proteins from 4711 protein groups, constituting nearly one-third of the possible predicted proteins. Since the identification of larger proteins with more identified peptides implies reduced database fragmentation and improved gene annotation accuracy, we propose the metric NP10, which attempts to capture this quality improvement. The NP10 metric is calculated by first selecting the top decile (or 10th 10-quantile) of identified proteins based on the number of peptides per protein and then taking the median molecular weight of the resulting proteins. When using the 2016 versus 2012 Tursiops truncatus genome annotation to search this proteomic data set, there was a 21% improvement in NP10. This metric was further demonstrated by using a publicly available proteomic data set to compare human genome annotations from 2004, 2013 and 2016, which showed a 33% improvement in NP10. These results demonstrate that proteomics may be a useful metrological tool to benchmark genome accuracy, though there is a need for reference proteomic datasets across species to facilitate the evaluation of new de novo and existing genomes.
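The NP10 calculation described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the input layout, and the handling of ties at the decile boundary (a plain rank cutoff) are assumptions, and the toy proteins are invented.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def np10(proteins: Sequence[Tuple[int, float]]) -> float:
    """Median molecular weight of the top decile of proteins ranked by peptide count.

    proteins: one (peptide_count, molecular_weight_kda) pair per identified protein.
    """
    if not proteins:
        raise ValueError("no identified proteins")
    # Rank proteins by the number of identified peptides, highest first.
    ranked = sorted(proteins, key=lambda p: p[0], reverse=True)
    # Keep the top 10% (at least one protein); ties are cut by rank (assumption).
    top_n = max(1, math.ceil(len(ranked) / 10))
    return median(mw for _, mw in ranked[:top_n])


# Hypothetical toy data: 20 proteins whose weight grows with peptide count.
toy_proteins = [(count, 10.0 * count) for count in range(1, 21)]
print(np10(toy_proteins))  # 195.0
```

With 20 proteins the top decile holds two entries (20 and 19 peptides), so the result is the median of 200.0 and 190.0.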
Affiliation(s)
- Benjamin A. Neely
- National Institute of Standards and Technology, NIST Charleston, 331 Fort Johnson Road, Charleston, SC 29412, USA; (D.L.E.); (W.C.D.)
| | | | | |
|
13
|
Geer LY, Lapin J, Slotta DJ, Mak TD, Stein SE. AIomics: Exploring More of the Proteome Using Mass Spectral Libraries Extended by Artificial Intelligence. J Proteome Res 2023; 22:2246-2255. [PMID: 37232537 PMCID: PMC10542943 DOI: 10.1021/acs.jproteome.2c00807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.
Affiliation(s)
- Lewis Y. Geer
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
| | - Joel Lapin
- Department of Physics, Georgetown University, Washington, DC 20057, United States
- Associate, Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
| | - Douglas J. Slotta
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
| | - Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
| | - Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
| |
|
14
|
Searle BC, Shannon AE, Wilburn DB. Scribe: Next Generation Library Searching for DDA Experiments. J Proteome Res 2023; 22:482-490. [PMID: 36695531 DOI: 10.1021/acs.jproteome.2c00672] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 01/26/2023]
Abstract
Spectrum library searching is a powerful alternative to database searching for data dependent acquisition experiments, but has been historically limited to identifying previously observed peptides in libraries. Here we present Scribe, a new library search engine designed to leverage deep learning fragmentation prediction software such as Prosit. Rather than relying on highly curated DDA libraries, this approach predicts fragmentation and retention times for every peptide in a FASTA database. Scribe embeds Percolator for false discovery rate correction and an interference tolerant, label-free quantification integrator for an end-to-end proteomics workflow. By leveraging expected relative fragmentation and retention time values, we find that library searching with Scribe can outperform traditional database searching tools both in terms of sensitivity and quantitative precision. Scribe and its graphical interface are easy to use, freely accessible, and fully open source.
Affiliation(s)
- Brian C Searle
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Proteome Software Inc., Portland, Oregon 97219, United States
| | - Ariana E Shannon
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
| | - Damien Beau Wilburn
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
| |
|
15
|
Orsburn BC, Yuan Y, Bumpus NN. Insights into protein post-translational modification landscapes of individual human cells by trapped ion mobility time-of-flight mass spectrometry. Nat Commun 2022; 13:7246. [PMID: 36433961 PMCID: PMC9700839 DOI: 10.1038/s41467-022-34919-w] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 08/28/2022] [Accepted: 11/11/2022] [Indexed: 11/26/2022]
Abstract
Single cell proteomics is a powerful tool with potential for markedly enhancing understanding of cellular processes. Here we report the development and application of multiplexed single cell proteomics using trapped ion mobility time-of-flight mass spectrometry. When employing a carrier channel to improve peptide signal, this method allows over 40,000 tandem mass spectra to be acquired in 30 min. Using a KRAS G12C model human-derived cell line, we demonstrate the quantification of over 1200 proteins per cell with high relative sequence coverage permitting the detection of multiple classes of post-translational modifications in single cells. When cells were treated with a KRAS G12C covalent inhibitor, this approach revealed cell-to-cell variability in the impact of the drug, providing insight missed by traditional proteomics. We provide multiple resources necessary for the application of single cell proteomics to drug treatment studies including tools to reduce cell cycle linked proteomic effects from masking pharmacological phenotypes.
Affiliation(s)
- Benjamin C Orsburn
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University, 21205, Baltimore, MD, USA.
| | - Yuting Yuan
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University, 21205, Baltimore, MD, USA
| | - Namandjé N Bumpus
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University, 21205, Baltimore, MD, USA.
| |
|
16
|
Bittremieux W, Wang M, Dorrestein PC. The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics 2022; 18:94. [PMID: 36409434 PMCID: PMC10284100 DOI: 10.1007/s11306-022-01947-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 08/01/2022] [Accepted: 10/19/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. AIM OF REVIEW: We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. KEY SCIENTIFIC CONCEPTS OF REVIEW: This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
Affiliation(s)
- Wout Bittremieux
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
| | - Mingxun Wang
- Department of Computer Science, University of California Riverside, Riverside, CA, 92507, USA
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA.
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA.
| |
|
17
|
Bittremieux W, Schmid R, Huber F, van der Hooft JJJ, Wang M, Dorrestein PC. Comparison of Cosine, Modified Cosine, and Neutral Loss Based Spectrum Alignment For Discovery of Structurally Related Molecules. J Am Soc Mass Spectrom 2022; 33:1733-1744. [PMID: 35960544 DOI: 10.1021/jasms.2c00153] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 06/15/2023]
Abstract
Spectrum alignment of tandem mass spectrometry (MS/MS) data using the modified cosine similarity and subsequent visualization as molecular networks have been demonstrated to be a useful strategy to discover analogs of molecules from untargeted MS/MS-based metabolomics experiments. Recently, a neutral loss matching approach has been introduced as an alternative to MS/MS-based molecular networking with an implied performance advantage in finding analogs that cannot be discovered using existing MS/MS spectrum alignment strategies. To comprehensively evaluate the scoring properties of neutral loss matching, the cosine similarity, and the modified cosine similarity, similarity measures of 955 228 peptide MS/MS spectrum pairs and 10 million small molecule MS/MS spectrum pairs were compared. This comparative analysis revealed that the modified cosine similarity outperformed neutral loss matching and the cosine similarity in all cases. The data further indicated that the performance of MS/MS spectrum alignment depends on the location and type of the modification, as well as the chemical compound class of fragmented molecules.
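The difference between the cosine and modified cosine similarities compared in the abstract above can be shown with a simplified sketch. It assumes unweighted intensities and a greedy one-to-one peak assignment; published implementations may use other intensity scaling and assignment strategies, so this is an illustration of the idea rather than the scoring used in the study. The spectra are hypothetical.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def _matched_score(a: List[Peak], b: List[Peak], shifts: List[float], tol: float) -> float:
    # Collect peak pairs whose m/z values agree within tol under any allowed shift.
    candidates = []
    for i, (mz_a, int_a) in enumerate(a):
        for j, (mz_b, int_b) in enumerate(b):
            if any(abs(mz_a - mz_b + s) <= tol for s in shifts):
                candidates.append((int_a * int_b, i, j))
    # Greedy one-to-one assignment, largest intensity product first.
    candidates.sort(reverse=True)
    used_a, used_b, total = set(), set(), 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += product
    norm = math.sqrt(sum(x * x for _, x in a)) * math.sqrt(sum(x * x for _, x in b))
    return total / norm if norm else 0.0


def cosine(a: List[Peak], b: List[Peak], tol: float = 0.01) -> float:
    """Only peaks at the same m/z can match."""
    return _matched_score(a, b, [0.0], tol)


def modified_cosine(a: List[Peak], prec_a: float, b: List[Peak], prec_b: float,
                    tol: float = 0.01) -> float:
    """Peaks may also match when offset by the precursor m/z difference."""
    return _matched_score(a, b, [0.0, prec_b - prec_a], tol)


# Hypothetical analog pair: spectrum_b carries a +14 modification on one fragment.
spectrum_a, precursor_a = [(100.0, 1.0), (150.0, 1.0)], 200.0
spectrum_b, precursor_b = [(100.0, 1.0), (164.0, 1.0)], 214.0

print(round(cosine(spectrum_a, spectrum_b), 6))                                     # 0.5
print(round(modified_cosine(spectrum_a, precursor_a, spectrum_b, precursor_b), 6))  # 1.0
```

In the toy pair, the plain cosine only credits the unshifted fragment, while the modified cosine also pairs the fragment that moved with the precursor mass difference.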
Affiliation(s)
- Wout Bittremieux
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, California 92093, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Robin Schmid
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, California 92093, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Florian Huber
- Centre for Digitalization and Digitality, University of Applied Sciences, 40476 Düsseldorf, Germany
| | - Justin J J van der Hooft
- Bioinformatics Group, Wageningen University, 6708PB Wageningen, The Netherlands
- Department of Biochemistry, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
| | - Mingxun Wang
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, California 92093, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, California 92093, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| |
|
18
|
MS4A15 drives ferroptosis resistance through calcium-restricted lipid remodeling. Cell Death Differ 2022; 29:670-686. [PMID: 34663908 PMCID: PMC8901757 DOI: 10.1038/s41418-021-00883-z] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 03/22/2021] [Revised: 09/20/2021] [Accepted: 09/23/2021] [Indexed: 01/07/2023]
Abstract
Ferroptosis is an iron-dependent form of cell death driven by biochemical processes that promote oxidation within the lipid compartment. Calcium (Ca²⁺) is a signaling molecule in diverse cellular processes such as migration, neurotransmission, and cell death. Here, we uncover a crucial link between ferroptosis and Ca²⁺ through the identification of the novel tetraspanin MS4A15. MS4A15 localizes to the endoplasmic reticulum, where it blocks ferroptosis by depleting luminal Ca²⁺ stores and reprogramming membrane phospholipids to ferroptosis-resistant species. Specifically, prolonged Ca²⁺ depletion inhibits lipid elongation and desaturation, driving lipid droplet dispersion and formation of shorter, more saturated ether lipids that protect phospholipids from ferroptotic reactive species. We further demonstrate that increasing luminal Ca²⁺ levels can preferentially sensitize refractory cancer cell lines. In summary, MS4A15 regulation of anti-ferroptotic lipid reservoirs provides a key resistance mechanism that is distinct from antioxidant and lipid detoxification pathways. Manipulating Ca²⁺ homeostasis offers a compelling strategy to balance cellular lipids and cell survival in ferroptosis-associated diseases.
|
19
|
Lee H, Kim SI. Review of Liquid Chromatography-Mass Spectrometry-Based Proteomic Analyses of Body Fluids to Diagnose Infectious Diseases. Int J Mol Sci 2022; 23:2187. [PMID: 35216306 PMCID: PMC8878692 DOI: 10.3390/ijms23042187] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/13/2021] [Revised: 02/11/2022] [Accepted: 02/14/2022] [Indexed: 01/27/2023]
Abstract
Rapid and precise diagnostic methods are required to control emerging infectious diseases effectively. Human body fluids are attractive clinical samples for discovering diagnostic targets because they reflect the clinical statuses of patients and most of them can be obtained with minimally invasive sampling processes. Body fluids are good reservoirs for infectious parasites, bacteria, and viruses. Therefore, recent clinical proteomics methods have focused on body fluids when aiming to discover human- or pathogen-originated diagnostic markers. Cutting-edge liquid chromatography-mass spectrometry (LC-MS)-based proteomics has been applied in this regard; it is considered one of the most sensitive and specific proteomics approaches. Here, the clinical characteristics of each body fluid, recent tandem mass spectrometry (MS/MS) data-acquisition methods, and applications of body fluids for proteomics regarding infectious diseases (including the coronavirus disease of 2019 [COVID-19]) are summarized and discussed.
Affiliation(s)
- Hayoung Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute (KBSI), Ochang 28119, Korea
- Bio-Analytical Science Division, University of Science and Technology (UST), Daejeon 34113, Korea
| | - Seung Il Kim
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute (KBSI), Ochang 28119, Korea
- Bio-Analytical Science Division, University of Science and Technology (UST), Daejeon 34113, Korea
| |
|
20
|
Rainer J, Vicini A, Salzer L, Stanstrup J, Badia JM, Neumann S, Stravs MA, Verri Hernandes V, Gatto L, Gibb S, Witting M. A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R. Metabolites 2022; 12:173. [PMID: 35208247 PMCID: PMC8878271 DOI: 10.3390/metabo12020173] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 01/12/2022] [Revised: 02/01/2022] [Accepted: 02/04/2022] [Indexed: 01/27/2023]
Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics experiments have become increasingly popular because of the wide range of metabolites that can be analyzed and the possibility to measure novel compounds. LC-MS instrumentation and analysis conditions can differ substantially among laboratories and experiments, thus resulting in non-standardized datasets demanding customized annotation workflows. We present an ecosystem of R packages, centered around the MetaboCoreUtils, MetaboAnnotation and CompoundDb packages, that together provide a modular infrastructure for the annotation of untargeted metabolomics data. Initial annotation can be performed based on MS1 properties such as m/z and retention times, followed by an MS2-based annotation in which experimental fragment spectra are compared against a reference library. Such reference databases can be created and managed with the CompoundDb package. The ecosystem supports data from a variety of formats, including, but not limited to, MSP, MGF, mzML, mzXML, netCDF as well as MassBank text files and SQL databases. Through its highly customizable functionality, the presented infrastructure allows users to build reproducible annotation workflows tailored for and adapted to most untargeted LC-MS-based datasets. All core functionality, which supports base R data types, is exported, also facilitating its re-use in other R packages. Finally, all packages are thoroughly unit-tested and documented and are available on GitHub and through Bioconductor.
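The MS1 annotation step mentioned in the abstract above (matching on m/z and, optionally, retention time) can be illustrated with a short sketch. It is written in Python for illustration only and does not reproduce the API of the R packages named in the abstract; the function names, tolerances, and reference entries are all hypothetical.

```python
from typing import List, Optional, Tuple

Feature = Tuple[float, float]         # (m/z, retention time in seconds)
Reference = Tuple[str, float, float]  # (name, m/z, retention time in seconds)


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Relative m/z deviation in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def annotate_ms1(feature: Feature, references: List[Reference],
                 ppm: float = 10.0, rt_tol: Optional[float] = None) -> List[str]:
    """Return names of references matching the feature within the tolerances."""
    hits = []
    for name, ref_mz, ref_rt in references:
        if abs(ppm_error(feature[0], ref_mz)) > ppm:
            continue  # m/z too far from the reference value
        if rt_tol is not None and abs(feature[1] - ref_rt) > rt_tol:
            continue  # retention time too far from the reference value
        hits.append(name)
    return hits


# Hypothetical reference table and one measured feature.
toy_references = [("ref_1", 180.0634, 120.0), ("ref_2", 180.0650, 300.0)]
toy_feature = (180.0640, 125.0)

print(annotate_ms1(toy_feature, toy_references))               # ['ref_1', 'ref_2']
print(annotate_ms1(toy_feature, toy_references, rt_tol=30.0))  # ['ref_1']
```

Matching on m/z alone leaves both candidates in play here; adding the retention-time window removes the one that elutes far from the measured feature.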
Affiliation(s)
- Johannes Rainer
- Institute for Biomedicine (Affiliated to the University of Lübeck), Eurac Research, 39100 Bozen, Italy
| | - Andrea Vicini
- Institute for Biomedicine (Affiliated to the University of Lübeck), Eurac Research, 39100 Bozen, Italy
| | - Liesa Salzer
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, 85764 Neuherberg, Germany
| | - Jan Stanstrup
- Department of Nutrition, Exercise and Sports, University of Copenhagen, 1985 Frederiksberg, Denmark
| | - Josep M. Badia
- Department of Electronic Engineering & IISPV, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Instituto de Salud Carlos III, 28029 Madrid, Spain
| | - Steffen Neumann
- Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Michael A. Stravs
- Department of Environmental Chemistry, Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
- Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
| | - Vinicius Verri Hernandes
- Institute for Biomedicine (Affiliated to the University of Lübeck), Eurac Research, 39100 Bozen, Italy
- School of Medicine and Surgery, Università degli Studi di Milano-Bicocca, 20854 Vedano al Lambro, Italy
| | - Laurent Gatto
- Computational Biology and Bioinformatics Unit, de Duve Institute, Université Catholique de Louvain, 1200 Brussels, Belgium
| | - Sebastian Gibb
- Department of Anesthesiology and Intensive Care, University Medicine Greifswald, 17475 Greifswald, Germany
| | - Michael Witting
- Metabolomics and Proteomics Core, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, 85354 Freising-Weihenstephan, Germany
| |
|
21
|
Lee SY, Lee ST, Suh S, Ko BJ, Oh HB. Revealing Unknown Controlled Substances and New Psychoactive Substances Using High-Resolution LC-MS/MS Machine Learning Models and the Hybrid Similarity Search Algorithm. J Anal Toxicol 2021; 46:732-742. [PMID: 34498039 DOI: 10.1093/jat/bkab098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2021] [Revised: 08/11/2021] [Accepted: 09/08/2021] [Indexed: 11/12/2022]
Abstract
Machine learning models based on high-resolution LC-MS/MS spectra are constructed to address the analytical challenge of identifying unknown controlled substances and new psychoactive substances (NPSs). Using a training set comprising 770 LC-MS/MS barcode spectra (with binary entries 0 or 1), obtained generally by high-resolution mass spectrometers, three classification machine learning models were generated and evaluated. The three models are artificial neural network (ANN), support vector machine (SVM), and k-nearest neighbor (k-NN) models. In these models, controlled substances and NPSs were classified into 13 subgroups (benzylpiperazine, opiate, benzodiazepine, amphetamine, cocaine, methcathinone, classical cannabinoid, fentanyl, 2C series, indazole carbonyl compound, indole carbonyl compound, phencyclidine, and others). Using 193 LC-MS/MS barcode spectra as an external test set, the accuracies of the ANN, SVM, and k-NN models were evaluated as 72.5%, 90.0%, and 94.3%, respectively. Also, the hybrid similarity search (HSS) algorithm was evaluated to examine whether it can successfully identify unknown controlled substances and NPSs whose data are unavailable in the database. When only 24 representative LC-MS/MS spectra of controlled substances and NPSs were selectively included in the database, it was found that HSS can successfully identify compounds with high reliability. The machine learning models and HSS algorithms are incorporated into our home-coded AI-SNPS (artificial intelligence screener for narcotic drugs and psychotropic substances) standalone software, which is equipped with a graphical user interface. The use of this software allows unknown controlled substances and NPSs to be identified in a convenient manner.
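The idea of classifying binary "barcode" spectra with a k-nearest neighbor model, as described in the abstract above, can be sketched as follows. The binning scheme, intensity threshold, Hamming distance, and class labels are all assumptions chosen for illustration; the abstract does not specify how the authors built their barcodes or measured distance, and the peak lists are invented.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def to_barcode(peaks: Sequence[Peak], n_bins: int = 50, bin_width: float = 10.0,
               min_rel_intensity: float = 0.05) -> List[int]:
    """Turn a peak list into a 0/1 vector: 1 where a bin holds a significant peak."""
    bits = [0] * n_bins
    if not peaks:
        return bits
    base = max(intensity for _, intensity in peaks)
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        if 0 <= idx < n_bins and intensity >= min_rel_intensity * base:
            bits[idx] = 1
    return bits


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query: Sequence[int], training: Sequence[Tuple[List[int], str]],
                k: int = 3) -> str:
    """Majority label among the k training barcodes closest to the query."""
    nearest = sorted(training, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training spectra for two made-up classes.
toy_training = [
    (to_barcode([(105.0, 100.0), (188.0, 80.0)]), "class_A"),
    (to_barcode([(105.0, 90.0), (188.0, 100.0)]), "class_A"),
    (to_barcode([(105.0, 100.0), (189.0, 60.0), (245.0, 20.0)]), "class_A"),
    (to_barcode([(91.0, 100.0), (119.0, 40.0)]), "class_B"),
    (to_barcode([(91.0, 100.0), (65.0, 30.0)]), "class_B"),
    (to_barcode([(91.0, 80.0), (119.0, 100.0)]), "class_B"),
]
toy_query = to_barcode([(105.0, 100.0), (188.0, 50.0), (77.0, 10.0)])

print(knn_predict(toy_query, toy_training))  # class_A
```

Because the barcode discards intensities, the classifier depends only on which m/z bins are occupied, which is what makes a simple bit-difference distance usable here.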
Affiliation(s)
- So Yeon Lee
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
| | - Sang Tak Lee
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
| | - Sungill Suh
- Forensic Genetics & Chemistry Division, Supreme Prosecutors' Office, Seoul 06590, Republic of Korea
| | - Bum Jun Ko
- Forensic Genetics & Chemistry Division, Supreme Prosecutors' Office, Seoul 06590, Republic of Korea
| | - Han Bin Oh
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
| |
|
22
|
Moorthy AS, Sisco E. A New Library-Search Algorithm for Mixture Analysis Using DART-MS. J Am Soc Mass Spectrom 2021; 32:1725-1734. [PMID: 34137604 PMCID: PMC9808406 DOI: 10.1021/jasms.1c00097] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
Forensic analysis of seized drug evidence often involves determining whether the components of an unknown mixture are illicit compounds. One approach to this task is to screen the evidence using direct analysis in real time mass spectrometry (DART-MS) to make presumptive identifications. This manuscript introduces a new library-search algorithm that enhances presumptive identifications of mixture components using a series of in-source collision-induced dissociation mass spectra collected through DART-MS. The multistage search, titled the Inverted Library-Search Algorithm (ILSA), identifies potential components in a mixture by first searching the lowest fragmentation mass spectrum for target peaks, assuming these peaks are protonated molecules, and then scoring each target peak with possible library matches. As a proof of concept, the ILSA is demonstrated through several example searches of model seized drug mixtures of acetyl fentanyl, benzyl fentanyl, amphetamine, and methamphetamine searched against a small library of select compounds and the freely available NIST DART-MS Forensics Database. Discussion of the search results and several open areas of research to further extend the method are provided. This new approach for presumptive identification provides analysts with refined information about mixture components and will be of immediate importance in forensic analysis using DART-MS. A prototype implementation of the ILSA is available at https://github.com/asm3-nist/DART-MS-DST.
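The two stages described in the abstract above (pick target peaks from the lowest-fragmentation spectrum, treat them as protonated molecules, then score candidate library entries) can be outlined in a short sketch. This is not the prototype linked in the abstract: the intensity threshold, the mass tolerance, and in particular the scoring rule (fraction of library fragments observed in a higher-fragmentation spectrum) are assumptions, and the library entries are invented.

```python
from typing import Dict, List, Tuple

PROTON_MASS = 1.007276  # Da, added to a neutral mass to give [M+H]+

Peak = Tuple[float, float]                    # (m/z, intensity)
Library = Dict[str, Tuple[float, List[float]]]  # name -> (neutral mass, fragment m/z list)


def target_peaks(low_energy: List[Peak], min_rel: float = 0.25) -> List[float]:
    """Peaks of the lowest-fragmentation spectrum above a relative intensity cutoff."""
    base = max(intensity for _, intensity in low_energy)
    return [mz for mz, intensity in low_energy if intensity >= min_rel * base]


def inverted_search_sketch(low_energy: List[Peak], high_energy: List[Peak],
                           library: Library, tol: float = 0.005):
    """For each target peak, rank library entries whose [M+H]+ matches it."""
    high_mzs = [mz for mz, _ in high_energy]
    results = {}
    for target in target_peaks(low_energy):
        hits = []
        for name, (neutral_mass, fragments) in library.items():
            if abs(neutral_mass + PROTON_MASS - target) > tol:
                continue  # target is not this entry's protonated molecule
            found = sum(any(abs(f - mz) <= tol for mz in high_mzs) for f in fragments)
            hits.append((name, found / len(fragments) if fragments else 0.0))
        results[target] = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return results


# Hypothetical library and spectra of a two-component mixture.
toy_library = {
    "compound_A": (150.0, [91.0, 119.0]),
    "compound_B": (150.0, [105.0, 133.0]),
    "compound_C": (322.0, [188.0, 105.0]),
}
toy_low = [(151.007276, 100.0), (323.007276, 60.0), (200.0, 5.0)]
toy_high = [(91.0, 50.0), (119.0, 80.0), (188.0, 40.0)]

toy_results = inverted_search_sketch(toy_low, toy_high, toy_library)
for target, hits in toy_results.items():
    print(target, hits)
```

In the toy mixture, compound_A and compound_B share a protonated-molecule m/z, and the fragment evidence is what separates them.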
|
23
|
Smythers AL, Hicks LM. Mapping the plant proteome: tools for surveying coordinating pathways. Emerg Top Life Sci 2021; 5:203-220. [PMID: 33620075 PMCID: PMC8166341 DOI: 10.1042/etls20200270] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/04/2021] [Revised: 02/07/2021] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Plants rapidly respond to environmental fluctuations through coordinated, multi-scalar regulation, enabling complex reactions despite their inherently sessile nature. In particular, protein post-translational signaling and protein-protein interactions combine to manipulate cellular responses and regulate plant homeostasis with precise temporal and spatial control. Understanding these proteomic networks is essential to addressing ongoing global crises, including those of food security, rising global temperatures, and the need for renewable materials and fuels. Technological advances in mass spectrometry-based proteomics are enabling investigations of unprecedented depth, and are increasingly being optimized for and applied to plant systems. This review highlights recent advances in plant proteomics, with an emphasis on spatially and temporally resolved analysis of post-translational modifications and protein interactions. It also details the necessity for generation of a comprehensive plant cell atlas while highlighting recent accomplishments within the field.
Affiliation(s)
- Amanda L Smythers
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, U.S.A
| | - Leslie M Hicks
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, U.S.A
| |
|
24
|
Lysiak A, Fertin G, Jean G, Tessier D. Evaluation of open search methods based on theoretical mass spectra comparison. BMC Bioinformatics 2021; 22:65. [PMID: 33902435 PMCID: PMC8073971 DOI: 10.1186/s12859-021-03963-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/16/2020] [Accepted: 01/08/2021] [Indexed: 11/17/2022] Open
Abstract
Background: Mass spectrometry remains the privileged method to characterize proteins. Nevertheless, most of the spectra generated by an experiment remain unidentified after their analysis, mostly because of the modifications they carry. Open Modification Search (OMS) methods offer a promising answer to this problem. However, assessing the quality of OMS identifications remains a difficult task. Methods: Aiming at better understanding the relationship between (1) similarity of pairs of spectra provided by OMS methods and (2) relevance of their corresponding peptide sequences, we used a dataset composed of theoretical spectra only, on which we applied two OMS strategies. We also introduced two appropriately defined measures for evaluating the above-mentioned spectra/sequence relevance in this context: one is a color classification representing the level of difficulty to retrieve the proper sequence of the peptide that generated the identified spectrum; the other, called LIPR, is the proportion of common masses, in a given Peptide Spectrum Match (PSM), that represent dissimilar sequences. These two measures were also considered in conjunction with the False Discovery Rate (FDR). Results: According to our measures, the strategy that selects the best candidate by taking the mass difference between two spectra into account yields better quality results. Besides, although the FDR remains an interesting indicator in OMS methods (as shown by LIPR), it is questionable: indeed, our color classification shows that a non-negligible proportion of relevant spectra/sequence interpretations corresponds to PSMs coming from the decoy database. Conclusions: The three above-mentioned measures allowed us to clearly determine which of the two studied OMS strategies outperformed the other, both in terms of number of identifications and of accuracy of these identifications. Even though quality evaluation of PSMs in OMS methods remains challenging, the study of theoretical spectra is a favorable framework for going further in this direction.
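For readers unfamiliar with how an FDR is obtained from a decoy database, the sketch below shows the common target-decoy estimate: the number of decoy PSMs divided by the number of target PSMs above a score threshold. It is a generic illustration and not necessarily the exact procedure used in this study; the function name, the score scale, and the toy PSM list are assumptions.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (#decoy PSMs) / (#target PSMs) among PSMs scoring >= threshold.

    psms is an iterable of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

# Toy PSM list: (similarity score, matched against the decoy database?)
psms = [
    (0.95, False),
    (0.90, False),
    (0.85, True),
    (0.80, False),
    (0.60, True),
    (0.55, False),
]

fdr_strict = fdr_at_threshold(psms, 0.90)  # no decoys pass
fdr_medium = fdr_at_threshold(psms, 0.80)  # 1 decoy, 3 targets
fdr_loose = fdr_at_threshold(psms, 0.50)   # 2 decoys, 4 targets
```

The estimate treats every decoy hit as an error. The abstract's point is that, in OMS, some decoy PSMs can still correspond to relevant interpretations, so this number should be read with care.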
Affiliation(s)
- Albane Lysiak
- CNRS, LS2N, Université de Nantes, 44000, Nantes, France.,UR BIA, INRAE, 44316, Nantes, France
| | | | | | - Dominique Tessier
- BIBS Facility, INRAE, 44316, Nantes, France.,UR BIA, INRAE, 44316, Nantes, France
| |
|
25
|
Aggarwal S, Tolani P, Gupta S, Yadav AK. Posttranslational modifications in systems biology. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2021; 127:93-126. [PMID: 34340775 DOI: 10.1016/bs.apcsb.2021.03.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Abstract
Biological complexity cannot be captured by genes or proteins alone. Protein posttranslational modifications (PTMs) impart functional diversity to the proteome and regulate protein structure, activity, localization and interactions. Their dynamics drive cellular signaling, growth and development, while their dysregulation causes many diseases. Mass spectrometry-based quantitative profiling of PTMs and bioinformatics analysis tools allow systems-level insights into their network architecture. High-resolution profiling of PTM networks will advance disease understanding and precision medicine. It can accelerate the discovery of biomarkers and drug targets. This requires better tools for unbiased, high-throughput and accurate PTM identification, site localization and automated annotation on a systems level.
Affiliation(s)
- Suruchi Aggarwal
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; Department of Molecular Biology and Biotechnology, Cotton University, Guwahati, Assam, India
| | - Priya Tolani
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
| | - Srishti Gupta
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
| | - Amit Kumar Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India.
| |
|
26
|
Guan S, Bythell BJ. Size Dependent Fragmentation Chemistry of Short Doubly Protonated Tryptic Peptides. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2021; 32:1020-1032. [PMID: 33779179 DOI: 10.1021/jasms.1c00009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Tandem mass spectrometry of electrospray ionized multiply charged peptide ions is commonly used to identify the sequence of peptide(s) and infer the identity of source protein(s). Doubly protonated peptide ions are consistently the most efficiently sequenced ions following collision-induced dissociation of peptides generated by tryptic digestion. While the broad characteristics of longer (N ≥ 8 residue) doubly protonated peptides have been investigated, there is comparatively little data on shorter systems where charge repulsion should exhibit the greatest influence on the dissociation chemistry. To address this gap and further understand the chemistry underlying collisional-dissociation of doubly charged tryptic peptides, two series of analytes ([GxR+2H]2+ and [AxR+2H]2+, x = 2-5) were investigated experimentally and with theory. We find distinct differences in the preference of bond cleavage sites for these peptides as a function of size and to a lesser extent composition. Density functional calculations at two levels of theory predict that the threshold relative energies required for bond cleavages at the same site for peptides of different size are quite similar (for example, b2-yN-2). In isolation, this finding is inconsistent with experiment. However, the predicted extent of entropy change of these reactions is size dependent. Subsequent RRKM rate constant calculations provide a far clearer picture of the kinetics of the competing bond cleavage reactions enabling rationalization of experimental findings. The M06-2X data were substantially more consistent with experiment than were the B3LYP data.
Affiliation(s)
- Shanshan Guan
- Department of Chemistry and Biochemistry, Ohio University, 307 Chemistry Building, Athens, Ohio 45701, United States
- Department of Chemistry and Biochemistry, University of Missouri-St. Louis, 1 University Boulevard, St. Louis, Missouri 63121, United States
| | - Benjamin J Bythell
- Department of Chemistry and Biochemistry, Ohio University, 307 Chemistry Building, Athens, Ohio 45701, United States
- Department of Chemistry and Biochemistry, University of Missouri-St. Louis, 1 University Boulevard, St. Louis, Missouri 63121, United States
| |
|
27
|
Telu KH, Marupaka R, Andriamaharavo NR, Simón-Manso Y, Liang Y, Mirokhin YA, Bukhari TH, Preston RJ, Kashi L, Kelman Z, Stein SE. Creation and filtering of a recurrent spectral library of CHO cell metabolites and media components. Biotechnol Bioeng 2021; 118:1491-1510. [PMID: 33404064 PMCID: PMC8048470 DOI: 10.1002/bit.27661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/22/2020] [Revised: 12/02/2020] [Accepted: 12/13/2020] [Indexed: 02/02/2023]
Abstract
This paper reports the first implementation of a new type of mass spectral library for the analysis of Chinese hamster ovary (CHO) cell metabolites that allows users to quickly identify most compounds in any complex metabolite sample. We also describe an annotation methodology developed to filter out artifacts and low‐quality spectra from recurrent unidentified spectra of metabolites. CHO cells are commonly used to produce biological therapeutics. Metabolic profiles of CHO cells and media can be used to monitor process variability and look for markers that discriminate between batches of product. We have created a comprehensive library of both identified and unidentified metabolites derived from CHO cells that can be used in conjunction with tandem mass spectrometry to identify metabolites. In addition, we present a workflow that can be used for assigning confidence to a NIST MS/MS Library search match based on prior probability of general utility. The goal of our work is to annotate and, when possible, identify all liquid chromatography‐mass spectrometry generated metabolite ions, as well as to create automatable library building and identification pipelines for use by others in the field.
Affiliation(s)
- Kelly H Telu
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Ramesh Marupaka
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Nirina R Andriamaharavo
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Yamil Simón-Manso
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Yuxue Liang
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Yuri A Mirokhin
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Tallat H Bukhari
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Renae J Preston
- Biomolecular Labeling Laboratory, Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology and the University of Maryland, Rockville, Maryland, USA
| | - Lila Kashi
- Biomolecular Labeling Laboratory, Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology and the University of Maryland, Rockville, Maryland, USA
| | - Zvi Kelman
- Biomolecular Labeling Laboratory, Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology and the University of Maryland, Rockville, Maryland, USA
| | - Stephen E Stein
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| |
|
28
|
|
29
|
Qin C, Luo X, Deng C, Shu K, Zhu W, Griss J, Hermjakob H, Bai M, Perez-Riverol Y. Deep learning embedder method and tool for mass spectra similarity search. J Proteomics 2020; 232:104070. [PMID: 33307250 DOI: 10.1016/j.jprot.2020.104070] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/29/2020] [Revised: 11/25/2020] [Accepted: 12/01/2020] [Indexed: 12/31/2022]
Abstract
Spectral similarity calculation is widely used in protein identification tools and mass spectra clustering algorithms while comparing theoretical or experimental spectra. The performance of the spectral similarity calculation plays an important role in these tools and algorithms, especially in the analysis of large-scale datasets. Recently, deep learning methods have been proposed to improve the performance of clustering algorithms and protein identification by training the algorithms with existing data and the use of multiple spectra and identified peptide features. While the efficiency of these algorithms is still under study in comparison with traditional approaches, their application in proteomics data analysis is becoming more common. Here, we propose the use of deep learning to improve spectral similarity comparison. We assessed the performance of deep learning for spectral similarity, with GLEAMS and a newly trained embedder model (DLEAMSE), which uses high-quality spectra from PRIDE Cluster. Also, we developed a new bioinformatics tool (mslookup - https://github.com/bigbio/DLEAMSE/) that allows users to quickly search for spectra in previously identified mass spectra published in public repositories and spectral libraries. Finally, we released a human database to enable bioinformaticians and biologists to search for identified spectra in their machines. SIGNIFICANCE STATEMENT: Spectral similarity calculation plays an important role in proteomics data analysis. With deep learning's ability to learn the implicit and effective features from large-scale training datasets, deep learning-based MS/MS spectra embedding models have emerged as a solution to improve mass spectral clustering similarity calculation algorithms. We compare multiple similarity scoring and deep learning methods in terms of accuracy (computing the similarity for a pair of mass spectra) and computing-time performance. The benchmark results showed no major differences in accuracy between DLEAMSE and normalized dot product (NDP) for spectrum similarity calculations. The DLEAMSE GPU implementation is faster than NDP in preprocessing on the GPU server, and the similarity calculation of DLEAMSE (Euclidean distance on 32-D vectors) takes about one third of the time of the dot product calculation. The deep learning model (DLEAMSE) encoding and embedding steps need to run only once for each spectrum, and the embedded 32-D points can be persisted in the repository, which speeds up future comparisons and large-scale analyses. Based on these results, we propose a new tool, mslookup, that enables researchers to find spectra previously identified in public data. The tool can also be used to generate in-house databases of previously identified spectra to share with other laboratories and consortiums.
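The two similarity measures compared in this abstract can be sketched in a few lines: a normalized dot product on binned spectra, and a Euclidean distance on fixed-length embedding vectors. This is a minimal sketch under stated assumptions. The 1 Da binning, the toy peak lists, and the placeholder 32-D vectors are illustrative; the vectors are not produced by DLEAMSE, whose encoder is not reproduced here.

```python
import math

def bin_spectrum(peaks, bin_width=1.0, n_bins=2000):
    """Sum peak intensities into fixed-width m/z bins."""
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:
            vector[index] += intensity
    return vector

def normalized_dot_product(a, b):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def euclidean_distance(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Toy peak lists as (m/z, intensity) pairs
spectrum_a = bin_spectrum([(175.1, 100.0), (262.2, 40.0), (375.3, 20.0)])
spectrum_b = bin_spectrum([(175.1, 90.0), (262.2, 50.0), (500.4, 10.0)])
ndp = normalized_dot_product(spectrum_a, spectrum_b)

# Placeholder 32-D vectors standing in for learned embeddings
embedding_a = [0.0] * 32
embedding_b = [0.5] * 32
distance = euclidean_distance(embedding_a, embedding_b)
```

The design point the abstract makes is visible in the vector lengths: the dot product runs over thousands of bins per comparison, while the embedding distance runs over 32 values that can be computed once and stored.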
Affiliation(s)
- Chunyuan Qin
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and telecommunications, Chongqing, China
| | - Xiyang Luo
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and telecommunications, Chongqing, China
| | - Chuan Deng
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and telecommunications, Chongqing, China
| | - Kunxian Shu
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and telecommunications, Chongqing, China
| | - Weimin Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
| | - Henning Hermjakob
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Mingze Bai
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and telecommunications, Chongqing, China; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China.
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
| |
|
30
|
Abstract
For the last century we have relied on model organisms to help understand fundamental biological processes. Now, with advancements in genome sequencing, assembly, and annotation, non-model organisms may be studied with the same advanced bioanalytical toolkit as model organisms. Proteomics is one such technique, which classically relies on predicted protein sequences to catalog and measure complex proteomes across tissues and biofluids. Applying proteomics to non-model organisms can advance and accelerate biomimicry studies, biomedical advancements, veterinary medicine, agricultural research, behavioral ecology, and food safety. In this postmodel organism era, we can study almost any species, meaning that many non-model organisms are, in fact, important emerging model organisms. Herein we specifically focus on eukaryotic organisms and discuss the steps to generate sequence databases, analyze proteomic data with or without a database, and interpret results as well as future research opportunities. Proteomics is more accessible than ever before and will continue to rapidly advance in the coming years, enabling critical research and discoveries in non-model organisms that were hitherto impossible.
Affiliation(s)
- Michelle Heck
- Emerging Pests and Pathogens Research Unit, USDA Agricultural Research Service, Ithaca, NY, USA
- Plant Pathology and Plant Microbe Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Boyce Thompson Institute, Ithaca, NY, USA
| | - Benjamin A. Neely
- Chemical Sciences Division, National Institute of Standards and Technology, Charleston, SC, USA
| |
|
31
|
Na S, Paek E. Computational methods in mass spectrometry-based structural proteomics for studying protein structure, dynamics, and interactions. Comput Struct Biotechnol J 2020; 18:1391-1402. [PMID: 32637038 PMCID: PMC7322682 DOI: 10.1016/j.csbj.2020.06.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/04/2020] [Revised: 06/01/2020] [Accepted: 06/01/2020] [Indexed: 12/28/2022] Open
Abstract
Mass spectrometry (MS) has made enormous contributions to comprehensive protein identification and quantification in proteomics. MS is also gaining momentum for structural biology in a variety of ways, complementing conventional structural biology techniques. Here, we will review how MS-based techniques, such as hydrogen/deuterium exchange, covalent labeling, and chemical cross-linking, enable the characterization of protein structure, dynamics, and interactions, especially from a perspective of their data analyses. Structural information encoded by chemical probes in intact proteins is decoded by interpreting MS data at a peptide level, i.e., revealing conformational and dynamic changes in local regions of proteins. The structural MS data are not amenable to data analyses in traditional proteomics workflow, requiring dedicated software for each type of data. We first provide basic principles of data interpretation, including isotopic distribution and peptide sequencing. We then focus particularly on computational methods for structural MS data analyses and discuss outstanding challenges in a proteome-wide large scale analysis.
Affiliation(s)
- Seungjin Na
- Dept. of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
| | - Eunok Paek
- Dept. of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
| |
|
32
|
Abstract
This manuscript outlines a straightforward procedure for generating a map of similarity between the spectra in a set. When applied to a reference set of spectra for Type I fentanyl analogs (molecules differing from fentanyl by a single modification), the map illuminates clustering that is applicable to automated structure assignment of unidentified molecules. An open-source software implementation that generates mass spectral similarity mappings of unknowns against a library of Type I fentanyl analog spectra is available at http://github.com/asm3-nist/FentanylClassifier.
|
33
|
Yan X, Markey SP, Marupaka R, Dong Q, Cooper BT, Mirokhin YA, Wallace WE, Stein SE. Mass Spectral Library of Acylcarnitines Derived from Human Urine. Anal Chem 2020; 92:6521-6528. [PMID: 32271007 DOI: 10.1021/acs.analchem.0c00129] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 02/07/2023]
Abstract
We describe the creation of a mass spectral library of acylcarnitines and conjugated acylcarnitines from the LC-MS/MS analysis of six NIST urine reference materials. To recognize acylcarnitines, we conducted in-depth analyses of fragmentation patterns of acylcarnitines and developed a set of rules, derived from spectra in the NIST17 Tandem MS Library and those identified in urine, using the newly developed hybrid search method. Acylcarnitine tandem spectra were annotated with fragments from carnitine and acyl moieties as well as neutral loss peaks from precursors. Consensus spectra were derived from spectra having similar retention time, fragmentation pattern, and the same precursor m/z and collision energy. The library contains 157 different precursor masses, 586 unique acylcarnitines, and 4332 acylcarnitine consensus spectra. Furthermore, from spectra that partially satisfied the fragmentation rules of acylcarnitines, we identified 125 conjugated acylcarnitines represented by 987 consensus spectra, which appear to originate from Phase II biotransformation reactions. To our knowledge, this is the first report of conjugated acylcarnitines. The mass spectra provided by this work may be useful for clinical screening of acylcarnitines as well as for studying relationships among fragmentation patterns, collision energies, structures, and retention times of acylcarnitines. Further, these methods are extensible to other classes of metabolites.
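The idea of a consensus spectrum can be illustrated with a short sketch: given replicate spectra that have already been grouped (same precursor m/z and collision energy, similar retention time), keep only peaks that recur in enough replicates and average their intensities. This is a generic illustration and not the NIST procedure; the function name, the 0.01 m/z bin width, the 50% recurrence cutoff, and the toy replicates are assumptions.

```python
from collections import defaultdict

def consensus_spectrum(spectra, bin_width=0.01, min_fraction=0.5):
    """Average peaks that occur in at least min_fraction of the replicate spectra."""
    bins = defaultdict(list)
    for spectrum in spectra:
        seen = {}
        for mz, intensity in spectrum:
            key = round(mz / bin_width)
            # Keep one peak (the most intense) per bin per spectrum
            seen[key] = max(seen.get(key, 0.0), intensity)
        for key, intensity in seen.items():
            bins[key].append(intensity)
    needed = min_fraction * len(spectra)
    return sorted(
        (round(key * bin_width, 4), sum(values) / len(values))
        for key, values in bins.items()
        if len(values) >= needed
    )

# Toy replicate spectra as (m/z, intensity) pairs
replicates = [
    [(85.03, 100.0), (144.10, 40.0), (60.08, 10.0)],
    [(85.03, 90.0), (144.10, 50.0)],
    [(85.03, 110.0), (144.10, 30.0), (200.00, 5.0)],
]

consensus = consensus_spectrum(replicates)
```

Peaks seen in only one of the three replicates (m/z 60.08 and 200.00 here) are dropped, which is one simple way to suppress noise and artifact peaks before a spectrum enters a library.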
Affiliation(s)
- Xinjian Yan
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Sanford P Markey
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Ramesh Marupaka
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Qian Dong
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Brian T Cooper
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, North Carolina 28223, United States
| | - Yuri A Mirokhin
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - William E Wallace
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Stephen E Stein
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| |
|
34
|
Overview of Tandem Mass Spectral and Metabolite Databases for Metabolite Identification in Metabolomics. Methods Mol Biol 2020; 2104:139-148. [PMID: 31953816 DOI: 10.1007/978-1-0716-0239-3_8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
Liquid chromatography-mass spectrometry (LC-MS) is one of the most popular technologies in metabolomics. The large-scale and unambiguous identification of metabolite structures remains a challenging task in LC-MS based metabolomics. Tandem mass spectral databases provide experimental and in silico MS/MS spectra to facilitate the identification of both known and unknown metabolites, which has become a gold standard method in metabolomics. In addition, metabolite knowledge databases offer valuable biological and pathway information of metabolites. In this chapter, we have briefly reviewed the most common and important tandem mass spectral and metabolite databases, and illustrated how they could be used for metabolite identification.
|
35
|
Bearden DW, Sheen DA, Simón-Manso Y, Benner BA, Rocha WFC, Blonder N, Lippa KA, Beger RD, Schnackenberg LK, Sun J, Mehta KY, Cheema AK, Gu H, Marupaka R, Nagana Gowda GA, Raftery D. Metabolomics Test Materials for Quality Control: A Study of a Urine Materials Suite. Metabolites 2019; 9:metabo9110270. [PMID: 31703392 PMCID: PMC6918257 DOI: 10.3390/metabo9110270] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/26/2019] [Revised: 10/31/2019] [Accepted: 11/01/2019] [Indexed: 12/20/2022] Open
Abstract
There is a lack of experimental reference materials and standards for metabolomics measurements, such as urine, plasma, and other human fluid samples. Reasons include difficulties with supply, distribution, and dissemination of information about the materials. Additionally, there is a long lead time because reference materials need their compositions to be fully characterized with uncertainty, a labor-intensive process for material containing thousands of relevant compounds. Furthermore, data analysis can be hampered by different methods using different software by different vendors. In this work, we propose an alternative implementation of reference materials. Instead of characterizing biological materials based on their composition, we propose using untargeted metabolomic data such as nuclear magnetic resonance (NMR) or gas and liquid chromatography-mass spectrometry (GC-MS and LC-MS) profiles. The profiles are then distributed with the material accompanying the certificate, so that researchers can compare their own metabolomic measurements with the reference profiles. To demonstrate this approach, we conducted an interlaboratory study (ILS) in which seven National Institute of Standards and Technology (NIST) urine Standard Reference Material®s (SRM®s) were distributed to participants, who then returned the metabolomic data to us. We then implemented chemometric methods to analyze the data together to estimate the uncertainties in the current measurement techniques. The participants identified similar patterns in the profiles that distinguished the seven samples. Even when the number of spectral features is substantially different between platforms, a collective analysis still shows significant overlap that allows reliable comparison between participants. Our results show that a urine suite such as that used in this ILS could be employed for testing and harmonization among different platforms. A limited quantity of test materials will be made available for researchers who are willing to repeat the protocols presented here and contribute their data.
Affiliation(s)
- Daniel W. Bearden
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA; (D.W.B.); (W.F.C.R.); (N.B.); (K.A.L.)
| | - David A. Sheen
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA; (D.W.B.); (W.F.C.R.); (N.B.); (K.A.L.)
- Correspondence: ; Tel.: +1-301-975-2603
| | - Yamil Simón-Manso
- Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA;
| | - Bruce A. Benner
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA; (D.W.B.); (W.F.C.R.); (N.B.); (K.A.L.)
| | - Werickson F. C. Rocha
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA; (D.W.B.); (W.F.C.R.); (N.B.); (K.A.L.)
- National Institute of Metrology, Quality, and Technology—INMETRO, 25250-020 Duque de Caxias, RJ, Brazil
| | - Niksa Blonder
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA; (D.W.B.); (W.F.C.R.); (N.B.); (K.A.L.)
| | - Katrice A. Lippa
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA; (D.W.B.); (W.F.C.R.); (N.B.); (K.A.L.)
| | - Richard D. Beger
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (R.D.B.); (L.K.S.); (J.S.)
| | - Laura K. Schnackenberg
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (R.D.B.); (L.K.S.); (J.S.)
| | - Jinchun Sun
- Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA; (R.D.B.); (L.K.S.); (J.S.)
| | - Khyati Y. Mehta
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA; (K.Y.M.); (A.K.C.)
| | - Amrita K. Cheema
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA; (K.Y.M.); (A.K.C.)
- Departments of Oncology and Biochemistry, Molecular and Cellular Biology, Georgetown University Medical Center, Washington, DC 20057, USA
| | - Haiwei Gu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA;
| | - Ramesh Marupaka
- Clinical Toxicology at CIAN Diagnostics, Frederick, MD 21703, USA;
| | - G. A. Nagana Gowda
- Department of Anesthesiology and Pain Medicine, Mitochondria and Metabolism Center, University of Washington, Seattle, WA 98109, USA; (G.A.N.G.); (D.R.)
| | - Daniel Raftery
- Department of Anesthesiology and Pain Medicine, Mitochondria and Metabolism Center, University of Washington, Seattle, WA 98109, USA; (G.A.N.G.); (D.R.)
| |
|
36
|
Cooper BT, Yan X, Simón-Manso Y, Tchekhovskoi DV, Mirokhin YA, Stein SE. Hybrid Search: A Method for Identifying Metabolites Absent from Tandem Mass Spectrometry Libraries. Anal Chem 2019; 91:13924-13932. [PMID: 31600070 PMCID: PMC7299168 DOI: 10.1021/acs.analchem.9b03415] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 12/14/2022]
Abstract
Metabolomics has a critical need for better tools for mass spectral identification. Common metabolites may be identified by searching libraries of tandem mass spectra, which offers important advantages over other approaches to identification. But tandem libraries are not nearly complete enough to represent the full molecular diversity present in complex biological samples. We present a novel hybrid search method that can help identify metabolites not in the library by similarity to compounds that are. We call it "hybrid" searching because it combines conventional, direct peak matching with the logical equivalent of neutral-loss matching. A successful hybrid search requires the library to contain "cognates" of the unknown: similar compounds with a structural difference confined to a single region of the molecule that does not substantially alter its fragmentation behavior. We demonstrate that the hybrid search is highly likely to find similar compounds under such circumstances.
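The combination of direct and neutral-loss-style matching described in this abstract can be sketched as follows. Each query peak is first compared with library peaks at the same m/z, and if that fails, with library peaks shifted by the precursor mass difference. This is a simplified illustration rather than the NIST implementation: it only pairs peaks and does not compute the search score, and the function names, tolerance, precursor masses, and toy spectra are assumptions.

```python
def _find_peak(mz, peaks, tol):
    """Return the m/z of the first peak within tol of mz, or None."""
    for peak_mz, _ in peaks:
        if abs(mz - peak_mz) <= tol:
            return peak_mz
    return None

def hybrid_match(query_peaks, library_peaks, delta_mass, tol=0.01):
    """Pair query peaks with library peaks directly or after a delta_mass shift.

    Returns (query_mz, library_mz, kind) tuples, kind being 'direct' or 'shifted'.
    Direct matches take priority over shifted ones.
    """
    matches = []
    for query_mz, _ in query_peaks:
        direct = _find_peak(query_mz, library_peaks, tol)
        if direct is not None:
            matches.append((query_mz, direct, "direct"))
            continue
        shifted = _find_peak(query_mz - delta_mass, library_peaks, tol)
        if shifted is not None:
            matches.append((query_mz, shifted, "shifted"))
    return matches

# Toy example: the unknown differs from the library compound by +14.0 Da
library_precursor, query_precursor = 300.0, 314.0
library_peaks = [(100.0, 50.0), (150.0, 100.0), (250.0, 30.0)]
query_peaks = [(100.0, 45.0), (164.0, 90.0), (264.0, 35.0), (77.0, 5.0)]

matches = hybrid_match(query_peaks, library_peaks, query_precursor - library_precursor)
```

Fragments that do not contain the modified region match directly (m/z 100.0 here), while fragments that carry the modification match only after the shift. A conventional search would credit just the first kind, so a cognate compound would score poorly.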
Affiliation(s)
- Brian T. Cooper
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, North Carolina 28223, United States
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| | - Xinjian Yan
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| | - Yamil Simón-Manso
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| | - Dmitrii V. Tchekhovskoi
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| | - Yuri A. Mirokhin
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| | - Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
| |
|
37
|
Zhang Z, Burke MC, Wallace WE, Liang Y, Sheetlin SL, Mirokhin YA, Tchekhovskoi DV, Stein SE. Sensitive Method for the Confident Identification of Genetically Variant Peptides in Human Hair Keratin. J Forensic Sci 2019; 65:406-420. [PMID: 31670846 PMCID: PMC7064992 DOI: 10.1111/1556-4029.14229] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/18/2019] [Revised: 09/20/2019] [Accepted: 10/14/2019] [Indexed: 12/20/2022]
Abstract
Recent reports have demonstrated that genetically variant peptides (GVPs) derived from human hair shaft proteins can be used to differentiate individuals of different biogeographic origins. We report a method involving direct extraction of hair shaft proteins that is more sensitive for GVP detection than previously published methods. It involves one step for protein extraction and was found to provide reproducible results. A detailed proteomic analysis of these data is presented that led to the following four results: (i) A peptide spectral library was created and made available for download. It contains all identified peptides from this work, including GVPs that, when appropriately expanded with diverse hair-derived peptides, can provide a routine, reliable, and sensitive means of analyzing hair digests; (ii) an analysis of artifact peptides arising from side reactions is also made using a new method for finding unexpected modifications; (iii) detailed analysis of the gel-based method employed clearly shows the high degree of cross-linking or protein association involved in hair digestion, with major GVPs eluting over a wide range of high molecular weights while others apparently arise from distinct non-cross-linked proteins; and (iv) finally, we show that some of the specific GVP identifications depend on the sample preparation method.
Affiliation(s)
- Zheng Zhang
- Biomolecular Measurement Division, Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899
| | - Meghan C Burke
- Biomolecular Measurement Division, Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899
| | - William E Wallace
- Biomolecular Measurement Division, Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899
| | - Yuxue Liang
- Biomolecular Measurement Division, Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899
| | - Sergey L Sheetlin
- Biomolecular Measurement Division, Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899
| | - Yuri A Mirokhin
- Biomolecular Measurement Division, Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899
| | - Dmitrii V Tchekhovskoi
- Biomolecular Measurement Division, Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899
| | - Stephen E Stein
- Biomolecular Measurement Division, Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899
| |
|
38
|
Simón-Manso Y, Marupaka R, Yan X, Liang Y, Telu KH, Mirokhin Y, Stein SE. Mass Spectrometry Fingerprints of Small-Molecule Metabolites in Biofluids: Building a Spectral Library of Recurrent Spectra for Urine Analysis. Anal Chem 2019; 91:12021-12029. [PMID: 31424920 DOI: 10.1021/acs.analchem.9b02977] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/30/2022]
Abstract
A large fraction of ions observed in electrospray liquid chromatography-mass spectrometry (LC-ESI-MS) experiments of biological samples remain unidentified. One of the main reasons for this is that spectral libraries of pure compounds fail to account for the complexity of the metabolite profiling of complex materials. Recently, the NIST Mass Spectrometry Data Center has been developing a novel type of searchable mass spectral library that includes all recurrent unidentified spectra found in the sample profile. These libraries, in conjunction with the NIST tandem mass spectral library, allow analysts to explore most of the chemical space accessible to LC-MS analysis. In this work, we demonstrate how these libraries can provide a reliable fingerprint of the material by applying them to a variety of urine samples, including an extremely altered urine from cancer patients undergoing total body irradiation. The same workflow is applicable to any other biological fluid. The selected class of acylcarnitines is examined in detail, and derived libraries and related software are freely available. They are intended to serve as online resources for continuing community review and improvement.
Affiliation(s)
- Yamil Simón-Manso
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Ramesh Marupaka
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Xinjian Yan
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Yuxue Liang
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Kelly H Telu
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Yuri Mirokhin
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| | - Stephen E Stein
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland 20899, United States
| |
|
39
|
Na S, Kim J, Paek E. MODplus: Robust and Unrestrictive Identification of Post-Translational Modifications Using Mass Spectrometry. Anal Chem 2019; 91:11324-11333. [PMID: 31365238 DOI: 10.1021/acs.analchem.9b02445] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/02/2023]
Abstract
Post-translational modifications regulate various cellular processes and are of great biological interest. Unrestrictive searches of mass spectrometry data enable the detection of any type of modification. Here we propose MODplus, which makes practical unrestrictive searches possible by allowing (1) hundreds of modifications, (2) multiple modifications per peptide, (3) the whole proteome database, and (4) any tolerance values in search parameters. The utility of MODplus was demonstrated in large human data sets of HEK293 cells and TMT-labeled phosphorylation enrichment. Notably, MODplus supports identifying different modification types at multiple sites and reports real chemical and biological modifications, as it has been very labor intensive to link unrestrictive search results to real modifications. We also confirmed the presence of Missing Precursor (MP) spectra that were not identifiable using targeted precursor masses. The MP spectra mostly resulted in identifications of wrong modifications and negatively affected the overall performance, often by as much as 10%. MODplus can rapidly recognize MP spectra and correct their identifications, resulting in increased identification rate up to 70% in the HEK293 data set as well as improved reliability.
Affiliation(s)
- Seungjin Na
- Department of Computer Science, Hanyang University, Seoul 04763, South Korea
| | - Jihyung Kim
- Department of Computer Science, Hanyang University, Seoul 04763, South Korea
| | - Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, South Korea
| |
|
40
|
Burke MC, Zhang Z, Mirokhin YA, Tchekovskoi DV, Liang Y, Stein SE. False Discovery Rate Estimation for Hybrid Mass Spectral Library Search Identifications in Bottom-up Proteomics. J Proteome Res 2019; 18:3223-3234. [DOI: 10.1021/acs.jproteome.8b00863] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023]
Affiliation(s)
- Meghan C. Burke
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Zheng Zhang
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Yuri A. Mirokhin
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Dmitrii V. Tchekovskoi
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Yuxue Liang
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| |
|
41
|
Jang I, Lee JU, Lee JM, Kim BH, Moon B, Hong J, Oh HB. LC–MS/MS Software for Screening Unknown Erectile Dysfunction Drugs and Analogues: Artificial Neural Network Classification, Peak-Count Scoring, Simple Similarity Search, and Hybrid Similarity Search Algorithms. Anal Chem 2019; 91:9119-9128. [DOI: 10.1021/acs.analchem.9b01643] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/19/2022]
Affiliation(s)
- Inae Jang
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
| | - Jae-ung Lee
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
| | - Jung-min Lee
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
| | - Beom Hee Kim
- College of Pharmacy, Kyunghee University, Seoul 02447, Republic of Korea
| | - Bongjin Moon
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
| | - Jongki Hong
- College of Pharmacy, Kyunghee University, Seoul 02447, Republic of Korea
| | - Han Bin Oh
- Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
| |
|
42
|
Blaženović I, Kind T, Sa MR, Ji J, Vaniya A, Wancewicz B, Roberts BS, Torbašinović H, Lee T, Mehta SS, Showalter MR, Song H, Kwok J, Jahn D, Kim J, Fiehn O. Structure Annotation of All Mass Spectra in Untargeted Metabolomics. Anal Chem 2019; 91:2155-2162. [PMID: 30608141 PMCID: PMC11426395 DOI: 10.1021/acs.analchem.8b04698] [Citation(s) in RCA: 116] [Impact Index Per Article: 23.2] [Indexed: 12/17/2022]
Abstract
Urine metabolites are used in many clinical and biomedical studies but usually only for a few classic compounds. Metabolomics detects vastly more metabolic signals that may be used to precisely define the health status of individuals. However, many compounds remain unidentified, hampering biochemical conclusions. Here, we annotate all metabolites detected by two untargeted metabolomic assays, hydrophilic interaction chromatography (HILIC)-Q Exactive HF mass spectrometry and charged surface hybrid (CSH)-Q Exactive HF mass spectrometry. Over 9,000 unique metabolite signals were detected, of which 42% triggered MS/MS fragmentations in data-dependent mode. On the highest Metabolomics Standards Initiative (MSI) confidence level 1, we identified 175 compounds using authentic standards with precursor mass, retention time, and MS/MS matching. An additional 578 compounds were annotated by precursor accurate mass and MS/MS matching alone, MSI level 2, including a novel library specifically geared at acylcarnitines (CarniBlast). The rest of the metabolome is usually left unannotated. To fill this gap, we used the in silico fragmentation tool CSI:FingerID and the new NIST hybrid search to annotate all further compounds (MSI level 3). Testing the top-ranked metabolites in CSI:FingerID annotations yielded 40% accuracy when applied to the MSI level 1 identified compounds. We classified all MSI level 3 annotations by the NIST hybrid search using the ClassyFire ontology into 21 superclasses that were further distinguished into 184 chemical classes. ClassyFire annotations showed that the previously unannotated urine metabolome consists of 28% derivatives of organic acids, 16% heterocyclics, and 16% lipids as major classes.
Affiliation(s)
- Ivana Blaženović
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| | - Tobias Kind
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| | - Michael R Sa
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| | - Jian Ji
- School of Food Science, State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu 330047, China
| | - Arpana Vaniya
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| | - Benjamin Wancewicz
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| | - Bryan S Roberts
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| | | | - Tack Lee
- Department of Urology, Inha University College of Medicine, Incheon 22212, South Korea
| | - Sajjan S Mehta
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| | - Megan R Showalter
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| | - Hosook Song
- Department of Urology, Inha University College of Medicine, Incheon 22212, South Korea
| | - Jessica Kwok
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| | - Dieter Jahn
- Institute of Microbiology, Technische Universität Braunschweig, Braunschweig 38106, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig 38106, Germany
| | - Jayoung Kim
- Departments of Surgery and Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Department of Medicine, University of California Los Angeles, Los Angeles, California 90095, United States
- Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Department of Urology, Ga Cheon University College of Medicine, Incheon 22212, South Korea
| | - Oliver Fiehn
- West Coast Metabolomics Center, University of California, Davis, Davis, California 95616, United States
| |
|
43
|
Deutsch EW, Perez-Riverol Y, Chalkley RJ, Wilhelm M, Tate S, Sachsenberg T, Walzer M, Käll L, Delanghe B, Böcker S, Schymanski EL, Wilmes P, Dorfer V, Kuster B, Volders PJ, Jehmlich N, Vissers JP, Wolan DW, Wang AY, Mendoza L, Shofstahl J, Dowsey AW, Griss J, Salek RM, Neumann S, Binz PA, Lam H, Vizcaíno JA, Bandeira N, Röst H. Expanding the Use of Spectral Libraries in Proteomics. J Proteome Res 2018; 17:4051-4060. [PMID: 30270626 PMCID: PMC6443480 DOI: 10.1021/acs.jproteome.8b00485] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/15/2022]
Abstract
The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington, 98109, United States
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Robert J. Chalkley
- University of California San Francisco, San Francisco, 94158, California, United States
| | - Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
| | | | - Timo Sachsenberg
- Department of Computer Science, Center for Bioinformatics, University of Tübingen, Sand 14, Tübingen, 72076, Germany
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH − Royal Institute of Technology, Stockholm 114 28, Sweden
| | - Bernard Delanghe
- Thermo Fisher Scientific Bremen, Hanna-Kunath Str. 11, 28199 Bremen, Germany
| | - Sebastian Böcker
- Chair for Bioinformatics, Friedrich-Schiller-University Jena, 07743 Jena, Germany
| | - Emma L. Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Paul Wilmes
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Hagenberg, 4232, Austria
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
- Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich, Freising, 85354, Germany
| | | | - Nico Jehmlich
- Helmholtz-Centre for Environmental Research - UFZ, Leipzig, Germany
| | | | - Dennis W. Wolan
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
| | - Ana Y. Wang
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington, 98109, United States
| | - Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway San Jose, CA 95134
| | - Andrew W. Dowsey
- Department of Population Health Sciences and Bristol Veterinary School, Faculty of Health Sciences, University of Bristol, Bristol BS9 1BN, UK
| | - Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, Vienna 1090, Austria
| | - Reza M. Salek
- The International Agency for Research on Cancer (IARC), 150 Cours Albert Thomas, 69372 Lyon CEDEX 08, France
| | - Steffen Neumann
- Leibniz Institute of Plant Biochemistry, Department of Stress and Developmental Biology, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Pierre-Alain Binz
- Clinical Chemistry Service, Centre Hospitalier Universitaire Vaudois, 1011 Lausanne, Switzerland
| | - Henry Lam
- Department of Chemical and Biological Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 92093-0404, USA
| | - Hannes Röst
- The Donnelly Centre, University of Toronto, 160 College St., Toronto, ON, M5S 3E1, Canada
| |
|
44
|
Bittremieux W, Meysman P, Noble WS, Laukens K. Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing. J Proteome Res 2018; 17:3463-3474. [PMID: 30184435 PMCID: PMC6173621 DOI: 10.1021/acs.jproteome.8b00359] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/21/2022]
Abstract
Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification by allowing a modified spectrum to match against its unmodified variant by using a very wide precursor mass window. A drawback of this strategy, however, is that it leads to a large increase in search time. Although performing an open search can be done using existing spectral library search engines by simply setting a wide precursor mass window, none of these tools have been optimized for OMS, leading to excessive runtimes and suboptimal identification results. We present the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This approach is combined with a cascade search strategy to maximize the number of identified unmodified and modified spectra while strictly controlling the false discovery rate as well as a shifted dot product score to sensitively match modified spectra to their unmodified counterparts. ANN-SoLo achieves state-of-the-art performance in terms of speed and the number of identifications. On a previously published human cell line data set, ANN-SoLo confidently identifies more spectra than SpectraST or MSFragger and achieves a speedup of an order of magnitude compared with SpectraST. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo .
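The candidate-selection step described in this abstract, where only the most relevant library spectra are kept for detailed scoring, can be sketched as follows. This is not ANN-SoLo's code and does not use its API: an exact brute-force cosine ranking stands in for the approximate nearest neighbor index, and the names `to_vector` and `top_k_candidates`, the 1 Da bin width, and the toy library are all assumptions.

```python
import math
from collections import defaultdict

def to_vector(peaks, bin_width=1.0):
    """Bin (mz, intensity) peaks into a unit-length sparse vector (dict)."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm else {}

def top_k_candidates(query_peaks, library, k=2, bin_width=1.0):
    """Return the names of the k library spectra most similar to the query.

    Brute-force stand-in for an approximate nearest neighbor lookup:
    every library vector is compared, which an ANN index would avoid.
    """
    q = to_vector(query_peaks, bin_width)
    scored = []
    for name, peaks in library.items():
        lib_vec = to_vector(peaks, bin_width)
        similarity = sum(w * lib_vec.get(b, 0.0) for b, w in q.items())
        scored.append((similarity, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical three-spectrum library and one query spectrum.
toy_library = {
    "A": [(100.2, 10.0), (200.1, 5.0)],
    "B": [(100.4, 10.0), (250.0, 5.0)],
    "C": [(300.0, 10.0)],
}
candidates = top_k_candidates([(100.3, 10.0), (200.3, 5.0)], toy_library, k=2)
```

Only the returned candidates would then be passed to a more expensive score, such as the shifted dot product mentioned in the abstract.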
Affiliation(s)
- Wout Bittremieux
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Pieter Meysman
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
| | - Kris Laukens
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
| |
|
45
|
Beyter D, Lin MS, Yu Y, Pieper R, Bafna V. ProteoStorm: An Ultrafast Metaproteomics Database Search Framework. Cell Syst 2018; 7:463-467.e6. [PMID: 30268435 DOI: 10.1016/j.cels.2018.08.009] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/30/2018] [Revised: 06/22/2018] [Accepted: 08/13/2018] [Indexed: 12/15/2022]
Abstract
Shotgun metaproteomics has the potential to reveal the functional landscape of microbial communities but lacks appropriate methods for complex samples with unknown compositions. In the absence of prior taxonomic information, tandem mass spectra would be searched against large pan-microbial databases, which requires heavy computational workload and reduces sensitivity. We present ProteoStorm, an efficient database search framework for large-scale metaproteomics studies, which identifies high-confidence peptide-spectrum matches (PSMs) while achieving a two-to-three orders-of-magnitude speedup over popular tools. A reanalysis of a urinary tract infection (UTI) dataset of 110 individuals revealed a complex pattern of polymicrobial expression, including sub-types of UTIs, cases of bacterial vaginosis, and evidence of no underlying disease. Importantly, compared to the initial UTI study that restricted the search database to a manually curated list of 20 genera, ProteoStorm identified additional genera that were previously unreported, including a case of infection with the rare pathogen Propionimicrobium.
Affiliation(s)
- Doruk Beyter
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Miin S Lin
- Graduate Program in Bioinformatics & Systems Biology, University of California, San Diego, La Jolla, CA 92093, USA
| | - Yanbao Yu
- J. Craig Venter Institute, Rockville, MD 20850, USA
| | | | - Vineet Bafna
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA.
| |
|
46
|
Shen J, Pagala VR, Breuer AM, Peng J, Bin Ma, Wang X. Spectral Library Search Improves Assignment of TMT Labeled MS/MS Spectra. J Proteome Res 2018; 17:3325-3331. [PMID: 30096983 DOI: 10.1021/acs.jproteome.8b00594] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/30/2022]
Abstract
Tandem mass tag (TMT)-based liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a proven approach for large-scale multiplexed protein quantification. However, the identification of TMT-labeled peptides is compromised by the labeling during traditional sequence database searches. In this study, we aim to use a spectral library search to increase the sensitivity and specificity of peptide identification for TMT-based MS data. Compared to MS/MS spectra of unlabeled peptides, the spectra of TMT-labeled counterparts usually display intensified b ions, suggesting that TMT labeling can alter product ion patterns during MS/MS fragmentation. We compiled a human TMT spectral library of 401,168 unique peptides of high quality from millions of peptide-spectrum matches in tens of profiling projects, matching to 14,048 nonredundant proteins (13,953 genes). A mouse TMT spectral library of similar size was also constructed. The libraries were subsequently appended with decoy spectra to evaluate the false discovery rate, which was validated by a simulated null TMT data set. The performance of the library search was further optimized by removing TMT reporter ions and selecting an appropriate library construction method. Finally, we searched a human TMT data set against the spectral library to demonstrate that the spectral library outperformed the sequence database. Both human and mouse TMT libraries were made publicly available to the research community.
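The use of decoy spectra to evaluate the false discovery rate, mentioned in this abstract, follows the general target-decoy idea, which can be sketched briefly. The abstract does not give the authors' exact estimator, so the simple decoys-over-targets ratio, the function names, and the toy scores below are assumptions for illustration only.

```python
def fdr_at_threshold(matches, threshold):
    """Estimate FDR among matches scoring at or above the threshold.

    matches: list of (score, is_decoy) tuples.
    Uses the simple ratio decoys / targets (an assumed estimator).
    """
    targets = sum(1 for s, d in matches if s >= threshold and not d)
    decoys = sum(1 for s, d in matches if s >= threshold and d)
    if targets == 0:
        return float("inf") if decoys else 0.0
    return decoys / targets

def score_cutoff(matches, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR is within max_fdr.

    Returns None if no observed score satisfies the limit.
    """
    cutoff = None
    for threshold in sorted({s for s, _ in matches}, reverse=True):
        if fdr_at_threshold(matches, threshold) <= max_fdr:
            cutoff = threshold  # keep going: a lower score may still qualify
    return cutoff

# Hypothetical search results: (match score, True if the hit is a decoy).
toy_matches = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, False)]
strict_cutoff = score_cutoff(toy_matches, max_fdr=0.1)
loose_cutoff = score_cutoff(toy_matches, max_fdr=0.25)
```

Matches scoring at or above the chosen cutoff would be accepted; tightening `max_fdr` raises the cutoff and accepts fewer identifications.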
Affiliation(s)
- Jianqiao Shen
- Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | | | | | | | - Bin Ma
- Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | | |
|
47
|
Remoroza CA, Mak TD, De Leoz MLA, Mirokhin YA, Stein SE. Creating a Mass Spectral Reference Library for Oligosaccharides in Human Milk. Anal Chem 2018; 90:8977-8988. [PMID: 29969231 DOI: 10.1021/acs.analchem.8b01176] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Indexed: 02/03/2023]
Abstract
We report the development and availability of a mass spectral reference library for oligosaccharides in human milk. This represents a new variety of spectral library that includes consensus spectra of compounds annotated through various data analysis methods, a concept that can be extended to other varieties of biological fluids. Oligosaccharides from the NIST Standard Reference Material (SRM) 1953, composed of human milk pooled from 100 breastfeeding mothers, were identified and characterized using hydrophilic interaction liquid chromatography electrospray ionization tandem mass spectrometry (HILIC-ESI-MS/MS) and the NIST 17 Tandem MS Library. Consensus reference spectra were generated, incorporated into a searchable library, and matched using the newly developed hybrid search algorithm to elucidate unknown oligosaccharides. The NIST hybrid search program facilitates the structural assignment of complex oligosaccharides especially when reference standards are not commercially available. High accuracy mass measurement for precursor and product ions, as well as the relatively high MS/MS signal intensities of various oligosaccharide precursors with Fourier transform ion trap (FT-IT) and higher energy dissociation (HCD) fragmentation techniques, enabled the assignment of multiple free and underivatized fucosyllacto- and sialyllacto-oligosaccharide spectra. Neutral and sialylated isomeric oligosaccharides have distinct retention times, allowing the identification of 74 oligosaccharides in the reference material. This collection of newly characterized spectra based on a searchable, reference MS library of annotated oligosaccharides can be applied to analyze similar compounds in other types of milk or any biological fluid containing milk oligosaccharides.
Affiliation(s)
- Connie A Remoroza
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8362, United States
- Tytus D Mak
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8362, United States
- Maria Lorna A De Leoz
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8362, United States
- Yuri A Mirokhin
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8362, United States
- Stephen E Stein
- Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8362, United States
|
48
|
Bowers JJ, Gunawardena HP, Cornu A, Narvekar AS, Richieu A, Deffieux D, Quideau S, Tharayil N. Rapid Screening of Ellagitannins in Natural Sources via Targeted Reporter Ion Triggered Tandem Mass Spectrometry. Sci Rep 2018; 8:10399. [PMID: 29991731 PMCID: PMC6039434 DOI: 10.1038/s41598-018-27708-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/20/2018] [Accepted: 05/17/2018] [Indexed: 12/18/2022] [Open Access]
Abstract
Complex biomolecules present in their natural sources have been difficult to analyze using traditional analytical approaches. Ultrahigh-performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) methods have the potential to enhance the discovery of a less well characterized and challenging class of biomolecules in plants, the ellagitannins. We present an approach that allows for the screening of ellagitannins by employing higher-energy collisional dissociation (HCD) to generate reporter ions for classification and collision-induced dissociation (CID) to generate unique fragmentation spectra for isomeric variants of previously unreported species. Ellagitannin anions efficiently form three characteristic reporter ions after HCD fragmentation, which allows unknown precursors to be classified; we call this approach targeted reporter ion triggering (TRT). We demonstrate how a tandem HCD-CID experiment might be used to screen natural sources using UHPLC-MS/MS by application of 22 method conditions from which an optimized data-dependent acquisition (DDA) emerged. The method was verified not to yield false-positive results in complex plant matrices. We were able to identify 154 non-isomeric ellagitannins from strawberry leaves, 17 times more than previously reported in the same matrix. The systematic inclusion of CID spectra for isomers of each species classified as an ellagitannin was not possible before the development of this approach.
Affiliation(s)
- Jeremiah J Bowers
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, 29631, USA
- Harsha P Gunawardena
- Janssen Research and Development, The Janssen Pharmaceutical Companies of Johnson and Johnson, Spring House, PA, 19477, USA
- Anaëlle Cornu
- University Bordeaux, ISM (CNRS-UMR 5255), 351 cours de la Libération, 33405, Talence Cedex, France
- Ashwini S Narvekar
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, 29631, USA
- Antoine Richieu
- University Bordeaux, ISM (CNRS-UMR 5255), 351 cours de la Libération, 33405, Talence Cedex, France
- Denis Deffieux
- University Bordeaux, ISM (CNRS-UMR 5255), 351 cours de la Libération, 33405, Talence Cedex, France
- Stéphane Quideau
- University Bordeaux, ISM (CNRS-UMR 5255), 351 cours de la Libération, 33405, Talence Cedex, France
- Nishanth Tharayil
- Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, 29631, USA
|
49
|
Misra BB. Updates on resources, software tools, and databases for plant proteomics in 2016-2017. Electrophoresis 2018; 39:1543-1557. [PMID: 29420853 DOI: 10.1002/elps.201700401] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/21/2017] [Revised: 01/23/2018] [Accepted: 02/02/2018] [Indexed: 11/05/2022]
Abstract
Data processing, annotation, and analysis often pose major hurdles in large-scale, high-throughput bottom-up proteomics experiments. Given the recent rise in protein-based big datasets, in silico tool development has increased at an unprecedented rate, so much so that it has become increasingly difficult to keep track of all the advances in a given academic year. These continually developing open-source and community-developed tools nevertheless help the plant proteomics community circumvent critical issues in data analysis and visualization, and they hold potential for future research efforts. This review introduces and summarizes more than 50 software tools, databases, and resources developed and published during 2016-2017 under the following categories: tools for data pre-processing and analysis, statistical analysis tools, peptide identification tools, databases and spectral libraries, and data visualization and interpretation tools. Finally, for a well-informed proteomics community, efforts in data archiving and validation datasets are discussed as well. Additionally, the author delineates the current and most commonly used proteomics tools in order to introduce novice readers to this -omics discovery platform.
Affiliation(s)
- Biswapriya B Misra
- Department of Internal Medicine, Section of Molecular Medicine, Medical Center Boulevard, Winston-Salem, NC, USA
|
50
|
Zhou L, Wong L, Goh WWB. Understanding missing proteins: a functional perspective. Drug Discov Today 2018; 23:644-651. [DOI: 10.1016/j.drudis.2017.11.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/07/2017] [Revised: 10/24/2017] [Accepted: 11/13/2017] [Indexed: 01/03/2023]
|