1
Liu K, Tao C, Ye Y, Tang H. SpecEncoder: deep metric learning for accurate peptide identification in proteomics. Bioinformatics 2024; 40:i257-i265. [PMID: 38940141 PMCID: PMC11211836 DOI: 10.1093/bioinformatics/btae220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used for peptide identification from MS/MS spectra, but both may face challenges due to experimental variations between replicated spectra and similar fragmentation patterns among distinct peptides. To address these challenges, we present SpecEncoder, a deep metric learning approach that transforms MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification. RESULTS: We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%-2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%-15% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identified 6%-12% additional unique peptides when utilizing a combined library of experimental and predicted spectra. SpecEncoder can also identify more peptides than deep-learning-enhanced methods (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder's potential to enhance peptide identification for proteomic data analyses. AVAILABILITY AND IMPLEMENTATION: The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder. Contact: hatang@iu.edu.
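As a rough illustration of the embedding-based search this abstract describes, the sketch below ranks entries of a combined library (experimental and predicted spectra) by similarity to a query embedding. Everything here is an assumption made for illustration: the toy vectors, the peptide labels, the `cosine` and `search` helpers, and the use of cosine similarity. The abstract does not state SpecEncoder's similarity measure or API, and the encoder that would produce the vectors is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query_vec, library, top_k=1):
    """Return the top_k (similarity, label) pairs, best match first."""
    scored = sorted(
        ((cosine(query_vec, vec), label) for label, vec in library.items()),
        reverse=True,
    )
    return scored[:top_k]

# Hypothetical embeddings; a real system would obtain these from a trained encoder.
library = {
    "PEPTIDEK (experimental)": [0.9, 0.1, 0.0],
    "PEPTLDEK (predicted)": [0.7, 0.7, 0.1],
    "SAMPLER (predicted)": [0.0, 0.2, 0.95],
}
query = [0.88, 0.15, 0.02]

best_score, best_label = search(query, library)[0]
print(best_label, round(best_score, 3))
```

Keeping experimental and predicted entries in one dictionary mirrors the hybrid search idea in the abstract: a single nearest-neighbor query covers both sources.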
Affiliation(s)
- Kaiyuan Liu
- Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States
- Chenghua Tao
- Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States
- Yuzhen Ye
- Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States
- Haixu Tang
- Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States
2
Ragland JM, Place BJ. A Portable and Reusable Database Infrastructure for Mass Spectrometry, and Its Associated Toolkit (The DIMSpec Project). Journal of the American Society for Mass Spectrometry 2024; 35:1282-1291. [PMID: 38704738 DOI: 10.1021/jasms.4c00073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2024]
Abstract
Nontargeted analysis (NTA) is a rapidly growing field of techniques that includes the identification of unknown chemical analytes in complex mixtures such as environmental, biological, and food matrices. The use of reference mass spectral databases is a key component of most NTA workflows, providing a high level of confidence for chemical identification when analytical standards are not available, yet effective interlaboratory sharing of research grade spectra remains challenging. The Database Infrastructure for Mass Spectrometry (DIMSpec) project focused on the creation of an open-source toolkit supporting storage and sharing of high-resolution mass spectra with attached sample and methodological metadata. As a demonstration of its utility, the DIMSpec toolkit was used to create a database of curated mass spectra for per- and polyfluoroalkyl substances (PFAS) generated from various sources. While the underlying toolkit is agnostic to analytical targets, this initial release (along with the database schema, mass spectral data, and database tools) should enable PFAS researchers to use these data for their own studies, including the identification of novel PFAS in the environment.
Affiliation(s)
- Jared M Ragland
- National Institute of Standards and Technology, Material Measurement Laboratory, Chemical Sciences Division, Gaithersburg, Maryland 20899, United States
- Benjamin J Place
- National Institute of Standards and Technology, Material Measurement Laboratory, Chemical Sciences Division, Gaithersburg, Maryland 20899, United States
3
Mohanty I, Mannochio-Russo H, Schweer JV, El Abiead Y, Bittremieux W, Xing S, Schmid R, Zuffa S, Vasquez F, Muti VB, Zemlin J, Tovar-Herrera OE, Moraïs S, Desai D, Amin S, Koo I, Turck CW, Mizrahi I, Kris-Etherton PM, Petersen KS, Fleming JA, Huan T, Patterson AD, Siegel D, Hagey LR, Wang M, Aron AT, Dorrestein PC. The underappreciated diversity of bile acid modifications. Cell 2024; 187:1801-1818.e20. [PMID: 38471500 DOI: 10.1016/j.cell.2024.02.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 11/30/2023] [Accepted: 02/15/2024] [Indexed: 03/14/2024]
Abstract
The repertoire of modifications to bile acids and related steroidal lipids by host and microbial metabolism remains incompletely characterized. To address this knowledge gap, we created a reusable resource of tandem mass spectrometry (MS/MS) spectra by filtering 1.2 billion publicly available MS/MS spectra for bile-acid-selective ion patterns. Thousands of modifications are distributed throughout animal and human bodies as well as microbial cultures. We employed this MS/MS library to identify polyamine bile amidates, prevalent in carnivores. They are present in humans, and their levels alter with a diet change from a Mediterranean to a typical American diet. This work highlights the existence of many more bile acid modifications than previously recognized and the value of leveraging public large-scale untargeted metabolomics data to discover metabolites. The availability of a modification-centric bile acid MS/MS library will inform future studies investigating bile acid roles in health and disease.
Affiliation(s)
- Ipsita Mohanty
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Helena Mannochio-Russo
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Joshua V Schweer
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA; Department of Chemistry and Biochemistry, University of California, San Diego, San Diego, CA, USA
- Yasin El Abiead
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Shipei Xing
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA; Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, Vancouver, BC, Canada
- Robin Schmid
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA; Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Simone Zuffa
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Felipe Vasquez
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Valentina B Muti
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, USA; Department of Chemistry and Biochemistry, University of Denver, Denver, CO 80210, USA
- Jasmine Zemlin
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA 92093, USA
- Omar E Tovar-Herrera
- Department of Life Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel; Goldman Sonnenfeldt School of Sustainability and Climate Change, Ben-Gurion University of the Negev, Be'er Sheva 84105, Israel
- Sarah Moraïs
- Department of Life Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel; Goldman Sonnenfeldt School of Sustainability and Climate Change, Ben-Gurion University of the Negev, Be'er Sheva 84105, Israel
- Dhimant Desai
- Department of Pharmacology, Penn State University College of Medicine, Hershey, PA, USA
- Shantu Amin
- Department of Pharmacology, Penn State University College of Medicine, Hershey, PA, USA
- Imhoi Koo
- Center for Molecular Toxicology and Carcinogenesis, Department of Veterinary and Biomedical Sciences, Pennsylvania State University, University Park, PA, USA
- Christoph W Turck
- Max Planck Institute of Psychiatry, Proteomics and Biomarkers, Kraepelinstrasse 2-10, Munich 80804, Germany; Key Laboratory of Animal Models and Human Disease Mechanisms of Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650201, China
- Itzhak Mizrahi
- Department of Life Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel; Goldman Sonnenfeldt School of Sustainability and Climate Change, Ben-Gurion University of the Negev, Be'er Sheva 84105, Israel
- Penny M Kris-Etherton
- Department of Nutritional Sciences, The Pennsylvania State University, University Park, PA, USA
- Kristina S Petersen
- Department of Nutritional Sciences, The Pennsylvania State University, University Park, PA, USA
- Jennifer A Fleming
- Department of Nutritional Sciences, The Pennsylvania State University, University Park, PA, USA
- Tao Huan
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, Vancouver, BC, Canada
- Andrew D Patterson
- Center for Molecular Toxicology and Carcinogenesis, Department of Veterinary and Biomedical Sciences, Pennsylvania State University, University Park, PA, USA
- Dionicio Siegel
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Lee R Hagey
- Department of Medicine, University of California, San Diego, San Diego, CA, USA
- Mingxun Wang
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, USA
- Allegra T Aron
- Department of Chemistry and Biochemistry, University of Denver, Denver, CO 80210, USA
- Pieter C Dorrestein
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA; Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA; Department of Pharmacology, University of California, San Diego, La Jolla, CA 92093, USA; Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA 92093, USA.
4
Mongia M, Yasaka TM, Liu Y, Guler M, Lu L, Bhagwat A, Behsaz B, Wang M, Dorrestein PC, Mohimani H. Fast mass spectrometry search and clustering of untargeted metabolomics data. Nat Biotechnol 2024:10.1038/s41587-023-01985-4. [PMID: 38168990 DOI: 10.1038/s41587-023-01985-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 09/12/2023] [Indexed: 01/05/2024]
Abstract
The throughput of mass spectrometers and the amount of publicly available metabolomics data are growing rapidly, but analysis tools such as molecular networking and Mass Spectrometry Search Tool do not scale to searching and clustering billions of mass spectral data in metabolomics repositories. To address this limitation, we designed MASST+ and Networking+, which can process datasets that are up to three orders of magnitude larger than those processed by state-of-the-art tools.
Affiliation(s)
- Mihir Mongia
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tyler M Yasaka
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Yudong Liu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Mustafa Guler
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Liang Lu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Aditya Bhagwat
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Bahar Behsaz
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Chemia Biosciences Inc., Pittsburgh, PA, USA
- Mingxun Wang
- Computer Science and Engineering, University of California Riverside, Riverside, CA, USA
- Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Department of Pharmacology and Pediatrics, University of California San Diego, San Diego, CA, USA
- Hosein Mohimani
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
5
Chen Y, Du Z, Zhao H, Fang W, Liu T, Zhang Y, Zhang W, Qin W. SPPUSM: An MS/MS spectra merging strategy for improved low-input and single-cell proteome identification. Anal Chim Acta 2023; 1279:341793. [PMID: 37827637 DOI: 10.1016/j.aca.2023.341793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Revised: 08/26/2023] [Accepted: 09/06/2023] [Indexed: 10/14/2023]
Abstract
Single and rare cell analysis provides unique insights into the investigation of biological processes and disease progress by resolving the cellular heterogeneity that is masked by bulk measurements. Although many efforts have been made, the techniques used to measure the proteome in trace amounts of samples or in single cells still lag behind those for DNA and RNA due to the inherent non-amplifiable nature of proteins and the sensitivity limitation of current mass spectrometry. Here, we report an MS/MS spectra merging strategy termed SPPUSM (same precursor-produced unidentified spectra merging) for improved low-input and single-cell proteome data analysis. In this method, all the unidentified MS/MS spectra from multiple test files are first extracted. Then, the corresponding MS/MS spectra produced by the same precursor ion from different files are matched according to their precursor mass and retention time (RT) and are merged into one new spectrum. The newly merged spectra with more fragment ions are next searched against the database to increase the MS/MS spectra identification and proteome coverage. Further improvement can be achieved by increasing the number of test files and spectra to be merged. Up to 18.2% improvement in protein identification was achieved for 1 ng HeLa peptides by SPPUSM. Reliability evaluation by the "entrapment database" strategy using merged spectra from human and E. coli revealed a marginal error rate for the proposed method. For application in single cell proteome (SCP) study, identification enhancement of 28%-61% was achieved for proteins for different SCP data. Furthermore, a lower abundance was found for the SPPUSM-identified peptides, indicating its potential for more sensitive low sample input and SCP studies.
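The matching-and-merging step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Spectrum` class, the function names, the tolerance values (10 ppm, 30 s, 0.02 m/z), and the intensity-weighted peak merging rule are all hypothetical choices that the abstract does not specify.

```python
from dataclasses import dataclass

@dataclass
class Spectrum:
    file_id: str
    precursor_mz: float
    rt_sec: float
    peaks: list  # list of (fragment m/z, intensity) tuples

def same_precursor(seed, other, ppm_tol=10.0, rt_tol_sec=30.0):
    """Assumed rule: precursor m/z within ppm_tol and retention time within rt_tol_sec."""
    ppm = abs(seed.precursor_mz - other.precursor_mz) / seed.precursor_mz * 1e6
    return ppm <= ppm_tol and abs(seed.rt_sec - other.rt_sec) <= rt_tol_sec

def merge_peaks(spectra, frag_tol=0.02):
    """Pool all fragment peaks; combine neighbours within frag_tol (sum intensity, weighted m/z)."""
    merged = []
    for mz, inten in sorted(p for s in spectra for p in s.peaks):
        if merged and mz - merged[-1][0] <= frag_tol:
            prev_mz, prev_int = merged[-1]
            total = prev_int + inten
            merged[-1] = ((prev_mz * prev_int + mz * inten) / total, total)
        else:
            merged.append((mz, inten))
    return merged

def merge_unidentified(spectra):
    """Group unidentified spectra from different files by precursor, then merge each group."""
    groups = []
    for s in sorted(spectra, key=lambda x: x.precursor_mz):
        for g in groups:
            # Compare against the group's first member; allow one spectrum per file.
            if same_precursor(g[0], s) and all(m.file_id != s.file_id for m in g):
                g.append(s)
                break
        else:
            groups.append([s])
    return [
        Spectrum(
            "merged",
            sum(m.precursor_mz for m in g) / len(g),
            sum(m.rt_sec for m in g) / len(g),
            merge_peaks(g),
        )
        for g in groups
        if len(g) > 1
    ]

# Toy input: two runs observe the same precursor; a third spectrum is unrelated.
a = Spectrum("run1", 500.2500, 1200.0, [(100.0, 10.0), (200.0, 5.0)])
b = Spectrum("run2", 500.2510, 1210.0, [(100.01, 10.0), (300.0, 7.0)])
c = Spectrum("run2", 650.0000, 1210.0, [(150.0, 1.0)])
merged = merge_unidentified([a, b, c])
print(len(merged), merged[0].peaks)
```

In the toy input, the merged spectrum carries fragments from both runs, which is the property the abstract credits for the improved database search results.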
Affiliation(s)
- Yongle Chen
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
- Zhuokun Du
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
- Hongxian Zhao
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
- Wei Fang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
- Tong Liu
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
- Yangjun Zhang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
- Wanjun Zhang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China; College of Chemistry and Materials Science, Hebei University, Baoding, 071002, China
- Weijie Qin
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China; College of Chemistry and Materials Science, Hebei University, Baoding, 071002, China.
6
Wu L, Hoque A, Lam H. Spectroscape enables real-time query and visualization of a spectral archive in proteomics. Nat Commun 2023; 14:6267. [PMID: 37805652 PMCID: PMC10560257 DOI: 10.1038/s41467-023-42006-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 09/26/2023] [Indexed: 10/09/2023] Open
Abstract
In proteomics, spectral archives organize the enormous amounts of publicly available peptide tandem mass spectra by similarity, offering opportunities for error correction and novel discoveries. Here we adapt an indexing algorithm developed by Facebook for organizing online multimedia resources to tandem mass spectra and achieve practically instantaneous retrieval and clustering of approximate nearest neighbors in a large spectral archive. An interactive web-based graphical user interface enables the user to view a query spectrum in its clustered neighborhood, which facilitates contextual validation of peptide identifications and exploration of the dark proteome.
Affiliation(s)
- Long Wu
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Department of Electrical and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Ayman Hoque
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong.
7
Schilling M, Levasseur M, Barbier M, Oliveira-Correia L, Henry C, Touboul D, Farine S, Bertsch C, Gelhaye E. Wood Degradation by Fomitiporia mediterranea M. Fischer: Exploring Fungal Adaptation Using Metabolomic Networking. J Fungi (Basel) 2023; 9:jof9050536. [PMID: 37233247 DOI: 10.3390/jof9050536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 04/21/2023] [Accepted: 04/24/2023] [Indexed: 05/27/2023] Open
Abstract
Fomitiporia mediterranea M. Fischer (Fmed) is a white-rot wood-decaying fungus associated with one of the most important and challenging diseases in vineyards: Esca. To relieve microbial degradation, woody plants, including Vitis vinifera, use structural and chemical weapons. Lignin is the most recalcitrant of the wood cell wall structural compounds and contributes to wood durability. Extractives are constitutive or de novo synthesized specialized metabolites that are not covalently bound to wood cell walls and are often associated with antimicrobial properties. Fmed is able to mineralize lignin and detoxify toxic wood extractives, thanks to enzymes such as laccases and peroxidases. Grapevine wood's chemical composition could be involved in Fmed's adaptation to its substrate. This study aimed to determine whether Fmed uses specific mechanisms to degrade grapevine wood structure and extractives. Three different wood species, grapevine, beech, and oak, were exposed to fungal degradation by two Fmed strains. The well-studied white-rot fungus Trametes versicolor (Tver) was used as a comparison model. A simultaneous degradation pattern was shown for Fmed in the three degraded wood species. Wood mass loss after 7 months for the two fungal species was the highest with low-density oak wood. For the latter wood species, radical differences in initial wood density were observed. No differences between grapevine or beech wood degradation rates were observed after degradation by Fmed or by Tver. Contrary to the Tver secretome, one manganese peroxidase isoform (MnP2l, jgi protein ID 145801) was the most abundant in the Fmed secretome on grapevine wood only. Non-targeted metabolomic analysis was conducted on wood and mycelium samples, using metabolomic networking and public databases (GNPS, MS-DIAL) for metabolite annotations. Chemical differences between non-degraded and degraded woods, and between mycelia grown on different wood species, are discussed.
This study highlights Fmed physiological, proteomic and metabolomic traits during wood degradation and thus contributes to a better understanding of its wood degradation mechanisms.
Affiliation(s)
- Marceau Levasseur
- CNRS, Institut de Chimie des Substances Naturelles (ICSN), UPR2301, Université Paris-Saclay, Avenue de la Terrasse, 91198 Gif-sur-Yvette, France
- Lydie Oliveira-Correia
- INRAE, AgroParisTech, Micalis Institute, PAPPSO, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Céline Henry
- INRAE, AgroParisTech, Micalis Institute, PAPPSO, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- David Touboul
- CNRS, Institut de Chimie des Substances Naturelles (ICSN), UPR2301, Université Paris-Saclay, Avenue de la Terrasse, 91198 Gif-sur-Yvette, France
- CNRS, Laboratoire de Chimie Moléculaire (LCM), UMR 9168, École Polytechnique, Institut Polytechnique de Paris, Route de Saclay, 91128 Palaiseau, France
- Sibylle Farine
- Laboratoire Vigne Biotechnologies et Environnement UPR-3991, Université de Haute-Alsace, 33 Rue de Herrlisheim, 68000 Colmar, France
- Christophe Bertsch
- Laboratoire Vigne Biotechnologies et Environnement UPR-3991, Université de Haute-Alsace, 33 Rue de Herrlisheim, 68000 Colmar, France
- Eric Gelhaye
- INRAE, IAM, Université de Lorraine, 54000 Nancy, France
8
McDonnell K, Abram F, Howley E. Application of a Novel Hybrid CNN-GNN for Peptide Ion Encoding. J Proteome Res 2022; 22:323-333. [PMID: 36534699 PMCID: PMC9903319 DOI: 10.1021/acs.jproteome.2c00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Almost all state-of-the-art de novo peptide sequencing algorithms now use machine learning models to encode fragment peaks and hence identify amino acids in mass spectrometry (MS) spectra. Previous work has highlighted how the inherent MS challenges of noise and missing peptide peaks detrimentally affect the performance of these models. In the present research we extracted and evaluated the encoding modules from 3 state-of-the-art de novo peptide sequencing algorithms. We also propose a convolutional neural network-graph neural network machine learning model for encoding peptide ions in tandem MS spectra. We compared the proposed encoding module to those used in the state-of-the-art de novo peptide sequencing algorithms by assessing their ability to identify b-ions and y-ions in MS spectra. This included a comprehensive evaluation in both real and artificial data across various levels of noise and missing peptide peaks. The proposed model performed best across all data sets using two different metrics (area under the receiver operating characteristic curve (AUC) and average precision). The work also highlighted the effect of including additional features such as intensity rank in these encoding modules as well as issues with using the AUC as a metric. This work is of significance to those designing future de novo peptide identification algorithms as it is the first step toward a new approach.
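For readers unfamiliar with the two metrics named in this abstract, the sketch below computes the area under the ROC curve (AUC) and average precision on a small, imbalanced toy set of peak labels. The labels, scores, and function names are hypothetical and are not taken from the paper; the example only shows how the two numbers are obtained and that they can differ on the same ranking.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at each positive, scanning scores from high to low.

    Assumes at least one positive label and no tied scores.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy data: 2 true fragment-ion peaks (label 1) among 6 noise peaks (label 0).
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(round(auc, 4), round(ap, 4))
```

Here 11 of the 12 positive-negative pairs are ordered correctly, while the precision values at the two positives are 1 and 2/3, so the two metrics summarize the same ranking differently.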
Affiliation(s)
- Kevin McDonnell
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
- Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
- Enda Howley
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
9
Gauglitz JM, West KA, Bittremieux W, Williams CL, Weldon KC, Panitchpakdi M, Di Ottavio F, Aceves CM, Brown E, Sikora NC, Jarmusch AK, Martino C, Tripathi A, Meehan MJ, Dorrestein K, Shaffer JP, Coras R, Vargas F, Goldasich LD, Schwartz T, Bryant M, Humphrey G, Johnson AJ, Spengler K, Belda-Ferre P, Diaz E, McDonald D, Zhu Q, Elijah EO, Wang M, Marotz C, Sprecher KE, Vargas-Robles D, Withrow D, Ackermann G, Herrera L, Bradford BJ, Marques LMM, Amaral JG, Silva RM, Veras FP, Cunha TM, Oliveira RDR, Louzada-Junior P, Mills RH, Piotrowski PK, Servetas SL, Da Silva SM, Jones CM, Lin NJ, Lippa KA, Jackson SA, Daouk RK, Galasko D, Dulai PS, Kalashnikova TI, Wittenberg C, Terkeltaub R, Doty MM, Kim JH, Rhee KE, Beauchamp-Walters J, Wright KP, Dominguez-Bello MG, Manary M, Oliveira MF, Boland BS, Lopes NP, Guma M, Swafford AD, Dutton RJ, Knight R, Dorrestein PC. Enhancing untargeted metabolomics using metadata-based source annotation. Nat Biotechnol 2022; 40:1774-1779. [PMID: 35798960 PMCID: PMC10277029 DOI: 10.1038/s41587-022-01368-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 06/24/2021] [Accepted: 05/20/2022] [Indexed: 01/30/2023]
Abstract
Human untargeted metabolomics studies annotate only ~10% of molecular features. We introduce reference-data-driven analysis to match metabolomics tandem mass spectrometry (MS/MS) data against metadata-annotated source data as a pseudo-MS/MS reference library. Applying this approach to food source data, we show that it increases MS/MS spectral usage 5.1-fold over conventional structural MS/MS library matches and allows empirical assessment of dietary patterns from untargeted data.
Affiliation(s)
- Julia M Gauglitz
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Kiana A West
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Wout Bittremieux
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Candace L Williams
- Beckman Center for Conservation Research, San Diego Zoo Wildlife Alliance, Escondido, CA, USA
- Kelly C Weldon
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, Joan and Irwin Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Morgan Panitchpakdi
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Francesca Di Ottavio
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Christine M Aceves
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Elizabeth Brown
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Nicole C Sikora
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Alan K Jarmusch
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Cameron Martino
- Center for Microbiome Innovation, Joan and Irwin Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Anupriya Tripathi
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Michael J Meehan
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Kathleen Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Justin P Shaffer
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Roxana Coras
- Division of Rheumatology, Allergy & Immunology, Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Fernando Vargas
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
| | | | - Tara Schwartz
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - MacKenzie Bryant
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Gregory Humphrey
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Abigail J Johnson
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - Katharina Spengler
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
| | - Pedro Belda-Ferre
- Center for Microbiome Innovation, Joan and Irwin Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Edgar Diaz
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Daniel McDonald
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Qiyun Zhu
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Emmanuel O Elijah
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Mingxun Wang
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Clarisse Marotz
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Kate E Sprecher
- Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, USA
- Department of Population Health Sciences, University of Wisconsin-Madison, Madison, WI, USA
| | - Daniela Vargas-Robles
- Servicio Autónomo Centro Amazónico de Investigación y Control de Enfermedades Tropicales Simón Bolívar, Puerto Ayacucho, Amazonas, Venezuela
| | - Dana Withrow
- Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, USA
| | - Gail Ackermann
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Lourdes Herrera
- Department of Pediatrics, Billings Clinic, Billings, MT, USA
| | - Barry J Bradford
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
| | - Lucas Maciel Mauriz Marques
- Department of Pharmacology, Ribeirão Preto Medical School, Center of Research in Inflammatory Diseases, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
| | - Juliano Geraldo Amaral
- Multidisciplinary Health Institute, Federal University of Bahia, Vitória da Conquista, Bahia, Brazil
| | - Rodrigo Moreira Silva
- NPPNS, Department of Biomolecular Sciences, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
| | - Flavio Protasio Veras
- Department of Pharmacology, Ribeirão Preto Medical School, Center of Research in Inflammatory Diseases, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
| | - Thiago Mattar Cunha
- Department of Pharmacology, Ribeirão Preto Medical School, Center of Research in Inflammatory Diseases, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
| | - Rene Donizeti Ribeiro Oliveira
- Department of Internal Medicine, Ribeirão Preto Medical School, Center of Research in Inflammatory Diseases, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
| | - Paulo Louzada-Junior
- Department of Internal Medicine, Ribeirão Preto Medical School, Center of Research in Inflammatory Diseases, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
| | - Robert H Mills
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA, USA
| | - Paulina K Piotrowski
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Stephanie L Servetas
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Sandra M Da Silva
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Christina M Jones
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nancy J Lin
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Katrice A Lippa
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Scott A Jackson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Rima Kaddurah Daouk
- Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC, USA
- Department of Medicine, Duke University, Durham, NC, USA
- Duke Institute of Brain Sciences, Duke University, Durham, NC, USA
| | - Douglas Galasko
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
| | - Parambir S Dulai
- Division of Gastroenterology, Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | | | - Curt Wittenberg
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, USA
| | - Robert Terkeltaub
- Division of Rheumatology, Allergy & Immunology, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- San Diego VA Healthcare System, San Diego, CA, USA
| | - Megan M Doty
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Division of Neonatology, Department of Pediatrics, Kapi'olani Medical Center for Women and Children, John A. Burns School of Medicine, Honolulu, Hawaii, USA
| | - Jae H Kim
- Division of Neonatology, Perinatal Institute, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Kyung E Rhee
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Julia Beauchamp-Walters
- Division of Pediatric Hospital Medicine, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Kenneth P Wright
- Department of Integrative Physiology, University of Colorado Boulder, Boulder, CO, USA
| | - Maria Gloria Dominguez-Bello
- Department of Biochemistry and Microbiology, School of Environmental and Biological Sciences; Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
| | - Mark Manary
- Department of Pediatrics, Washington University, St. Louis, MO, USA
| | - Michelli F Oliveira
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Brigid S Boland
- Division of Gastroenterology, Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Norberto Peporine Lopes
- NPPNS, Department of Biomolecular Sciences, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
| | - Monica Guma
- Division of Rheumatology, Allergy & Immunology, Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Austin D Swafford
- Center for Microbiome Innovation, Joan and Irwin Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
| | - Rachel J Dutton
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
| | - Rob Knight
- Center for Microbiome Innovation, Joan and Irwin Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, USA.
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA.
- Center for Microbiome Innovation, Joan and Irwin Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA.
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA.
- Department of Pharmacology, University of California San Diego, La Jolla, CA, USA.
|
10
|
Bittremieux W, Wang M, Dorrestein PC. The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics 2022; 18:94. [PMID: 36409434 DOI: 10.1007/s11306-022-01947-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 08/01/2022] [Accepted: 10/19/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. AIM OF REVIEW We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. KEY SCIENTIFIC CONCEPTS OF REVIEW This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
Affiliation(s)
- Wout Bittremieux
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
| | - Mingxun Wang
- Department of Computer Science, University of California Riverside, Riverside, CA, 92507, USA
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA.
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA.
|
11
|
Skoraczyński G, Gambin A, Miasojedow B. Alignstein: Optimal transport for improved LC-MS retention time alignment. Gigascience 2022; 11:6795291. [PMID: 36329619 PMCID: PMC9633278 DOI: 10.1093/gigascience/giac101] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/10/2022] [Revised: 08/24/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022] Open
Abstract
Background Reproducibility of liquid chromatography separation is limited by retention time drift. As a result, measured signals lack correspondence over replicates of the liquid chromatography–mass spectrometry (LC-MS) experiments. Correction of these errors is named retention time alignment and needs to be performed before further quantitative analysis. Despite the availability of numerous alignment algorithms, their accuracy is limited (e.g., for retention time drift that swaps analytes’ elution order). Results We present the Alignstein, an algorithm for LC-MS retention time alignment. It correctly finds correspondence even for swapped signals. To achieve this, we implemented the generalization of the Wasserstein distance to compare multidimensional features without any reduction of the information or dimension of the analyzed data. Moreover, Alignstein by design requires neither a reference sample nor prior signal identification. We validate the algorithm on publicly available benchmark datasets obtaining competitive results. Finally, we show that it can detect the information contained in the tandem mass spectrum by the spatial properties of chromatograms. Conclusions We show that the use of optimal transport effectively overcomes the limitations of existing algorithms for statistical analysis of mass spectrometry datasets. The algorithm’s source code is available at https://github.com/grzsko/Alignstein.
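To make the optimal-transport idea concrete, here is a minimal sketch of the one-dimensional Wasserstein-1 distance between two equally sized sets of retention times. It is a deliberately simplified illustration and not Alignstein's implementation: the abstract describes a generalization to multidimensional features, whereas this version handles a single dimension with uniform weights. The function name `wasserstein_1d` and the sample retention times are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size empirical distributions.

    In one dimension with uniform weights, an optimal transport plan pairs
    the i-th smallest value of xs with the i-th smallest value of ys, so the
    distance is the mean absolute difference of the sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical retention times (minutes) of three analytes in two LC-MS runs;
# the second run drifts by +0.5 min for every analyte.
run_a = [10.0, 12.0, 15.0]
run_b = [10.5, 12.5, 15.5]
distance = wasserstein_1d(run_a, run_b)
print(f"Wasserstein-1 distance between runs: {distance:.3f} min")
```

Because the 1D formula only looks at sorted values, it carries no information about which analyte is which; this sketch is meant only to show what "cost of moving one signal distribution onto another" means.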
Affiliation(s)
- Grzegorz Skoraczyński
- Correspondence address. Grzegorz Skoraczyński, Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland. E-mail:
| | - Anna Gambin
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland
| | - Błażej Miasojedow
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland
|
12
|
Jora M, Corcoran D, Parungao GG, Lobue PA, Oliveira LFL, Stan G, Addepalli B, Limbach PA. Higher-Energy Collisional Dissociation Mass Spectral Networks for the Rapid, Semi-automated Characterization of Known and Unknown Ribonucleoside Modifications. Anal Chem 2022; 94:13958-13967. [PMID: 36174068 DOI: 10.1021/acs.analchem.2c03172] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
Higher-energy collisional dissociation (HCD) of modified ribonucleosides generates characteristic and highly reproducible nucleoside-specific tandem mass spectra (MS/MS). Here, we demonstrate the capability of HCD spectra in combination with spectral matching for the semi-automated characterization of ribonucleosides. This process involved the generation of an HCD spectral library and the establishment of a mass spectral network for rapid detection with high sensitivity and specificity in a retention time-independent fashion. Systematic spectral matching analysis of the MS/MS spectra of tRNA hydrolysates from different organisms has helped us to uncover evidence for the existence of novel ribonucleoside modifications such as s2Cm and OHyW-14. Such an untargeted label-free approach has the potential to be integrated with other methods, including those that use isotope labeling, to simplify the characterization of unknown modified ribonucleosides. These findings suggest the compilation of a universal spectral network, for the characterization of known and unknown ribonucleosides, could accelerate discoveries in the epitranscriptome.
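The spectral matching step mentioned above can be illustrated with a small sketch. The abstract does not state which similarity measure was used; cosine similarity on binned peaks is a common choice in spectral library searching and is assumed here purely for illustration. The function names, bin width, and peak lists are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin index -> intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two spectra given as (m/z, intensity) pairs."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical query spectrum and two hypothetical library entries.
query = [(112.05, 100.0), (245.08, 40.0)]
library_hit = [(112.05, 90.0), (245.08, 50.0)]
library_miss = [(136.06, 100.0), (268.10, 30.0)]
score_hit = cosine_similarity(query, library_hit)
score_miss = cosine_similarity(query, library_miss)
print(f"hit: {score_hit:.3f}, miss: {score_miss:.3f}")
```

In a spectral network, pairs of spectra whose score exceeds a chosen threshold would be connected by an edge; the threshold itself is a tuning choice not given in the abstract.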
Affiliation(s)
- Manasses Jora
- Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
| | - Daniel Corcoran
- Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
| | - Gwenn G Parungao
- Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
| | - Peter A Lobue
- Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
| | - Luiz F L Oliveira
- Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
| | - George Stan
- Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
| | - Balasubrahmanyam Addepalli
- Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
| | - Patrick A Limbach
- Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
|
13
|
Bittremieux W, May DH, Bilmes J, Noble WS. A learned embedding for efficient joint analysis of millions of mass spectra. Nat Methods 2022; 19:675-678. [DOI: 10.1038/s41592-022-01496-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/29/2018] [Accepted: 04/14/2022] [Indexed: 11/09/2022]
|
14
|
Luo X, Bittremieux W, Griss J, Deutsch EW, Sachsenberg T, Levitsky LI, Ivanov MV, Bubis JA, Gabriels R, Webel H, Sanchez A, Bai M, Käll L, Perez-Riverol Y. A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics. J Proteome Res 2022; 21:1566-1574. [PMID: 35549218 PMCID: PMC9171829 DOI: 10.1021/acs.jproteome.2c00069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTM) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
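As a rough illustration of what a binning-based consensus spectrum can look like, the sketch below pools the peaks of all spectra in one cluster into m/z bins, keeps bins supported by a minimum fraction of the spectra, and averages m/z and intensity within each kept bin. This is an assumed, simplified variant and not the benchmarked BIN implementation from the linked repository; the function name, the `bin_width` and `min_fraction` parameters, and the example peaks are all hypothetical.

```python
from collections import defaultdict


def binned_consensus(spectra, bin_width=0.02, min_fraction=0.5):
    """Build a consensus spectrum from a cluster of similar spectra.

    spectra: list of spectra, each a list of (m/z, intensity) pairs.
    Returns a list of (mean m/z, mean intensity) pairs sorted by m/z.
    """
    if not spectra:
        raise ValueError("cluster must contain at least one spectrum")
    bins = defaultdict(list)  # bin index -> [(mz, intensity, spectrum index)]
    for index, peaks in enumerate(spectra):
        for mz, intensity in peaks:
            bins[round(mz / bin_width)].append((mz, intensity, index))

    consensus = []
    for key in sorted(bins):
        members = bins[key]
        support = len({index for _, _, index in members})
        if support / len(spectra) < min_fraction:
            continue  # peak seen in too few spectra; treated as noise here
        mean_mz = sum(mz for mz, _, _ in members) / len(members)
        # Spectra lacking the peak contribute zero intensity to the average.
        mean_intensity = sum(i for _, i, _ in members) / len(spectra)
        consensus.append((mean_mz, mean_intensity))
    return consensus


# Hypothetical cluster of three spectra; a wide bin keeps the demo readable.
cluster = [
    [(100.0, 10.0), (200.0, 4.0)],
    [(100.2, 30.0)],
    [(100.1, 20.0), (300.0, 1.0)],
]
consensus = binned_consensus(cluster, bin_width=1.0, min_fraction=0.5)
print(consensus)
```

The other strategies named in the abstract (most similar spectrum, best-identified spectrum) select one existing member of the cluster instead of synthesizing a new spectrum, so they need a similarity or identification score rather than binning.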
Affiliation(s)
- Xiyang Luo
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, 400065 Chongqing, China
| | - Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, U.K.; Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
| | - Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
| | - Timo Sachsenberg
- Applied Bioinformatics, Department for Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
| | - Lev I Levitsky
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 142432, Russia
| | - Mark V Ivanov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 142432, Russia
| | - Julia A Bubis
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 142432, Russia
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, B-9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, B-9000 Ghent, Belgium
| | - Henry Webel
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen DK-2200, Denmark
| | - Aniel Sanchez
- Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Skåne University Hospital Malmö, 20502 Malmö, Sweden
| | - Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, 400065 Chongqing, China
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, Royal Institute of Technology - KTH, Box 1031, 17121 Solna, Sweden
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, U.K.
|
15
|
Hamood F, Bayer FP, Wilhelm M, Kuster B, The M. SIMSI-Transfer: Software-assisted reduction of missing values in phosphoproteomic and proteomic isobaric labeling data using tandem mass spectrum clustering. Mol Cell Proteomics 2022; 21:100238. [PMID: 35462064 PMCID: PMC9389303 DOI: 10.1016/j.mcpro.2022.100238] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/20/2021] [Revised: 03/18/2022] [Accepted: 03/27/2022] [Indexed: 12/11/2022] Open
Abstract
Isobaric stable isotope labeling techniques such as tandem mass tags (TMTs) have become popular in proteomics because they enable the relative quantification of proteins with high precision from up to 18 samples in a single experiment. While missing values in peptide quantification are rare in a single TMT experiment, they rapidly increase when combining multiple TMT experiments. As the field moves toward analyzing ever higher numbers of samples, tools that reduce missing values also become more important for analyzing TMT datasets. To this end, we developed SIMSI-Transfer (Similarity-based Isobaric Mass Spectra 2 [MS2] Identification Transfer), a software tool that extends our previously developed software MaRaCluster (© Matthew The) by clustering similar tandem MS2 from multiple TMT experiments. SIMSI-Transfer is based on the assumption that similarity-clustered MS2 spectra represent the same peptide. Therefore, peptide identifications made by database searching in one TMT batch can be transferred to another TMT batch in which the same peptide was fragmented but not identified. To assess the validity of this approach, we tested SIMSI-Transfer on masked search engine identification results and recovered >80% of the masked identifications while controlling errors in the transfer procedure to below 1% false discovery rate. Applying SIMSI-Transfer to six published full proteome and phosphoproteome datasets from the Clinical Proteomic Tumor Analysis Consortium led to an increase of 26 to 45% of identified MS2 spectra with TMT quantifications. This significantly decreased the number of missing values across batches and, in turn, increased the number of peptides and proteins identified in all TMT batches by 43 to 56% and 13 to 16%, respectively.
Spectrum clustering enables peptide identification transfer between LC–MS/MS runs. The SIMSI pipeline supports processing full proteome and phosphoproteome data. SIMSI increases the number of quantifiable PSMs by 26 to 45%. SIMSI reduces missing values in multibatch TMT labeling experiments by up to 21%.
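The core transfer idea, that an unidentified spectrum can inherit the peptide of identified spectra in the same cluster, can be sketched in a few lines. This is an illustration under stated assumptions and not the SIMSI-Transfer code: cluster assignments are taken as given (the tool obtains them from MaRaCluster), and the rule of transferring only from clusters with exactly one distinct peptide is an assumption made here, since the abstract does not describe how ambiguous clusters or false discovery rate control are handled. All identifiers and peptide sequences are hypothetical.

```python
from collections import defaultdict


def transfer_identifications(cluster_of, peptide_of):
    """Propagate peptide identifications within spectrum clusters.

    cluster_of: dict mapping spectrum id -> cluster id (all spectra).
    peptide_of: dict mapping spectrum id -> peptide (identified spectra only).
    Returns a dict of newly transferred identifications.
    """
    peptides_in_cluster = defaultdict(set)
    for spectrum_id, peptide in peptide_of.items():
        peptides_in_cluster[cluster_of[spectrum_id]].add(peptide)

    transferred = {}
    for spectrum_id, cluster_id in cluster_of.items():
        if spectrum_id in peptide_of:
            continue  # already identified by the database search
        candidates = peptides_in_cluster.get(cluster_id, set())
        if len(candidates) == 1:  # assumed rule: skip ambiguous clusters
            transferred[spectrum_id] = next(iter(candidates))
    return transferred


# Hypothetical spectra from two TMT batches (b1, b2) grouped into clusters.
cluster_of = {
    "b1_s1": 0, "b2_s7": 0,               # cluster 0: one identification
    "b1_s2": 1, "b2_s9": 1, "b2_s3": 1,   # cluster 1: conflicting identifications
    "b2_s4": 2,                           # cluster 2: nothing identified
}
peptide_of = {"b1_s1": "PEPTIDEK", "b1_s2": "AAAK", "b2_s9": "AAGK"}
transferred = transfer_identifications(cluster_of, peptide_of)
print(transferred)
```

Only the spectrum in batch 2 that shares a cluster with a single identified peptide gains an identification, which is how a missing value in one batch can be filled from another.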
Affiliation(s)
- Firas Hamood
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Florian P Bayer
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany.
| | - Matthew The
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany.
|
16
|
Optimization of LC-MS2 Data Acquisition Parameters for Molecular Networking Applied to Marine Natural Products. Metabolites 2022; 12:metabo12030245. [PMID: 35323688 PMCID: PMC8953742 DOI: 10.3390/metabo12030245] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/13/2022] [Revised: 03/04/2022] [Accepted: 03/11/2022] [Indexed: 12/03/2022] Open
Abstract
Since the introduction of the online open-source GNPS, molecular networking has quickly become a widely applied tool in the field of natural products chemistry, with applications from dereplication, genome mining, metabolomics, and visualization of chemical space. Studies have shown that data dependent acquisition (DDA) parameters affect molecular network topology but are limited in the number of parameters studied. With an aim to optimize LC-MS2 parameters for integrating GNPS-based molecular networking into our marine natural products workflow, a design of experiment (DOE) was used to screen the significance of the effect that eleven parameters have on both Classical Molecular Networking workflow (CLMN) and the new Feature-Based Molecular Networking workflow (FBMN). Our results indicate that four parameters (concentration, run duration, collision energy and number of precursors per cycle) are the most significant data acquisition parameters affecting the network topology. While concentration and the LC duration were found to be the two most important factors to optimize for CLMN, the number of precursors per cycle and collision energy were also very important factors to optimize for FBMN.
|
17
|
The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms. Comput Struct Biotechnol J 2022; 20:1402-1412. [PMID: 35386104 PMCID: PMC8956878 DOI: 10.1016/j.csbj.2022.03.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 01/24/2023] Open
Abstract
Most correct de novo peptides have ⩽1 missing fragmentation cleavages. DeepNovo outperforms Novor for peptide accuracy for both data types. Novor excels at amino acid recall when many fragmentation cleavages are missing. Deep learning allows DeepNovo to predict amino acids without adjacent peaks.
Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications from clinical to environmental settings and impacts virtually every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically, its performance lagged behind that of database search methods, but with the integration of machine learning this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state-of-the-art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithm’s correct peptide predictions come from data with more than one missing cleavage site, highlighting the issues missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms when high-intensity peaks are considered. Finally, we provide recommendations regarding further algorithmic improvements and offer potential avenues to overcome current inherent data limitations.
|
18
|
Matsuda F, Komori S, Yamada Y, Hara D, Okahashi N. Data Processing of Product Ion Spectra: Quality Improvement by Averaging Multiple Similar Spectra of Small Molecules. Mass Spectrom (Tokyo) 2022; 11:A0106. [PMID: 36713802 PMCID: PMC9853114 DOI: 10.5702/massspectrometry.a0106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 10/26/2022] [Indexed: 11/06/2022] Open
Abstract
In metabolomics studies using high-resolution mass spectrometry (MS), a set of product ion spectra is comprehensively acquired from observed ions using the data-dependent acquisition (DDA) mode of various tandem MS. However, especially for low-intensity signals, it is sometimes difficult to distinguish artifact signals from true fragment ions derived from a precursor ion. Inadequate precision in the measured m/z value is also one of the bottlenecks to narrowing down the candidate compositional formula. In this study, we report that averaging multiple product ion spectra can improve m/z precision as well as the reliability of fragment ions that are observed in such spectra. A graph-based method was applied to cluster a set of similar spectra from multiple DDA data files resulting in creating an averaged product-ion spectrum. The error levels for the m/z values declined following the central limit theorem, which allowed us to reduce the number of candidate compositional formulas. The improved reliability and precision of the averaged spectra will contribute to a more efficient annotation of product ion spectral data.
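The statement that m/z error declines "following the central limit theorem" can be checked with a small simulation: if each spectrum reports a fragment m/z with independent random error of standard deviation sigma, the mean over n spectra has a standard error of roughly sigma / sqrt(n). The sketch below assumes Gaussian, independent errors, which is a modelling assumption and not something the abstract states; the function name, the true m/z of 200.0, and the sigma of 0.002 are hypothetical.

```python
import math
import random
import statistics


def mean_mz_error(true_mz, sigma, n_spectra, n_trials=2000, seed=0):
    """Empirical standard deviation of the averaged m/z over repeated trials.

    Each trial simulates n_spectra measurements of the same fragment ion,
    averages them, and records the deviation from the true m/z.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        measured = [rng.gauss(true_mz, sigma) for _ in range(n_spectra)]
        errors.append(statistics.fmean(measured) - true_mz)
    return statistics.pstdev(errors)


# Compare the simulated error with the sigma / sqrt(n) prediction.
sigma = 0.002
for n in (1, 4, 16):
    simulated = mean_mz_error(200.0, sigma, n)
    predicted = sigma / math.sqrt(n)
    print(f"n={n:2d}  simulated={simulated:.5f}  predicted={predicted:.5f}")
```

Under these assumptions, averaging 16 similar spectra should cut the m/z error to about a quarter, which narrows the mass window used to enumerate candidate compositional formulas.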
Affiliation(s)
- Fumio Matsuda
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan; Osaka University Shimadzu Omics Innovation Research Laboratories, Osaka University, Osaka, Japan; Correspondence to: Fumio Matsuda, Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, 1–5 Yamadaoka, Suita, Osaka 565–0871, Japan, e-mail:
| | - Shuka Komori
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
| | - Yuki Yamada
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
| | - Daiki Hara
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
| | - Nobuyuki Okahashi
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan; Osaka University Shimadzu Omics Innovation Research Laboratories, Osaka University, Osaka, Japan
| |
|
19
|
Ma C, Zhang Y, Dou X, Liu L, Zhang W, Ye J. Combining multiple acquisition modes and computational data annotation for structural characterization in traditional Chinese medicine: Miao Nationality medicine Qijiao Shengbai Capsule as a case study. RSC Adv 2022; 12:27781-27792. [PMID: 36320242 PMCID: PMC9520537 DOI: 10.1039/d2ra04720a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 09/21/2022] [Indexed: 11/26/2022] Open
Abstract
Qijiao Shengbai Capsule (QSC) is a reputable Miao Nationality medicine used for treating leukopenia, but its chemical composition has not yet been elucidated. We herein present a strategy that integrates multiple data acquisition modes with computational data annotation and processing methods to visualize and identify the complicated constituents in QSC, based on ultra-high-performance liquid chromatography coupled with traveling wave ion mobility quadrupole time-of-flight mass spectrometry (UPLC-TWIMS-QTOF-MS). The multiple data acquisition modes, including data-independent mass spectrometry^Energy (MS^E), data-independent high-definition mass spectrometry^Energy (HDMS^E), and fast data-dependent acquisition (fast-DDA), in both positive and negative ion modes, were conducted on a Waters SYNAPT G2-Si mass spectrometer with an ESI source. An in-house library built by the UNIFI platform could efficiently process the peak annotation of known compounds, whilst different structural types were clustered in the molecular networks for the analogous classification and structural annotation of the unknown ones. Neutral loss, diagnostic ions, feature fragmentation behaviors, and community curation of mass spectrometry data of known compounds helped exploit those similar neighboring nodes of unknown compounds. Moreover, by combining the CCS values predicted by the CCS platform with the experimental CCS values from HDMS^E, as well as diagnostic fragment ions, isomeric compounds were annotated. By integrating reference compound comparison, a total of 202 constituents, including 94 flavonoids, 12 saponins, 30 phthalides, 38 organic acids, 3 amino acids, 7 alkaloids and 18 others, were unambiguously characterized or tentatively identified in QSC. Among them, 5 potential new compounds were detected and 12 pairs of isomers were comprehensively distinguished. In conclusion, the established strategy of multiple acquisition modes combined with computational data processing and analysis proved to be useful for the in-depth structural identification of QSC.
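The comparison of predicted and experimental CCS values described in this abstract can be sketched as a simple ranking by relative deviation. This is an illustration only: the function names, the candidate labels and the CCS values are hypothetical, and the abstract does not state which error measure or cutoff the authors applied.

```python
def ccs_error_percent(experimental, predicted):
    """Relative deviation (%) of a predicted CCS from the measured value."""
    return abs(predicted - experimental) / experimental * 100.0


def rank_isomers(experimental_ccs, predicted_by_candidate):
    """Sort candidate isomers by how closely their predicted CCS matches the measurement."""
    return sorted(
        predicted_by_candidate,
        key=lambda name: ccs_error_percent(experimental_ccs, predicted_by_candidate[name]),
    )


# Hypothetical predicted CCS values in square angstroms; not taken from the paper.
candidates = {"isomer_A": 201.5, "isomer_B": 196.0}
ranking = rank_isomers(198.0, candidates)
```

In practice the CCS ranking would be only one line of evidence, weighed together with the diagnostic fragment ions that the abstract also mentions.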
Affiliation(s)
- Chi Ma
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
| | - Yuhao Zhang
- School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, 211198, China
| | - Xiuxiu Dou
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
| | - Li Liu
- Guizhou Hanfang Pharmaceutical Co., Ltd., Guizhou, 550014, China
| | - Weidong Zhang
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- School of Pharmacy, Naval Medical University, Shanghai 200433, China
- School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, 211198, China
| | - Ji Ye
- School of Pharmacy, Naval Medical University, Shanghai 200433, China
| |
|
20
|
To PKP, Wu L, Chan CM, Hoque A, Lam H. ClusterSheep: A Graphics Processing Unit-Accelerated Software Tool for Large-Scale Clustering of Tandem Mass Spectra from Shotgun Proteomics. J Proteome Res 2021; 20:5359-5367. [PMID: 34734728 DOI: 10.1021/acs.jproteome.1c00485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Modern shotgun proteomics experiments generate gigabytes of spectra every hour, only a fraction of which were utilized to form biological conclusions. Instead of being stored as flat files in public data repositories, this large amount of data can be better organized to facilitate data reuse. Clustering these spectra by similarity can be helpful in building high-quality spectral libraries, correcting identification errors, and highlighting frequently observed but unidentified spectra. However, large-scale clustering is time-consuming. Here, we present ClusterSheep, a method utilizing Graphics Processing Units (GPUs) to accelerate the process. Unlike previously proposed algorithms for this purpose, our method performs true pairwise comparison of all spectra within a precursor mass-to-charge ratio tolerance, thereby preserving the full cluster structures. ClusterSheep was benchmarked against previously reported clustering tools, MS-Cluster, MaRaCluster, and msCRUSH. The software tool also functions as an interactive visualization tool with a persistent state, enabling the user to explore the resulting clusters visually and retrieve the clustering results as desired.
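The idea of comparing every pair of spectra that falls within a precursor m/z tolerance can be sketched on the CPU as follows. This only enumerates the candidate pairs; it is not ClusterSheep's GPU implementation, and the function name, tolerance and example values are assumptions made for illustration.

```python
def candidate_pairs(precursor_mzs, tolerance):
    """Return index pairs (i, j), i < j, whose precursor m/z differ by at most `tolerance`."""
    # Sort spectrum indices by precursor m/z so each spectrum only scans forward.
    order = sorted(range(len(precursor_mzs)), key=lambda i: precursor_mzs[i])
    pairs = []
    for pos, i in enumerate(order):
        k = pos + 1
        # Stop at the first spectrum outside the window: all later ones are further away.
        while k < len(order) and precursor_mzs[order[k]] - precursor_mzs[i] <= tolerance:
            j = order[k]
            pairs.append((min(i, j), max(i, j)))
            k += 1
    return pairs


# Hypothetical precursor m/z values for five spectra.
pairs = candidate_pairs([500.25, 500.26, 612.80, 500.90, 612.81], tolerance=0.05)
```

Every pair returned here would then be scored with a spectral similarity measure; because no pair inside the window is skipped, the resulting similarity graph keeps the full cluster structure that the abstract emphasizes.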
Affiliation(s)
- Paul Ka Po To
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Long Wu
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Chak Ming Chan
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Ayman Hoque
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| |
|
21
|
Neto FC, Raftery D. Expanding Urinary Metabolite Annotation through Integrated Mass Spectral Similarity Networking. Anal Chem 2021; 93:12001-12010. [PMID: 34436864 PMCID: PMC8530160 DOI: 10.1021/acs.analchem.1c02041] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/15/2022]
Abstract
The urine metabolome constitutes a rich source of functional information reflecting physiological states that are influenced by distinct conditions and biological stresses, such as responses to drug treatments or disease manifestations. Although global liquid chromatography-mass spectrometry (MS) profiling provides the most comprehensive measurement of metabolites in complex biological samples, annotation remains a challenge, and computational approaches are necessary to translate the molecular composition into biological knowledge. Here, we investigated the use of tandem MS-based enhanced molecular networks (MolNetEnhancer) to improve the metabolite annotation of urine extracts. The samples (n = 10) were analyzed by hydrophilic interaction chromatography-quadrupole time-of-flight mass spectrometry in both electrospray ionization (ESI) modes. Consistent with other common data preprocessing software, the use of Progenesis QI led to the annotation of up to 20 metabolites based on MS2 library searches, showing a high fragmentation score (cosine similarity ≥ 0.7), that is, ∼2% of mass features containing MS2 spectra. Molecular networking based on library matching resulted in the annotation of up to 62 urinary compounds. Using a combination of unsupervised substructure discovery (MS2LDA), the in silico tool network annotation propagation (NAP), and ClassyFire chemical ontology, embedded in a multilayered molecular network by MolNetEnhancer, we were able to expand the chemical characterization to ∼50% of the data set. The integrative approach led to the annotation of 275 compounds at the metabolomics standards initiative (MSI) confidence level 2, as well as 459 and 578 urinary metabolites (MSI level 3) in both negative and positive ESI modes, respectively. The exhaustive MS2-based annotation outperformed similar studies applied to larger cohorts while offering the discovery of metabolites not identified by the MS2 library search. 
This is the first work that effectively integrates orthogonal annotation methods and MS2-based fragmentation studies to improve metabolite annotation in urine samples.
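The fragmentation score threshold quoted in this abstract (cosine similarity ≥ 0.7) can be illustrated with a minimal binned cosine score. The binning scheme, bin width and peak lists below are assumptions for illustration; Progenesis QI and GNPS use their own scoring implementations, which are not reproduced here.

```python
import math
from collections import defaultdict

MATCH_THRESHOLD = 0.7  # cosine cutoff quoted in the abstract


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned intensity vectors (0.0 if either is empty)."""
    a, b = bin_spectrum(peaks_a, bin_width), bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical query and library spectra as (m/z, intensity) pairs.
query = [(100.0, 3.0), (200.0, 4.0)]
library = [(100.0, 3.0)]
score = cosine_similarity(query, library)
is_match = score >= MATCH_THRESHOLD
```

Fixed-width binning is the simplest choice but can split a peak that sits on a bin boundary; tolerance-based peak matching avoids this at the cost of a slightly longer implementation.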
Affiliation(s)
- Fausto Carnevale Neto
- Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, 850 Republican Street, Seattle, Washington 98109, United States
| | - Daniel Raftery
- Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, 850 Republican Street, Seattle, Washington 98109, United States
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, United States
| |
|
22
|
Behsaz B, Bode E, Gurevich A, Shi YN, Grundmann F, Acharya D, Caraballo-Rodríguez AM, Bouslimani A, Panitchpakdi M, Linck A, Guan C, Oh J, Dorrestein PC, Bode HB, Pevzner PA, Mohimani H. Integrating genomics and metabolomics for scalable non-ribosomal peptide discovery. Nat Commun 2021; 12:3225. [PMID: 34050176 PMCID: PMC8163882 DOI: 10.1038/s41467-021-23502-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/23/2020] [Accepted: 05/04/2021] [Indexed: 02/07/2023] Open
Abstract
Non-Ribosomal Peptides (NRPs) represent a biomedically important class of natural products that include a multitude of antibiotics and other clinically used drugs. NRPs are not directly encoded in the genome but are instead produced by metabolic pathways encoded by biosynthetic gene clusters (BGCs). Since the existing genome mining tools predict many putative NRPs synthesized by a given BGC, it remains unclear which of these putative NRPs are correct and how to identify post-assembly modifications of amino acids in these NRPs in a blind mode, without knowing which modifications exist in the sample. To address this challenge, here we report NRPminer, a modification-tolerant tool for NRP discovery from large (meta)genomic and mass spectrometry datasets. We show that NRPminer is able to identify many NRPs from different environments, including four previously unreported NRP families from soil-associated microbes and NRPs from human microbiota. Furthermore, in this work we demonstrate the anti-parasitic activities and the structure of two of these NRP families using direct bioactivity screening and nuclear magnetic resonance spectrometry, illustrating the power of NRPminer for discovering bioactive NRPs.
Affiliation(s)
- Bahar Behsaz
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California at San Diego, La Jolla, CA, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Edna Bode
- Molecular Biotechnology, Department of Biosciences, Goethe University Frankfurt, Frankfurt am Main, Germany
| | - Alexey Gurevich
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St Petersburg, Russia
| | - Yan-Ni Shi
- Molecular Biotechnology, Department of Biosciences, Goethe University Frankfurt, Frankfurt am Main, Germany
| | - Florian Grundmann
- Molecular Biotechnology, Department of Biosciences, Goethe University Frankfurt, Frankfurt am Main, Germany
| | - Deepa Acharya
- Tiny Earth Chemistry Hub, University of Wisconsin-Madison, Madison, WI, USA
| | - Andrés Mauricio Caraballo-Rodríguez
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Amina Bouslimani
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Morgan Panitchpakdi
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Annabell Linck
- Molecular Biotechnology, Department of Biosciences, Goethe University Frankfurt, Frankfurt am Main, Germany
| | - Changhui Guan
- The Jackson Laboratory of Medical Genomics, Farmington, CT, USA
| | - Julia Oh
- The Jackson Laboratory of Medical Genomics, Farmington, CT, USA
| | - Pieter C Dorrestein
- Center for Microbiome Innovation, University of California at San Diego, La Jolla, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Helge B Bode
- Molecular Biotechnology, Department of Biosciences, Goethe University Frankfurt, Frankfurt am Main, Germany.
- Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt & Senckenberg Research Institute, Frankfurt am Main, Germany.
- Max-Planck-Institute for Terrestrial Microbiology, Department for Natural Products in Organismic Interactions, Marburg, Germany.
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
| | - Hosein Mohimani
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
| |
|
23
|
Davoodi AG, Chang S, Yoo HG, Baweja A, Mongia M, Mohimani H. ForestDSH: a universal hash design for discrete probability distributions. Data Min Knowl Discov 2021. [DOI: 10.1007/s10618-020-00732-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
24
|
Abstract
Anaerobic gut fungi are important members of the gut microbiome of herbivores, yet they exist in small numbers relative to bacteria. Here, we show that these early-branching fungi produce a wealth of secondary metabolites (natural products) that may act to regulate the gut microbiome. We use an integrated 'omics'-based approach to classify the biosynthetic genes predicted from fungal genomes, determine transcriptionally active genes, and verify the presence of their enzymatic products. Our analysis reveals that anaerobic gut fungi are an untapped reservoir of bioactive compounds that could be harnessed for biotechnology. Anaerobic fungi (class Neocallimastigomycetes) thrive as low-abundance members of the herbivore digestive tract. The genomes of anaerobic gut fungi are poorly characterized and have not been extensively mined for the biosynthetic enzymes of natural products such as antibiotics. Here, we investigate the potential of anaerobic gut fungi to synthesize natural products that could regulate membership within the gut microbiome. Complementary 'omics' approaches were combined to catalog the natural products of anaerobic gut fungi from four different representative species: Anaeromyces robustus (A. robustus), Caecomyces churrovis (C. churrovis), Neocallimastix californiae (N. californiae), and Piromyces finnis (P. finnis). In total, 146 genes were identified that encode biosynthetic enzymes for diverse types of natural products, including nonribosomal peptide synthetases and polyketide synthases. In addition, N. californiae and C. churrovis genomes encoded seven putative bacteriocins, a class of antimicrobial peptides typically produced by bacteria. During standard laboratory growth on plant biomass or soluble substrates, 26% of total core biosynthetic genes in all four strains were transcribed. Across all four fungal strains, 30% of total biosynthetic gene products were detected via proteomics when grown on cellobiose. 
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) characterization of fungal supernatants detected 72 likely natural products from A. robustus alone. A compound produced by all four strains of anaerobic fungi was putatively identified as the polyketide-related styrylpyrone baumin. Molecular networking quantified similarities between tandem mass spectrometry (MS/MS) spectra among these fungi, enabling three groups of natural products to be identified that are unique to anaerobic fungi. Overall, these results support the finding that anaerobic gut fungi synthesize natural products, which could be harnessed as a source of antimicrobials, therapeutics, and other bioactive compounds.
|
25
|
Permiakova O, Guibert R, Kraut A, Fortin T, Hesse AM, Burger T. CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis. BMC Bioinformatics 2021; 22:68. [PMID: 33579189 PMCID: PMC7881590 DOI: 10.1186/s12859-021-03969-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/03/2020] [Accepted: 01/14/2021] [Indexed: 11/16/2022] Open
Abstract
Background The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expected number of clusters are too large to be processed by classical machine learning algorithms, so that most of the state-of-the-art relies on single-pass linkage-based algorithms. Results We propose a clustering algorithm that solves the powerful but computationally demanding kernel k-means objective function in a scalable way. As a result, it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, Nyström approximation and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles. Conclusions Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data coming from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters which differ from those resulting from state-of-the-art methods, while achieving similar performances. This highlights the complementarity of differently principled algorithms to extract the best from complex LC-MS data.
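To make the optimal-transport idea concrete, the sketch below computes the one-dimensional Wasserstein-1 distance between two elution profiles sampled on the same retention-time grid, and turns it into a similarity with an exponential kernel. The exponential form and the `gamma` parameter are assumptions chosen for illustration; the abstract does not give the exact kernel definition used by CHICKN.

```python
import math


def wasserstein_1d(profile_a, profile_b, step=1.0):
    """W1 distance between two intensity profiles sampled on the same time grid."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must share the same retention-time grid")
    total_a, total_b = sum(profile_a), sum(profile_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("profiles must have positive total intensity")
    cdf_a = cdf_b = 0.0
    distance = 0.0
    # In one dimension, W1 is the area between the two cumulative distributions.
    for a, b in zip(profile_a, profile_b):
        cdf_a += a / total_a
        cdf_b += b / total_b
        distance += abs(cdf_a - cdf_b) * step
    return distance


def wasserstein_kernel(profile_a, profile_b, gamma=1.0, step=1.0):
    """Similarity in (0, 1]; an assumed exponential kernel on the W1 distance."""
    return math.exp(-gamma * wasserstein_1d(profile_a, profile_b, step))


# Two hypothetical elution peaks, identical in shape but shifted by one grid step.
shift_distance = wasserstein_1d([0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0])
```

A shifted copy of a peak gets a distance equal to the size of the shift, which is the intuitive behaviour the abstract alludes to: profiles that elute slightly apart remain similar, whereas a bin-wise comparison would treat them as unrelated.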
Affiliation(s)
- Olga Permiakova
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
| | - Romain Guibert
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
| | - Alexandra Kraut
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
| | - Thomas Fortin
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
| | - Anne-Marie Hesse
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
| | - Thomas Burger
- Univ. Grenoble Alpes, CNRS, CEA, Inserm, BGE U1038, 38000, Grenoble, France.
| |
|
26
|
Li Y, Qu J, Lin Y, Lu G, You Y, Jiang G, Wu Y. Visible Post-Data Analysis Protocol for Natural Mycotoxin Production. J Agric Food Chem 2020; 68:9603-9611. [PMID: 32786838 DOI: 10.1021/acs.jafc.0c03814] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/11/2023]
Abstract
Fungal natural products are routinely analyzed using target detection protocols by comparing to commercial standards. However, discovery of new products suffers from a lack of high-throughput analytical techniques. Post-data process techniques have become popular tools for natural product confirmations and mycotoxin family analysis. In this work, a visible post-data process procedure with MZmine, GNPS, and Xcalibur was used for efficient analysis of high-resolution mass spectrometry data. Conjugated products were screened with an optimized diagnostic fragmentation filtering module in MZmine and further confirmed with Xcalibur by comparing to unconjugated commercial standards. MS/MS spectral data were processed and used to establish a feature-based molecular networking map in GNPS (Global Natural Products Social Molecular Networking; https://gnps.ucsd.edu), for visualization of fungal natural product families. The results demonstrate the potential of combining MZmine-, GNPS-, and Xcalibur-based methods for visible analysis of fungal natural products.
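The principle behind diagnostic fragmentation filtering can be sketched as a check for a characteristic fragment ion in each MS/MS spectrum. This is not the MZmine module or its API; the function names, the tolerance, the relative-intensity cutoff and the m/z values are all hypothetical.

```python
def has_diagnostic_ion(peaks, diagnostic_mz, tolerance=0.01, min_relative_intensity=0.05):
    """True if a peak near `diagnostic_mz` reaches the relative-intensity cutoff."""
    if not peaks:
        return False
    base = max(intensity for _, intensity in peaks)  # base peak intensity
    return any(
        abs(mz - diagnostic_mz) <= tolerance and intensity >= min_relative_intensity * base
        for mz, intensity in peaks
    )


def filter_spectra(spectra, diagnostic_mz, **kwargs):
    """Names of the spectra that contain the diagnostic fragment ion."""
    return [name for name, peaks in spectra.items()
            if has_diagnostic_ion(peaks, diagnostic_mz, **kwargs)]


# Hypothetical MS/MS spectra as (m/z, intensity) pairs.
spectra = {
    "feature_1": [(231.101, 800.0), (249.11, 1000.0)],
    "feature_2": [(175.05, 1000.0), (231.102, 10.0)],  # diagnostic ion too weak
    "feature_3": [(203.07, 500.0)],                    # diagnostic ion absent
}
hits = filter_spectra(spectra, diagnostic_mz=231.10)
```

Features that pass such a filter are candidates for the same compound family and would still need confirmation, for example against commercial standards as the abstract describes.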
Affiliation(s)
- Yanshen Li
- College of Life Science, Yantai University, Yantai, Shandong 264000, People's Republic of China
| | - Jinyao Qu
- College of Life Science, Yantai University, Yantai, Shandong 264000, People's Republic of China
| | - Yucheng Lin
- College of Life Science, Yantai University, Yantai, Shandong 264000, People's Republic of China
| | - Guozhu Lu
- College of Life Science, Yantai University, Yantai, Shandong 264000, People's Republic of China
| | - Yanli You
- College of Life Science, Yantai University, Yantai, Shandong 264000, People's Republic of China
| | - Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
| | - Yongning Wu
- NHC Key Laboratory of Food Safety Risk Assessment, Chinese Academy of Medical Science Research Unit (2019RU014), China National Center for Food Safety Risk Assessment, Beijing 100017, People's Republic of China
| |
|
27
|
The M, Käll L. Focus on the spectra that matter by clustering of quantification data in shotgun proteomics. Nat Commun 2020; 11:3234. [PMID: 32591519 PMCID: PMC7319958 DOI: 10.1038/s41467-020-17037-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/30/2020] [Accepted: 06/08/2020] [Indexed: 02/02/2023] Open
Abstract
In shotgun proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow, thereby preventing valuable information from being discarded in the identification stage. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering at both the MS1 and MS2 levels to summarize all analytes of interest without assigning identities. The resulting data reduction shortens search time, so that open modification and de novo searches can be employed to identify analytes of interest that would have gone unnoticed in traditional pipelines. Quandenser+Triqler outperforms the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins for all tested datasets. Software is available for all major operating systems at https://github.com/statisticalbiotechnology/quandenser, under Apache 2.0 license.
Affiliation(s)
- Matthew The
- Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, 17121, Solna, Sweden
| | - Lukas Käll
- Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, 17121, Solna, Sweden.
| |
|
28
|
Aron AT, Gentry EC, McPhail KL, Nothias LF, Nothias-Esposito M, Bouslimani A, Petras D, Gauglitz JM, Sikora N, Vargas F, van der Hooft JJJ, Ernst M, Kang KB, Aceves CM, Caraballo-Rodríguez AM, Koester I, Weldon KC, Bertrand S, Roullier C, Sun K, Tehan RM, Boya P CA, Christian MH, Gutiérrez M, Ulloa AM, Tejeda Mora JA, Mojica-Flores R, Lakey-Beitia J, Vásquez-Chaves V, Zhang Y, Calderón AI, Tayler N, Keyzers RA, Tugizimana F, Ndlovu N, Aksenov AA, Jarmusch AK, Schmid R, Truman AW, Bandeira N, Wang M, Dorrestein PC. Reproducible molecular networking of untargeted mass spectrometry data using GNPS. Nat Protoc 2020; 15:1954-1991. [PMID: 32405051 DOI: 10.1038/s41596-020-0317-5] [Citation(s) in RCA: 290] [Impact Index Per Article: 72.5] [Received: 06/05/2019] [Accepted: 03/03/2020] [Indexed: 02/06/2023]
Abstract
Global Natural Product Social Molecular Networking (GNPS) is an interactive online small molecule-focused tandem mass spectrometry (MS2) data curation and analysis infrastructure. It is intended to provide as much chemical insight as possible into an untargeted MS2 dataset and to connect this chemical insight to the user's underlying biological questions. This can be performed within one liquid chromatography (LC)-MS2 experiment or at the repository scale. GNPS-MassIVE is a public data repository for untargeted MS2 data with sample information (metadata) and annotated MS2 spectra. These publicly accessible data can be annotated and updated with the GNPS infrastructure, keeping a continuous record of all changes. This knowledge is disseminated across all public data; it is a living dataset. Molecular networking, one of the main analysis tools used within the GNPS platform, creates a structured data table that reflects the molecular diversity captured in tandem mass spectrometry experiments by computing the relationships of the MS2 spectra as spectral similarity. This protocol provides step-by-step instructions for creating reproducible, high-quality molecular networks. For training purposes, the reader is led through a 90- to 120-min procedure that starts by recalling an example public dataset and its sample information and proceeds to creating and interpreting a molecular network. Each data analysis job can be shared or cloned to disseminate the knowledge gained, thus propagating information that can lead to the discovery of molecules, metabolic pathways, and ecosystem/community interactions.
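How spectral similarity becomes a network can be sketched as follows: pairs of spectra whose similarity reaches a cutoff become edges, and connected components form molecular families. This is a simplified illustration and not the GNPS workflow itself; the function name, the 0.7 cutoff and the scores are assumptions, and further filters that a real workflow may apply are omitted.

```python
def build_network(n_spectra, pair_scores, min_score=0.7):
    """Keep edges scoring at least `min_score` and group nodes into connected components."""
    parent = list(range(n_spectra))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for (i, j), score in pair_scores.items():
        if score >= min_score:
            edges.append((i, j, score))
            parent[find(i)] = find(j)  # merge the two components

    families = {}
    for node in range(n_spectra):
        families.setdefault(find(node), []).append(node)
    return edges, sorted(families.values())


# Hypothetical pairwise similarity scores between five MS2 spectra.
scores = {(0, 1): 0.92, (1, 2): 0.75, (2, 3): 0.40, (3, 4): 0.81}
edges, families = build_network(5, scores)
```

Here spectra 0, 1 and 2 end up in one family even though 0 and 2 were never compared directly, which is how a network can propagate an annotation from a known compound to its unannotated neighbours.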
Affiliation(s)
- Allegra T Aron
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Emily C Gentry
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Kerry L McPhail
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, USA
| | - Louis-Félix Nothias
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Mélissa Nothias-Esposito
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Amina Bouslimani
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Daniel Petras
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Julia M Gauglitz
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Nicole Sikora
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Fernando Vargas
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
| | | | - Madeleine Ernst
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Kyo Bin Kang
- College of Pharmacy, Sookmyung Women's University, Seoul, Korea
| | - Christine M Aceves
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | | | - Irina Koester
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Kelly C Weldon
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Center of Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
| | - Samuel Bertrand
- Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences Pharmaceutiques et Biologiques, Université de Nantes, Nantes, France
- ThalassOMICS Metabolomics Facility, Plateforme Corsaire, Biogenouest, Nantes, France
| | - Catherine Roullier
- College of Pharmacy, Sookmyung Women's University, Seoul, Korea
- ThalassOMICS Metabolomics Facility, Plateforme Corsaire, Biogenouest, Nantes, France
| | - Kunyang Sun
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Richard M Tehan
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, USA
| | - Cristopher A Boya P
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
- Department of Biotechnology, Acharya Nagarjuna University, Guntur, Nagarjuna Nagar, India
| | - Martin H Christian
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
| | - Marcelino Gutiérrez
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
| | | | | | - Randy Mojica-Flores
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
- Departamento de Química, Universidad Autónoma de Chiriquí (UNACHI), David, Chiriquí, Panama
| | - Johant Lakey-Beitia
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
| | - Victor Vásquez-Chaves
- Centro de Investigaciones en Productos Naturales (CIPRONA), Universidad de Costa Rica, San José, Costa Rica
| | - Yilue Zhang
- Department of Drug Discovery and Development, Harrison School of Pharmacy, Auburn University, Auburn, AL, USA
| | - Angela I Calderón
- Department of Drug Discovery and Development, Harrison School of Pharmacy, Auburn University, Auburn, AL, USA
| | - Nicole Tayler
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
- Department of Biotechnology, Acharya Nagarjuna University, Guntur, Nagarjuna Nagar, India
| | - Robert A Keyzers
- School of Chemical & Physical Sciences, Victoria University of Wellington, Wellington, New Zealand
| | - Fidele Tugizimana
- Centre for Plant Metabolomics Research, Department of Biochemistry, University of Johannesburg, Auckland Park, South Africa
- International R&D Division, Omnia Group (Pty) Ltd., Johannesburg, South Africa
| | - Nombuso Ndlovu
- Centre for Plant Metabolomics Research, Department of Biochemistry, University of Johannesburg, Auckland Park, South Africa
| | - Alexander A Aksenov
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Alan K Jarmusch
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Robin Schmid
- Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
| | - Andrew W Truman
- Department of Molecular Microbiology, John Innes Centre, Norwich, UK
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
| | - Mingxun Wang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA.
| | - Pieter C Dorrestein
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA.
- Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, CA, USA.
- Department of Pharmacology, University of California, San Diego, La Jolla, CA, USA.
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA.
|
29
|
Verheggen K, Raeder H, Berven FS, Martens L, Barsnes H, Vaudel M. Anatomy and evolution of database search engines-a central component of mass spectrometry based proteomic workflows. Mass Spectrom Rev 2020; 39:292-306. [PMID: 28902424 DOI: 10.1002/mas.21543] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 04/06/2016] [Accepted: 07/05/2017] [Indexed: 06/07/2023]
Abstract
Sequence database search engines are bioinformatics algorithms that identify peptides from tandem mass spectra using a reference protein sequence database. Two decades of development, notably driven by advances in mass spectrometry, have provided scientists with more than 30 published search engines, each with its own properties. In this review, we present the common paradigm behind the different implementations, and its limitations for modern mass spectrometry datasets. We also detail how the search engines attempt to alleviate these limitations, and provide an overview of the different software frameworks available to the researcher. Finally, we highlight alternative approaches for the identification of proteomic mass spectrometry datasets, either as a replacement for, or as a complement to, sequence database search engines.
Affiliation(s)
- Kenneth Verheggen
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
| | - Helge Raeder
- KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Norway
- Department of Pediatrics, Haukeland University Hospital, Bergen, Norway
| | - Frode S Berven
- Proteomics Unit, Department of Biomedicine, University of Bergen, Norway
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
| | - Harald Barsnes
- KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Norway
- Proteomics Unit, Department of Biomedicine, University of Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Norway
| | - Marc Vaudel
- KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Norway
- Proteomics Unit, Department of Biomedicine, University of Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
|
30
|
Klein JA, Zaia J. A Perspective on the Confident Comparison of Glycoprotein Site-Specific Glycosylation in Sample Cohorts. Biochemistry 2019; 59:3089-3097. [PMID: 31833756 DOI: 10.1021/acs.biochem.9b00730] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
Protein glycosylation, resulting from glycosyl transferase reactions under complex control in the secretory pathway, consists of a distribution of related glycoforms at each glycosylation site. Because the biosynthetic substrate concentration and transport rates depend on architecture and other aspects of cellular phenotypes, site-specific glycosylation cannot be predicted accurately from genomic, transcriptomic, or proteomic information. Rather, it is necessary to quantify glycosylation at each protein site and how this changes among a sample cohort to provide information about disease mechanisms. At present, mature mass spectrometry-based methods allow for qualitative assignment of the glycan composition and glycosylation site of singly glycosylated proteolytic peptides. To make such quantitative comparisons, it is necessary to sample the glycosylation distribution with sufficient coverage and accuracy for confident assessment of the glycosylation changes that occur in the biological cohort. In this Perspective, we discuss the unmet needs for mass spectrometry acquisition methods and bioinformatics for the confident comparison of protein site-specific glycosylation among sample cohorts.
|
31
|
De Novo Peptide Sequencing Reveals Many Cyclopeptides in the Human Gut and Other Environments. Cell Syst 2019; 10:99-108.e5. [PMID: 31864964 DOI: 10.1016/j.cels.2019.11.007] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 06/10/2019] [Revised: 09/18/2019] [Accepted: 11/18/2019] [Indexed: 12/20/2022]
Abstract
Cyclic and branch cyclic peptides (cyclopeptides) represent a class of bioactive natural products that include many antibiotics and anti-tumor compounds. Despite recent advances in metabolomics analysis, little is still known about the cyclopeptides in the human gut and their possible interactions, owing to a lack of computational analysis pipelines that are applicable to such compounds. Here, we introduce CycloNovo, an algorithm for automated de novo cyclopeptide analysis and sequencing that employs de Bruijn graphs, the workhorse of DNA sequencing algorithms, to identify cyclopeptides in spectral datasets. CycloNovo reconstructed 32 previously unreported cyclopeptides (to the best of our knowledge) in the human gut and reported over a hundred cyclopeptides in other environments represented by various spectra on the Global Natural Products Social Molecular Networking (GNPS) platform. CycloNovo is available at https://github.com/bbehsaz/cyclonovo.
|
32
|
Hautbergue T, Jamin EL, Costantino R, Tadrist S, Meneghetti L, Tabet JC, Debrauwer L, Oswald IP, Puel O. Combination of Isotope Labeling and Molecular Networking of Tandem Mass Spectrometry Data To Reveal 69 Unknown Metabolites Produced by Penicillium nordicum. Anal Chem 2019; 91:12191-12202. [PMID: 31464421 DOI: 10.1021/acs.analchem.9b01634] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/27/2022]
Abstract
The secondary metabolome of Penicillium nordicum is poorly documented despite its frequent detection on contaminated food and its capacity to produce toxic metabolites such as ochratoxin A. To characterize the metabolites produced by this fungus, we combined full stable isotope labeling with the dereplication of tandem mass spectrometry (MS/MS) data by molecular networking. First, the untargeted metabolomic analysis by high-resolution mass spectrometry of a double stable isotope labeling of P. nordicum enabled the specific detection of its metabolites and the unambiguous determination of their elemental composition. Analyses showed that infection of the substrate by P. nordicum led to the production of at least 92 metabolites and that 69 of them were completely unknown. Then, curated molecular networks of MS/MS data were generated with GNPS and MetGem, specifically on the features of interest, which allowed highlighting 13 fungisporin-related metabolites that had not previously been identified in this fungus and 8 that had never been observed in any fungus. The structures of the unknown compounds, namely, a native fungisporin and seven linear peptides, were characterized by tandem mass spectrometry experiments. The analysis of P. nordicum growing on its natural substrates, i.e., pork ham, turkey ham, and cheese, demonstrated that 10 of the known fungisporin-related metabolites and three of the new metabolites were also synthesized. Thus, the curation of data for molecular networking using a specific detection of metabolites of interest with stable isotope labeling allowed the discovery of new metabolites produced by the food contaminant P. nordicum.
Affiliation(s)
- Thaïs Hautbergue
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France
- Axiom platform, MetaToul-MetaboHUB, National Infrastructure for Metabolomics and Fluxomics, F-31027 Toulouse, France
| | - Emilien L Jamin
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France
- Axiom platform, MetaToul-MetaboHUB, National Infrastructure for Metabolomics and Fluxomics, F-31027 Toulouse, France
| | - Robin Costantino
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France
- Axiom platform, MetaToul-MetaboHUB, National Infrastructure for Metabolomics and Fluxomics, F-31027 Toulouse, France
| | - Souria Tadrist
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France
| | - Lauriane Meneghetti
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France
- Axiom platform, MetaToul-MetaboHUB, National Infrastructure for Metabolomics and Fluxomics, F-31027 Toulouse, France
| | - Jean-Claude Tabet
- Service de Pharmacologie et d'Immunoanalyse (SPI), Laboratoire d'Etude du Métabolisme des Médicaments, CEA, INRA, Université Paris Saclay, MetaboHUB, F-91191 Gif-sur-Yvette, France
- Sorbonne Universités, Campus Pierre et Marie Curie, IPCM, 4 place Jussieu, 75252 Paris Cedex 05, France
| | - Laurent Debrauwer
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France
- Axiom platform, MetaToul-MetaboHUB, National Infrastructure for Metabolomics and Fluxomics, F-31027 Toulouse, France
| | - Isabelle P Oswald
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France
| | - Olivier Puel
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France
|
33
|
Pilon AC, Gu H, Raftery D, Bolzani VDS, Lopes NP, Castro-Gamboa I, Carnevale Neto F. Mass Spectral Similarity Networking and Gas-Phase Fragmentation Reactions in the Structural Analysis of Flavonoid Glycoconjugates. Anal Chem 2019; 91:10413-10423. [PMID: 31313915 DOI: 10.1021/acs.analchem.8b05479] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 02/07/2023]
Abstract
Flavonoids represent an important class of natural products with a central role in plant physiology and human health. Their accurate annotation using untargeted mass spectrometry analysis still relies on differentiating similar chemical scaffolds through spectral matching to reference library spectra. In this work, we combined molecular network analysis with rules for fragment reactions and chemotaxonomy to enhance the annotation of similar flavonoid glycoconjugates. Molecular network topology progressively propagated the flavonoid chemical functionalization according to collision-induced dissociation (CID) reactions, along the following chemical attributes: aglycone nature, saccharide type and number, and presence of methoxy substituents. This structure-based distribution across the spectral networks revealed the chemical composition of flavonoids within and between species and guided the putative assignment of 64 isomers and isobars in the Chrysobalanaceae plant species, most of which are not accurately annotated by automated untargeted MS2 matching. These proof-of-concept results demonstrate how molecular networking progressively grouped structurally related molecules according to their product ion scans, abundances, and ratios. The approach can be extrapolated to other classes of metabolites sharing similar structures and diagnostic fragments from tandem mass spectrometry.
Affiliation(s)
- Alan Cesar Pilon
- Núcleo de Bioensaios, Biossíntese e Ecofisiologia de Produtos Naturais (NuBBE), Departamento de Química Orgânica, Instituto de Química, Universidade Estadual Paulista (UNESP), Araraquara 14800-900, São Paulo, Brazil
- Núcleo de Pesquisa em Produtos Naturais e Sintéticos (NPPNS), Departamento de Física e Química, Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-903, São Paulo, Brazil
| | - Haiwei Gu
- Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, 850 Republican Street, Seattle, Washington 98109, United States
- Jiangxi Key Laboratory for Mass Spectrometry and Instrumentation, East China Institute of Technology, Nanchang, Jiangxi Province 330013, People's Republic of China
| | - Daniel Raftery
- Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, 850 Republican Street, Seattle, Washington 98109, United States
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, United States
| | - Vanderlan da Silva Bolzani
- Núcleo de Bioensaios, Biossíntese e Ecofisiologia de Produtos Naturais (NuBBE), Departamento de Química Orgânica, Instituto de Química, Universidade Estadual Paulista (UNESP), Araraquara 14800-900, São Paulo, Brazil
| | - Norberto Peporine Lopes
- Núcleo de Pesquisa em Produtos Naturais e Sintéticos (NPPNS), Departamento de Física e Química, Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-903, São Paulo, Brazil
| | - Ian Castro-Gamboa
- Núcleo de Bioensaios, Biossíntese e Ecofisiologia de Produtos Naturais (NuBBE), Departamento de Química Orgânica, Instituto de Química, Universidade Estadual Paulista (UNESP), Araraquara 14800-900, São Paulo, Brazil
| | - Fausto Carnevale Neto
- Núcleo de Bioensaios, Biossíntese e Ecofisiologia de Produtos Naturais (NuBBE), Departamento de Química Orgânica, Instituto de Química, Universidade Estadual Paulista (UNESP), Araraquara 14800-900, São Paulo, Brazil
- Núcleo de Pesquisa em Produtos Naturais e Sintéticos (NPPNS), Departamento de Física e Química, Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-903, São Paulo, Brazil
- Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, 850 Republican Street, Seattle, Washington 98109, United States
|
34
|
Bouslimani A, da Silva R, Kosciolek T, Janssen S, Callewaert C, Amir A, Dorrestein K, Melnik AV, Zaramela LS, Kim JN, Humphrey G, Schwartz T, Sanders K, Brennan C, Luzzatto-Knaan T, Ackermann G, McDonald D, Zengler K, Knight R, Dorrestein PC. The impact of skin care products on skin chemistry and microbiome dynamics. BMC Biol 2019; 17:47. [PMID: 31189482 PMCID: PMC6560912 DOI: 10.1186/s12915-019-0660-6] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Received: 02/20/2019] [Accepted: 04/30/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Use of skin personal care products on a regular basis is nearly ubiquitous, but their effects on molecular and microbial diversity of the skin are unknown. We evaluated the impact of four beauty products (a facial lotion, a moisturizer, a foot powder, and a deodorant) on 11 volunteers over 9 weeks. RESULTS Mass spectrometry and 16S rRNA inventories of the skin revealed decreases in chemical as well as in bacterial and archaeal diversity on halting deodorant use. Specific compounds from beauty products used before the study remain detectable with half-lives of 0.5-1.9 weeks. The deodorant and foot powder increased molecular, bacterial, and archaeal diversity, while arm and face lotions had little effect on bacterial and archaeal but increased chemical diversity. Personal care product effects last for weeks and produce highly individualized responses, including alterations in steroid and pheromone levels and in bacterial and archaeal ecosystem structure and dynamics. CONCLUSIONS These findings may lead to next-generation precision beauty products and therapies for skin disorders.
Affiliation(s)
- Amina Bouslimani
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, San Diego, USA
| | - Ricardo da Silva
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, San Diego, USA
| | - Tomasz Kosciolek
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Stefan Janssen
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
- Department for Pediatric Oncology, Hematology and Clinical Immunology, University Children's Hospital, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
| | - Chris Callewaert
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
- Center for Microbial Ecology and Technology, Ghent University, 9000, Ghent, Belgium
| | - Amnon Amir
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Kathleen Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, San Diego, USA
| | - Alexey V Melnik
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, San Diego, USA
| | - Livia S Zaramela
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Ji-Nu Kim
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Gregory Humphrey
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Tara Schwartz
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Karenina Sanders
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Caitriona Brennan
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Tal Luzzatto-Knaan
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, San Diego, USA
| | - Gail Ackermann
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Daniel McDonald
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
| | - Karsten Zengler
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92307, USA
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Rob Knight
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA.
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92307, USA.
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093, USA.
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, 92093, USA.
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, San Diego, USA.
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92037, USA.
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92307, USA.
- Department of Pharmacology, University of California, San Diego, La Jolla, CA, 92037, USA.
|
35
|
Griss J, Stanek F, Hudecz O, Dürnberger G, Perez-Riverol Y, Vizcaíno JA, Mechtler K. Spectral Clustering Improves Label-Free Quantification of Low-Abundant Proteins. J Proteome Res 2019; 18:1477-1485. [PMID: 30859831 PMCID: PMC6456873 DOI: 10.1021/acs.jproteome.8b00377] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/28/2018] [Indexed: 11/29/2022]
Abstract
Label-free quantification has become common practice in many mass spectrometry-based proteomics experiments. In recent years, we and others have shown that spectral clustering can considerably improve the analysis of (primarily large-scale) proteomics data sets. Here we show that spectral clustering can be used to infer additional peptide-spectrum matches and improve the quality of label-free quantitative proteomics data even in data sets containing only tens of MS runs. We analyzed four well-known public benchmark data sets that represent different experimental settings using spectral counting and peak intensity based label-free quantification. In both approaches, the peptide-spectrum matches additionally inferred through our spectra-cluster algorithm improved the detectability of low-abundant proteins while increasing the accuracy of the derived quantitative data, without increasing the data sets' noise. Additionally, we developed a Proteome Discoverer node for our spectra-cluster algorithm which allows anyone to rebuild our proposed pipeline using the free version of Proteome Discoverer.
Affiliation(s)
- Johannes Griss
- Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, 1090 Vienna, Austria
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
| | - Florian Stanek
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
| | - Otto Hudecz
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
| | - Gerhard Dürnberger
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Gregor Mendel Institute of Molecular Plant Biology (GMI), Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD Hinxton, Cambridge, United Kingdom
| | - Karl Mechtler
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
|
36
|
Deutsch EW, Perez-Riverol Y, Chalkley RJ, Wilhelm M, Tate S, Sachsenberg T, Walzer M, Käll L, Delanghe B, Böcker S, Schymanski EL, Wilmes P, Dorfer V, Kuster B, Volders PJ, Jehmlich N, Vissers JP, Wolan DW, Wang AY, Mendoza L, Shofstahl J, Dowsey AW, Griss J, Salek RM, Neumann S, Binz PA, Lam H, Vizcaíno JA, Bandeira N, Röst H. Expanding the Use of Spectral Libraries in Proteomics. J Proteome Res 2018; 17:4051-4060. [PMID: 30270626 PMCID: PMC6443480 DOI: 10.1021/acs.jproteome.8b00485] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/15/2022]
Abstract
The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington, 98109, United States
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Robert J. Chalkley
- University of California San Francisco, San Francisco, 94158, California, United States
| | - Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
| | | | - Timo Sachsenberg
- Department of Computer Science, Center for Bioinformatics, University of Tübingen, Sand 14, Tübingen, 72076, Germany
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH − Royal Institute of Technology, Stockholm 114 28, Sweden
| | - Bernard Delanghe
- Thermo Fisher Scientific Bremen, Hanna-Kunath Str. 11, 28199 Bremen, Germany
| | - Sebastian Böcker
- Chair for Bioinformatics, Friedrich-Schiller-University Jena, 07743 Jena, Germany
| | - Emma L. Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Paul Wilmes
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Hagenberg, 4232, Austria
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
- Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich, Freising, 85354, Germany
| | | | - Nico Jehmlich
- Helmholtz-Centre for Environmental Research - UFZ, Leipzig, Germany
| | | | - Dennis W. Wolan
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
| | - Ana Y. Wang
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington, 98109, United States
| | - Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway San Jose, CA 95134
| | - Andrew W. Dowsey
- Department of Population Health Sciences and Bristol Veterinary School, Faculty of Health Sciences, University of Bristol, Bristol BS9 1BN, UK
| | - Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, Vienna 1090, Austria
| | - Reza M. Salek
- The International Agency for Research on Cancer (IARC), 150 Cours Albert Thomas, 69372 Lyon CEDEX 08, France
| | - Steffen Neumann
- Leibniz Institute of Plant Biochemistry, Department of Stress and Developmental Biology, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Pierre-Alain Binz
- Clinical Chemistry Service, Centre Hospitalier Universitaire Vaudois, 1011 Lausanne, Switzerland
| | - Henry Lam
- Department of Chemical and Biological Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 92093-0404, USA
| | - Hannes Röst
- The Donnelly Centre, University of Toronto, 160 College St., Toronto, ON, M5S 3E1, Canada
| |
|
37
|
Kind T, Tsugawa H, Cajka T, Ma Y, Lai Z, Mehta SS, Wohlgemuth G, Barupal DK, Showalter MR, Arita M, Fiehn O. Identification of small molecules using accurate mass MS/MS search. Mass Spectrom Rev 2018; 37:513-532. [PMID: 28436590 PMCID: PMC8106966 DOI: 10.1002/mas.21535] [Citation(s) in RCA: 249] [Impact Index Per Article: 41.5] [Received: 10/27/2016] [Revised: 03/17/2017] [Accepted: 03/18/2017] [Indexed: 05/03/2023]
Abstract
Tandem mass spectral library search (MS/MS) is the fastest way to correctly annotate MS/MS spectra from screening small molecules in fields such as environmental analysis, drug screening, lipid analysis, and metabolomics. The confidence in MS/MS-based annotation of chemical structures is impacted by instrumental settings and requirements, data acquisition modes including data-dependent and data-independent methods, library scoring algorithms, as well as post-curation steps. We critically discuss parameters that influence search results, such as mass accuracy, precursor ion isolation width, intensity thresholds, centroiding algorithms, and acquisition speed. A range of publicly and commercially available MS/MS databases such as NIST, MassBank, MoNA, LipidBlast, Wiley MSforID, and METLIN are surveyed. In addition, software tools including NIST MS Search, MS-DIAL, Mass Frontier, SmileMS, Mass++, and XCMS2 to perform fast MS/MS search are discussed. MS/MS scoring algorithms and challenges during compound annotation are reviewed. Advanced methods such as the in silico generation of tandem mass spectra using quantum chemistry and machine learning methods are covered. Community efforts for curation and sharing of tandem mass spectra that will allow for faster distribution of scientific discoveries are discussed.
Affiliation(s)
- Tobias Kind
- Genome Center, Metabolomics, UC Davis, Davis, California
| | - Hiroshi Tsugawa
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, Japan
| | - Tomas Cajka
- Genome Center, Metabolomics, UC Davis, Davis, California
| | - Yan Ma
- National Institute of Biological Sciences, Beijing, People’s Republic of China
| | - Zijuan Lai
- Genome Center, Metabolomics, UC Davis, Davis, California
| | | | | | | | | | - Masanori Arita
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, Japan
| | - Oliver Fiehn
- Genome Center, Metabolomics, UC Davis, Davis, California
- Faculty of Sciences, Department of Biochemistry, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
38
|
Griss J, Perez-Riverol Y, The M, Käll L, Vizcaíno JA. Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra". J Proteome Res 2018; 17:1993-1996. [PMID: 29682973 DOI: 10.1021/acs.jproteome.7b00824] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/21/2022]
Abstract
In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here we report some shortcomings detected in the original analyses. First, for most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on proteomics data sets of the average size produced nowadays. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using only a precursor tolerance of 5 Da for high-resolution Orbitrap data was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
Affiliation(s)
- Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, Vienna 1090, Austria
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Matthew The
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm 114 28, Sweden
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm 114 28, Sweden
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| |
|
39
|
Hurst GB, Asano KG, Doktycz CJ, Consoli EJ, Doktycz WL, Foster CM, Morrell-Falvey JL, Standaert RF, Doktycz MJ. Proteomics-Based Tools for Evaluation of Cell-Free Protein Synthesis. Anal Chem 2017; 89:11443-11451. [DOI: 10.1021/acs.analchem.7b02555] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/16/2022]
Affiliation(s)
| | | | | | | | | | | | | | - Robert F. Standaert
- University of Tennessee, Department of Biochemistry & Cellular and Molecular Biology, Knoxville, Tennessee 37996, United States
| | | |
|
40
|
Rieder V, Schork KU, Kerschke L, Blank-Landeshammer B, Sickmann A, Rahnenführer J. Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra. J Proteome Res 2017; 16:4035-4044. [DOI: 10.1021/acs.jproteome.7b00427] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Affiliation(s)
- Vera Rieder
- Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany
| | - Karin U. Schork
- Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany
- Medizinische Fakultät, Medizinisches Proteom-Center, Ruhr-University Bochum, 44801 Bochum, Germany
| | - Laura Kerschke
- Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany
- Institut für Biometrie und Klinische Forschung (IBKF) der Westfälischen Wilhelms-Universität und des Universitätsklinikums Münster, 48149 Münster, Germany
| | | | - Albert Sickmann
- Medizinische Fakultät, Medizinisches Proteom-Center, Ruhr-University Bochum, 44801 Bochum, Germany
- Leibniz-Institut für Analytische Wissenschaften-ISAS - e.V., 44139 Dortmund, Germany
- Department of Chemistry, College of Physical Sciences, University of Aberdeen, Aberdeen AB24 3FX, Scotland, United Kingdom
| | - Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany
| |
|
41
|
Shao W, Lam H. Tandem mass spectral libraries of peptides and their roles in proteomics research. Mass Spectrom Rev 2017; 36:634-648. [PMID: 27403644 DOI: 10.1002/mas.21512] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 11/30/2015] [Accepted: 05/21/2016] [Indexed: 05/15/2023]
Abstract
Proteomics is a rapidly maturing field aimed at the high-throughput identification and quantification of all proteins in a biological system. The cornerstone of proteomic technology is tandem mass spectrometry of peptides resulting from the digestion of protein mixtures. The fragmentation pattern of each peptide ion is captured in its tandem mass spectrum, which enables its identification and acts as a fingerprint for the peptide. Spectral libraries are simply searchable collections of these fingerprints, which have taken on an increasingly prominent role in proteomic data analysis. This review describes the historical development of spectral libraries in proteomics, details the computational procedures behind library building and searching, surveys the current applications of spectral libraries, and discusses the outstanding challenges. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:634-648, 2017.
Affiliation(s)
- Wenguang Shao
- Department of Biology, Institute of Molecular Systems Biology, Eidgenössische Technische Hochschule (ETH) Zurich, Zurich, Switzerland
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Henry Lam
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| |
|
42
|
Boya P CA, Fernández-Marín H, Mejía LC, Spadafora C, Dorrestein PC, Gutiérrez M. Imaging mass spectrometry and MS/MS molecular networking reveals chemical interactions among cuticular bacteria and pathogenic fungi associated with fungus-growing ants. Sci Rep 2017; 7:5604. [PMID: 28717220 PMCID: PMC5514151 DOI: 10.1038/s41598-017-05515-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 01/31/2017] [Accepted: 05/31/2017] [Indexed: 01/25/2023] Open
Abstract
The fungus-growing ant-microbe symbiosis is an ideal system to study chemistry-based microbial interactions due to the wealth of microbial interactions described, and the lack of information on the molecules involved therein. In this study, we employed a combination of MALDI imaging mass spectrometry (MALDI-IMS) and MS/MS molecular networking to study chemistry-based microbial interactions in this system. MALDI-IMS was used to visualize the distribution of antimicrobials at the inhibition zone between bacteria associated with the ant Acromyrmex echinatior and the fungal pathogen Escovopsis sp. MS/MS molecular networking was used for the dereplication of compounds found at the inhibition zones. We identified the antibiotics actinomycins D, X2 and X0β, produced by the bacterium Streptomyces CBR38; and the macrolides elaiophylin, efomycin A and efomycin G, produced by the bacterium Streptomyces CBR53. These metabolites were found at the inhibition zones using MALDI-IMS and were identified using MS/MS molecular networking. Additionally, three shearinines (D, F, and J) produced by the fungal pathogen Escovopsis TZ49 were detected. This is the first report of elaiophylins, actinomycin X0β and shearinines in the fungus-growing ant symbiotic system. These results suggest a secondary prophylactic use of these antibiotics by A. echinatior because of their permanent production by the bacteria.
Affiliation(s)
- Cristopher A Boya P
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panamá, Apartado 0843-01103, Republic of Panama
- Department of Biotechnology, Acharya Nagarjuna University, Guntur, Nagarjuna Nagar, 522 510, India
| | - Hermógenes Fernández-Marín
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panamá, Apartado 0843-01103, Republic of Panama
| | - Luis C Mejía
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panamá, Apartado 0843-01103, Republic of Panama
| | - Carmenza Spadafora
- Centro de Biología Celular y Molecular de Enfermedades, INDICASAT AIP, Panamá, Apartado 0843-01103, Republic of Panama
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, San Diego, California, 92093, United States
- Department of Pharmacology, University of California at San Diego, San Diego, California, 92093, United States
| | - Marcelino Gutiérrez
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panamá, Apartado 0843-01103, Republic of Panama
| |
|
43
|
Kolmogorov M, Kennedy E, Dong Z, Timp G, Pevzner PA. Single-molecule protein identification by sub-nanopore sensors. PLoS Comput Biol 2017; 13:e1005356. [PMID: 28486472 PMCID: PMC5423552 DOI: 10.1371/journal.pcbi.1005356] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 03/29/2016] [Accepted: 01/09/2017] [Indexed: 11/19/2022] Open
Abstract
Recent advances in top-down mass spectrometry enabled identification of intact proteins, but this technology still faces challenges. For example, top-down mass spectrometry suffers from a lack of sensitivity since the ion counts for a single fragmentation event are often low. In contrast, nanopore technology is exquisitely sensitive to single intact molecules, but it has only been successfully applied to DNA sequencing so far. Here, we explore the potential of sub-nanopores for single-molecule protein identification (SMPI) and describe an algorithm for identification of the electrical current blockade signal (nanospectrum) resulting from the translocation of a denatured, linearly charged protein through a sub-nanopore. The analysis of identification p-values suggests that the current technology is already sufficient for matching nanospectra against small protein databases, e.g., protein identification in bacterial proteomes.
Protein identification is the key step in many proteomics studies. Currently, the most popular technique for intact protein analysis is top-down mass spectrometry, which recently enabled high-throughput identification of many proteins and their proteoforms. However, this approach requires large amounts of materials and is currently limited to short proteins, typically less than 30 kDa. On the other hand, nanopore sensors promise single-molecule sensitivity in protein analysis, but an approach for the identification of a single protein from its blockade current (nanospectrum) has remained elusive, since the signal from the sensors relates to the amino acid sequence of the protein in a poorly understood way. In this work we describe the first algorithm for protein identification based on nanospectra associated with translocation of proteins through pores with sub-nanometer diameters. While identification accuracy does not yet allow reliable processing of complex protein samples, we believe that the rapidly improving experimental protocols, along with the new computational algorithms, will yield a viable protein identification approach in the near future.
Affiliation(s)
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
| | - Eamonn Kennedy
- Electrical Engineering and Biological Science, University of Notre Dame, Notre Dame, Indiana, United States of America
| | - Zhuxin Dong
- Electrical Engineering and Biological Science, University of Notre Dame, Notre Dame, Indiana, United States of America
| | - Gregory Timp
- Electrical Engineering and Biological Science, University of Notre Dame, Notre Dame, Indiana, United States of America
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
| |
|
44
|
Bruderer R, Sondermann J, Tsou CC, Barrantes-Freer A, Stadelmann C, Nesvizhskii AI, Schmidt M, Reiter L, Gomez-Varela D. New targeted approaches for the quantification of data-independent acquisition mass spectrometry. Proteomics 2017; 17:10.1002/pmic.201700021. [PMID: 28319648 PMCID: PMC5870755 DOI: 10.1002/pmic.201700021] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 01/16/2017] [Revised: 03/13/2017] [Accepted: 03/14/2017] [Indexed: 11/10/2022]
Abstract
The use of data-independent acquisition (DIA) approaches for the reproducible and precise quantification of complex protein samples has increased in recent years. The protein information arising from DIA analysis is stored in digital protein maps (DIA maps) that can be interrogated in a targeted way by using ad hoc or publicly available peptide spectral libraries generated on the same sample species as for the generation of the DIA maps. The restricted availability of certain difficult-to-obtain human tissues (e.g., brain), together with the caveats of using spectral libraries generated under variable experimental conditions, limits the potential of DIA. Therefore, DIA workflows would benefit from high-quality and extended spectral libraries that could be generated without the need to use valuable samples for library production. We describe here two new targeted approaches, using either classical data-dependent acquisition repositories (not specifically built for DIA) or ad hoc mouse spectral libraries, which enable the profiling of a human brain DIA data set. The comparison of our results to both the most extended publicly available human spectral library and to a state-of-the-art untargeted method supports the use of these new strategies to improve future DIA profiling efforts.
Affiliation(s)
| | - Julia Sondermann
- Somatosensory Signaling and Systems Biology Research Group, Max Planck Institute of Experimental Medicine, Goettingen, Germany
| | - Chih-Chiang Tsou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
| | | | | | - Alexey I Nesvizhskii
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
| | - Manuela Schmidt
- Somatosensory Signaling and Systems Biology Research Group, Max Planck Institute of Experimental Medicine, Goettingen, Germany
| | | | - David Gomez-Varela
- Somatosensory Signaling and Systems Biology Research Group, Max Planck Institute of Experimental Medicine, Goettingen, Germany
| |
|
45
|
Metcalf JL, Xu ZZ, Bouslimani A, Dorrestein P, Carter DO, Knight R. Microbiome Tools for Forensic Science. Trends Biotechnol 2017; 35:814-823. [PMID: 28366290 DOI: 10.1016/j.tibtech.2017.03.006] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 10/04/2016] [Revised: 03/08/2017] [Accepted: 03/09/2017] [Indexed: 01/28/2023]
Abstract
Microbes are present at every crime scene and have been used as physical evidence for over a century. Advances in DNA sequencing and computational approaches have led to recent breakthroughs in the use of microbiome approaches for forensic science, particularly in the areas of estimating postmortem intervals (PMIs), locating clandestine graves, and obtaining soil and skin trace evidence. Low-cost, high-throughput technologies allow us to accumulate molecular data quickly and to apply sophisticated machine-learning algorithms, building generalizable predictive models that will be useful in the criminal justice system. In particular, integrating microbiome and metabolomic data has excellent potential to advance microbial forensics.
Affiliation(s)
- Jessica L Metcalf
- Department of Animal Sciences, Colorado State University, Fort Collins, CO 80523, USA
| | - Zhenjiang Z Xu
- Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA 92093, USA
| | - Amina Bouslimani
- Department of Pharmacology, University of California, San Diego, La Jolla, CA 92093, USA
| | - Pieter Dorrestein
- Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA 92093, USA; Department of Pharmacology, University of California, San Diego, La Jolla, CA 92093, USA; Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA; Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - David O Carter
- Laboratory of Forensic Taphonomy, Forensic Sciences Unit, Division of Natural Sciences and Mathematics, Chaminade University of Honolulu, Honolulu, HI 96816, USA
| | - Rob Knight
- Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA 92093, USA; Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA 92093, USA
| |
|
46
|
Molecular Networking As a Drug Discovery, Drug Metabolism, and Precision Medicine Strategy. Trends Pharmacol Sci 2017; 38:143-154. [DOI: 10.1016/j.tips.2016.10.011] [Citation(s) in RCA: 174] [Impact Index Per Article: 24.9] [Received: 09/06/2016] [Revised: 10/17/2016] [Accepted: 10/17/2016] [Indexed: 12/18/2022]
|
47
|
Lam MPY, Lau E, Ng DCM, Wang D, Ping P. Cardiovascular proteomics in the era of big data: experimental and computational advances. Clin Proteomics 2016; 13:23. [PMID: 27980500 PMCID: PMC5137214 DOI: 10.1186/s12014-016-9124-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/03/2016] [Accepted: 08/24/2016] [Indexed: 01/14/2023] Open
Abstract
Proteomics plays an increasingly important role in our quest to understand cardiovascular biology. Fueled by analytical and computational advances in the past decade, proteomics applications can now go beyond merely inventorying protein species, and address sophisticated questions on cardiac physiology. The advent of massive mass spectrometry datasets has in turn led to increasing intersection between proteomics and big data science. Here we review new frontiers in technological developments and their applications to cardiovascular medicine. The impact of big data science on cardiovascular proteomics investigations and translation to medicine is highlighted.
Affiliation(s)
- Maggie P Y Lam
- NIH BD2K Center of Excellence at UCLA; Department of Physiology, University of California at Los Angeles, 675 Charles E. Young Drive, Los Angeles, CA 90095 USA
| | - Edward Lau
- NIH BD2K Center of Excellence at UCLA; Department of Physiology, University of California at Los Angeles, 675 Charles E. Young Drive, Los Angeles, CA 90095 USA
| | - Dominic C M Ng
- NIH BD2K Center of Excellence at UCLA; Department of Physiology, University of California at Los Angeles, 675 Charles E. Young Drive, Los Angeles, CA 90095 USA
| | - Ding Wang
- NIH BD2K Center of Excellence at UCLA; Department of Physiology, University of California at Los Angeles, 675 Charles E. Young Drive, Los Angeles, CA 90095 USA
| | - Peipei Ping
- NIH BD2K Center of Excellence at UCLA; Department of Physiology, University of California at Los Angeles, 675 Charles E. Young Drive, Los Angeles, CA 90095 USA; Department of Medicine, University of California at Los Angeles, 675 Charles E. Young Drive, Los Angeles, CA 90095 USA; Department of Bioinformatics, University of California at Los Angeles, 675 Charles E. Young Drive, Los Angeles, CA 90095 USA
| |
|
48
|
Abstract
Imagine a scenario where personal belongings such as pens, keys, phones, or handbags are found at an investigative site. It is often valuable to the investigative team that is trying to trace back the belongings to an individual to understand their personal habits, even when DNA evidence is also available. Here, we develop an approach to translate chemistries recovered from personal objects such as phones into a lifestyle sketch of the owner, using mass spectrometry and informatics approaches. Our results show that phones' chemistries reflect a personalized lifestyle profile. The collective repertoire of molecules found on these objects provides a sketch of the lifestyle of an individual by highlighting the type of hygiene/beauty products the person uses, diet, medical status, and even the location where this person may have been. These findings introduce an additional form of trace evidence from skin-associated lifestyle chemicals found on personal belongings. Such information could help a criminal investigator narrow down the owner of an object found at a crime scene, such as a suspect or missing person.
|
49
|
Ischenko D, Alexeev D, Shitikov E, Kanygina A, Malakhova M, Kostryukova E, Larin A, Kovalchuk S, Pobeguts O, Butenko I, Anikanov N, Altukhov I, Ilina E, Govorun V. Large scale analysis of amino acid substitutions in bacterial proteomics. BMC Bioinformatics 2016; 17:450. [PMID: 27821049 PMCID: PMC5100282 DOI: 10.1186/s12859-016-1301-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/05/2016] [Accepted: 10/21/2016] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Proteomics of bacterial pathogens is a developing field exploring microbial physiology, gene expression and the complex interactions between bacteria and their hosts. One of the complications in the proteomic approach is the micro- and macro-heterogeneity of bacterial species, which makes it impossible to build a comprehensive database of bacterial genomes for identification, while most of the existing algorithms rely largely on genomic data. RESULTS Here we present a large-scale study of the identification of single amino acid polymorphisms (SAPs) between bacterial strains. An ad hoc method was developed based on MS/MS spectra comparison without the support of a genomic database. Whole-genome sequencing was used to validate the accuracy of polymorphism detection. Several approaches presented earlier to the proteomics community as useful for polymorphism detection were tested on isolates of Helicobacter pylori, Neisseria gonorrhoeae and Escherichia coli. CONCLUSION The developed method represents a promising approach in the field of bacterial proteomics, allowing the identification of hundreds of peptides with novel SAPs from a single proteome.
Affiliation(s)
- Dmitry Ischenko
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
| | - Dmitry Alexeev
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
| | - Egor Shitikov
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| | - Alexandra Kanygina
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
| | - Maja Malakhova
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| | - Elena Kostryukova
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| | - Andrey Larin
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| | - Sergey Kovalchuk
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| | - Olga Pobeguts
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| | - Ivan Butenko
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| | - Nikolay Anikanov
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| | - Ilya Altukhov
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
| | - Elena Ilina
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| | - Vadim Govorun
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
| |
|
50
|
Griss J, Perez-Riverol Y, Lewis S, Tabb DL, Dianes JA, del-Toro N, Rurik M, Walzer MW, Kohlbacher O, Hermjakob H, Wang R, Vizcaíno JA. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nat Methods 2016; 13:651-656. [PMID: 27493588 PMCID: PMC4968634 DOI: 10.1038/nmeth.3902] [Citation(s) in RCA: 114] [Impact Index Per Article: 14.3] [Received: 12/11/2015] [Accepted: 05/24/2016] [Indexed: 12/13/2022]
Abstract
Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at large scale to shed light on these unidentified spectra. PRoteomics IDEntifications database (PRIDE) Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
Affiliation(s)
- Johannes Griss
  - Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Steve Lewis
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- David L. Tabb
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville
- José A. Dianes
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Noemi del-Toro
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Marc Rurik
  - Dept. of Computer Science, University of Tübingen, Germany
  - Center for Bioinformatics, University of Tübingen, Germany
- Mathias W. Walzer
  - Dept. of Computer Science, University of Tübingen, Germany
  - Center for Bioinformatics, University of Tübingen, Germany
- Oliver Kohlbacher
  - Dept. of Computer Science, University of Tübingen, Germany
  - Center for Bioinformatics, University of Tübingen, Germany
  - Quantitative Biology Center, University of Tübingen, Germany
  - Max Planck Institute for Developmental Biology, Germany
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
  - National Center for Protein Sciences, Beijing, China
- Rui Wang
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
|