1
Yang R, Ma J, Zhang S, Zheng Y, Wang L, Zhu D. mzMD: visualization-oriented MS data storage and retrieval. Bioinformatics 2022; 38:2333-2340. [PMID: 35171986 DOI: 10.1093/bioinformatics/btac098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 01/23/2022] [Accepted: 02/14/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: MS data visualization applications constantly draw the peaks that fall in a data window of an MS dataset. Doing so requires retrieving from the dataset a selection of peaks in the data window whose image in the display window reflects the visual features of all peaks in that data window. If an algorithm for this purpose must output high-quality solutions in real time, its most fundamental dependence is on the storage format of the MS dataset. RESULTS: We present mzMD, a new storage format for MS datasets, and an algorithm that queries a storage system in this format for a summary (a set of selected representative peaks) of a given data window. We propose a criterion, Q-score, to assess the quality of data window summaries. Experimental statistics on real MS datasets verified that mzMD retrieves high-quality data window summaries at high speed. The summaries mzMD reported had better Q-scores than those mzTree reported. The query speed of mzMD is the same as that of mzTree, whereas its query speed stability is better. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at https://github.com/yrm9837/mzMD-java. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
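To make the idea of a data window summary concrete, here is a minimal Python sketch. It is not the mzMD algorithm or its storage format; it assumes a naive strategy of keeping the most intense peak in each cell of the display grid, and the names `Peak` and `summarize_window` are hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    mz: float
    rt: float
    intensity: float


def summarize_window(peaks, mz_range, rt_range, cols, rows):
    """Keep the most intense peak per display cell (naive illustration only).

    Assumes mz_range and rt_range each have a nonzero width.
    """
    mz_lo, mz_hi = mz_range
    rt_lo, rt_hi = rt_range
    best = {}
    for p in peaks:
        # Skip peaks that fall outside the requested data window.
        if not (mz_lo <= p.mz <= mz_hi and rt_lo <= p.rt <= rt_hi):
            continue
        # Map the peak to a display cell; clamp the upper edge into the last cell.
        col = min(int((p.mz - mz_lo) / (mz_hi - mz_lo) * cols), cols - 1)
        row = min(int((p.rt - rt_lo) / (rt_hi - rt_lo) * rows), rows - 1)
        key = (col, row)
        if key not in best or p.intensity > best[key].intensity:
            best[key] = p
    return sorted(best.values())


peaks = [
    Peak(100.0, 1.0, 5.0),
    Peak(100.1, 1.0, 9.0),   # same cell as the peak above, but more intense
    Peak(150.0, 2.0, 3.0),
    Peak(300.0, 1.0, 7.0),   # outside the m/z range of the window
]
summary = summarize_window(peaks, (100.0, 200.0), (0.0, 4.0), cols=2, rows=2)
```

A linear scan like this touches every peak in the dataset, which is why the abstract stresses that real-time retrieval depends on the storage format.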
Affiliation(s)
- Runmin Yang
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
- Jingjing Ma
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
- Shu Zhang
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
- Yu Zheng
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
  - City University of Hong Kong Shenzhen Research Institute, Shenzhen 518057, China
- Daming Zhu
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
2
Bouyssié D, Hesse AM, Mouton-Barbosa E, Rompais M, Macron C, Carapito C, Gonzalez de Peredo A, Couté Y, Dupierris V, Burel A, Menetrey JP, Kalaitzakis A, Poisat J, Romdhani A, Burlet-Schiltz O, Cianférani S, Garin J, Bruley C. Proline: an efficient and user-friendly software suite for large-scale proteomics. Bioinformatics 2020; 36:3148-3155. [PMID: 32096818 PMCID: PMC7214047 DOI: 10.1093/bioinformatics/btaa118] [Citation(s) in RCA: 112] [Impact Index Per Article: 28.0] [Received: 07/20/2018] [Revised: 01/10/2020] [Accepted: 02/18/2020] [Indexed: 11/30/2022]
Abstract
Motivation: The proteomics field requires the production and publication of reliable mass spectrometry-based identification and quantification results. Although many tools and algorithms exist, very few combine, in a single software environment, efficient processing algorithms and a data management system to process and curate the hundreds of datasets associated with a single proteomics study. Results: Here, we present Proline, a robust software suite for analysis of MS-based proteomics data, which collects and processes proteomics datasets and allows their visualization and publication. We illustrate its ease of use for various steps in the validation and quantification workflow, its data curation capabilities and its computational efficiency. The efficiency of the DDA label-free quantification workflow was assessed by comparing results obtained with Proline to those obtained with a widely used software tool on a spiked-in sample. This assessment demonstrated Proline’s ability to provide high quantification accuracy in a user-friendly interface for datasets of any size. Availability and implementation: Proline is available for Windows and Linux under the CECILL open-source license. It can be deployed in client–server mode or in standalone mode; downloads are at http://proline.profiproteomics.fr/#downloads. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David Bouyssié
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
- Anne-Marie Hesse
  - Université Grenoble Alpes, Inserm, CEA, IRIG, BGE, Grenoble 38000, France
- Emmanuelle Mouton-Barbosa
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
- Magali Rompais
  - Laboratoire de Spectrométrie de Masse BioOrganique, Université de Strasbourg, CNRS, IPHC, Strasbourg 67087, UMR 7178, France
- Charlotte Macron
  - Laboratoire de Spectrométrie de Masse BioOrganique, Université de Strasbourg, CNRS, IPHC, Strasbourg 67087, UMR 7178, France
- Christine Carapito
  - Laboratoire de Spectrométrie de Masse BioOrganique, Université de Strasbourg, CNRS, IPHC, Strasbourg 67087, UMR 7178, France
- Anne Gonzalez de Peredo
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
- Yohann Couté
  - Université Grenoble Alpes, Inserm, CEA, IRIG, BGE, Grenoble 38000, France
- Alexandre Burel
  - Laboratoire de Spectrométrie de Masse BioOrganique, Université de Strasbourg, CNRS, IPHC, Strasbourg 67087, UMR 7178, France
- Andrea Kalaitzakis
  - Université Grenoble Alpes, Inserm, CEA, IRIG, BGE, Grenoble 38000, France
- Julie Poisat
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
- Aymen Romdhani
  - Laboratoire de Spectrométrie de Masse BioOrganique, Université de Strasbourg, CNRS, IPHC, Strasbourg 67087, UMR 7178, France
- Odile Burlet-Schiltz
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
- Sarah Cianférani
  - Laboratoire de Spectrométrie de Masse BioOrganique, Université de Strasbourg, CNRS, IPHC, Strasbourg 67087, UMR 7178, France
- Jerome Garin
  - Université Grenoble Alpes, Inserm, CEA, IRIG, BGE, Grenoble 38000, France
- Christophe Bruley
  - Université Grenoble Alpes, Inserm, CEA, IRIG, BGE, Grenoble 38000, France
3
Henning J, Smith R. A web-based system for creating, viewing, and editing precursor mass spectrometry ground truth data. BMC Bioinformatics 2020; 21:418. [PMID: 32972355 PMCID: PMC7517820 DOI: 10.1186/s12859-020-03752-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/20/2019] [Accepted: 09/15/2020] [Indexed: 11/12/2022]
Abstract
Background: Mass spectrometry (MS) uses mass-to-charge ratios of measured particles to decode the identities and quantities of molecules in a sample. Interpretation of raw MS data depends upon data processing algorithms that render it human-interpretable. Quantitative MS workflows are complex experimental chains, and it is crucial to know the performance and bias of each data processing method, as they affect the accuracy, coverage, and statistical significance of the result. Creating the ground truth necessary for quantitatively evaluating MS1-aware algorithms is a difficult and tedious task, and better software for creating such datasets would facilitate more extensive evaluation and improvement of MS data processing algorithms. Results: We present JS-MS 2.0, a software suite that provides a dependency-free, browser-based, one-click, cross-platform solution for creating MS1 ground truth. The software retains the first version’s capacity for loading, viewing, and navigating MS1 data in 2-D and 3-D, and adds tools for capturing, editing, saving, and viewing isotopic envelope and extracted isotopic chromatogram features. The software can also be used to view and explore the results of feature-finding algorithms. Conclusions: JS-MS 2.0 enables faster creation and inspection of MS1 ground truth data. It is publicly available with an MIT license at github.com/optimusmoose/jsms.
Affiliation(s)
- Jessica Henning
  - Department of Computer Science, University of Montana, 32 Campus Drive, Missoula, MT, 59812, USA
- Rob Smith
  - Department of Computer Science, University of Montana, 32 Campus Drive, Missoula, MT, 59812, USA
4
Tully B. Toffee - a highly efficient, lossless file format for DIA-MS. Sci Rep 2020; 10:8939. [PMID: 32488104 PMCID: PMC7265431 DOI: 10.1038/s41598-020-65015-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/07/2019] [Accepted: 04/17/2020] [Indexed: 11/09/2022]
Abstract
The closed nature of vendor file formats in mass spectrometry is a significant barrier to progress in developing robust bioinformatics software. In response, the community has developed the open mzML format, implemented in XML and based on controlled vocabularies. Widely adopted, mzML is an important step forward; however, it suffers from two challenges that are particularly apparent as the field moves to high-throughput proteomics: a large increase in file size, and a largely sequential I/O access pattern. Described here is 'toffee', an open, random I/O format backed by HDF5, with lossless compression that gives file sizes similar to the original vendor format and can be converted back to mzML without penalty. It is shown that mzML and toffee are equivalent when processing data using OpenSWATH algorithms, in addition to novel applications that are enabled by new data access patterns. For instance, a peptide-centric deep-learning pipeline for peptide identification is proposed. Documentation and examples are available at https://toffee.readthedocs.io, and all code is MIT licensed at https://bitbucket.org/cmriprocan/toffee.
Affiliation(s)
- Brett Tully
  - The ACRF International Centre for the Proteome of Human Cancer (ProCan), Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
5
Deutsch EW, Perez-Riverol Y, Chalkley RJ, Wilhelm M, Tate S, Sachsenberg T, Walzer M, Käll L, Delanghe B, Böcker S, Schymanski EL, Wilmes P, Dorfer V, Kuster B, Volders PJ, Jehmlich N, Vissers JP, Wolan DW, Wang AY, Mendoza L, Shofstahl J, Dowsey AW, Griss J, Salek RM, Neumann S, Binz PA, Lam H, Vizcaíno JA, Bandeira N, Röst H. Expanding the Use of Spectral Libraries in Proteomics. J Proteome Res 2018; 17:4051-4060. [PMID: 30270626 PMCID: PMC6443480 DOI: 10.1021/acs.jproteome.8b00485] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/15/2022]
Abstract
The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.
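The four levels of metadata granularity named in the abstract can be pictured as nested records. The Python sketch below is only an illustration of that nesting; it is not the Proteomics Standards Initiative format, and every class and field name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PeakAnnotation:
    """Peak annotation level (hypothetical field)."""
    label: str


@dataclass
class LibraryPeak:
    """Peak (fragment ion) level."""
    mz: float
    intensity: float
    annotations: List[PeakAnnotation] = field(default_factory=list)


@dataclass
class LibraryEntry:
    """Individual entry (peptide ion) level."""
    peptide: str
    charge: int
    peaks: List[LibraryPeak] = field(default_factory=list)


@dataclass
class SpectralLibrary:
    """Collection (library) level."""
    name: str
    entries: List[LibraryEntry] = field(default_factory=list)


# Illustrative values only; not taken from any real library.
library = SpectralLibrary(
    name="demo",
    entries=[
        LibraryEntry(
            peptide="PEPTIDEK",
            charge=2,
            peaks=[
                LibraryPeak(147.113, 100.0, [PeakAnnotation("y1")]),
                LibraryPeak(500.0, 20.0),  # an unannotated peak
            ],
        )
    ],
)

# Count peaks that carry at least one annotation.
annotated = sum(1 for e in library.entries for p in e.peaks if p.annotations)
```

Metadata attached at an outer level (for example, the library name) applies to everything nested inside it, while peak annotations stay local to a single fragment ion.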
Affiliation(s)
- Eric W. Deutsch
  - Institute for Systems Biology, Seattle, Washington, 98109, United States
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Robert J. Chalkley
  - University of California San Francisco, San Francisco, 94158, California, United States
- Mathias Wilhelm
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
- Timo Sachsenberg
  - Department of Computer Science, Center for Bioinformatics, University of Tübingen, Sand 14, Tübingen, 72076, Germany
- Mathias Walzer
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Lukas Käll
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH − Royal Institute of Technology, Stockholm 114 28, Sweden
- Bernard Delanghe
  - Thermo Fisher Scientific Bremen, Hanna-Kunath Str. 11, 28199 Bremen, Germany
- Sebastian Böcker
  - Chair for Bioinformatics, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Emma L. Schymanski
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Paul Wilmes
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Viktoria Dorfer
  - University of Applied Sciences Upper Austria, Bioinformatics Research Group, Hagenberg, 4232, Austria
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
  - Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich, Freising, 85354, Germany
- Nico Jehmlich
  - Helmholtz-Centre for Environmental Research - UFZ, Leipzig, Germany
- Dennis W. Wolan
  - Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
- Ana Y. Wang
  - Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
- Luis Mendoza
  - Institute for Systems Biology, Seattle, Washington, 98109, United States
- Jim Shofstahl
  - Thermo Fisher Scientific, 355 River Oaks Parkway San Jose, CA 95134
- Andrew W. Dowsey
  - Department of Population Health Sciences and Bristol Veterinary School, Faculty of Health Sciences, University of Bristol, Bristol BS9 1BN, UK
- Johannes Griss
  - Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, Vienna 1090, Austria
- Reza M. Salek
  - The International Agency for Research on Cancer (IARC), 150 Cours Albert Thomas, 69372 Lyon CEDEX 08, France
- Steffen Neumann
  - Leibniz Institute of Plant Biochemistry, Department of Stress and Developmental Biology, 06120 Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Pierre-Alain Binz
  - Clinical Chemistry Service, Centre Hospitalier Universitaire Vaudois, 1011 Lausanne, Switzerland
- Henry Lam
  - Department of Chemical and Biological Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Nuno Bandeira
  - Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 92093-0404, USA
- Hannes Röst
  - The Donnelly Centre, University of Toronto, 160 College St., Toronto, ON, M5S 3E1, Canada
6
Rosen J, Handy K, Gillan A, Smith R. JS-MS: a cross-platform, modular javascript viewer for mass spectrometry signals. BMC Bioinformatics 2017; 18:469. [PMID: 29110634 PMCID: PMC5674804 DOI: 10.1186/s12859-017-1883-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/04/2017] [Accepted: 10/26/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Despite the ubiquity of mass spectrometry (MS), data processing tools can be surprisingly limited. To date, there is no stand-alone, cross-platform 3-D visualizer for MS data. Available visualization toolkits require large libraries with multiple dependencies and are not well suited for custom MS data processing modules, such as MS storage systems or data processing algorithms. RESULTS: We present JS-MS, a 3-D, modular JavaScript client application for viewing MS data. JS-MS provides several advantages over existing MS viewers, such as a dependency-free, browser-based, one-click, cross-platform install and better navigation interfaces. The client includes a modular Java backend with a novel streaming .mzML parser to demonstrate the API-based serving of MS data to the viewer. CONCLUSIONS: JS-MS enables custom MS data processing and evaluation by providing fast, 3-D visualization using improved navigation without dependencies. JS-MS is publicly available with a GPLv2 license at github.com/optimusmoose/jsms.
Affiliation(s)
- Jebediah Rosen
  - Department of Computer Science, University of Montana, 32 Campus Drive, Missoula, 59812, MT, USA
- Kyle Handy
  - Department of Computer Science, University of Montana, 32 Campus Drive, Missoula, 59812, MT, USA
- André Gillan
  - Department of Computer Science, University of Montana, 32 Campus Drive, Missoula, 59812, MT, USA
- Rob Smith
  - Department of Computer Science, University of Montana, 32 Campus Drive, Missoula, 59812, MT, USA