1
Berger MT, Hemmler D, Diederich P, Rychlik M, Marshall JW, Schmitt-Kopplin P. Open Search of Peptide Glycation Products from Tandem Mass Spectra. Anal Chem 2022; 94:5953-5961. [PMID: 35389626 DOI: 10.1021/acs.analchem.2c00388] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Identification of chemically modified peptides in mass spectrometry (MS)-based glycation studies is a crucial yet challenging task. There is a need to establish a mode for matching tandem mass spectrometry (MS/MS) data, allowing for both known and unknown peptide glycation modifications. We present an open search approach that uses classic and modified peptide fragment ions. The latter are shifted by the mass delta of the modification. Both provide key structural information that can be used to assess the peptide core structure of the glycation product. We also leverage redundant neutral losses from the modification side chain, introducing a third ion class for matching referred to as characteristic fragment ions. We demonstrate that peptide glycation product MS/MS spectra contain multidimensional information and that most often, more than half of the spectral information is ignored if no attempt is made to use a multi-step matching algorithm. Compared to regular and/or modified peptide ion matching, our triple-ion strategy significantly increased the median interpretable fraction of the glycation product MS/MS spectra. For reference, we apply our approach for Amadori product characterization and identify all established diagnostic ions automatically. We further show how this method effectively applies the open search concept and allows for optimized elucidation of unknown structures by presenting two hitherto undescribed peptide glycation modifications with a delta mass of 102.0311 and 268.1768 Da. We characterize their fragmentation signature by integration with isotopically labeled glycation products, which provides high validity for non-targeted structure identification.
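The idea of matching both classic and delta-shifted fragment ions can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the singly charged b/y ion model, the 0.01 m/z tolerance, the peptide `AKG`, and the peak list are assumptions made for illustration. Only the 102.0311 Da delta mass is taken from the abstract, and the third ion class (characteristic fragment ions) is not modeled.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide, delta=0.0, site=None):
    """Return singly charged b/y ion m/z values.

    If `site` (0-based residue index) is given, `delta` is added to that
    residue, so every fragment containing it is shifted by the mass delta.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    if site is not None:
        masses[site] += delta
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


def match(peaks, ions, tol=0.01):
    """Names of theoretical ions found among observed peaks within `tol`."""
    return {name for name, mz in ions.items()
            if any(abs(mz - p) <= tol for p in peaks)}


# Hypothetical example: peptide AKG carrying the modification on the lysine.
peaks = [72.0444, 306.1654]  # made-up observed m/z values
classic = fragment_ions("AKG")
shifted = fragment_ions("AKG", delta=102.0311, site=1)
classic_hits = match(peaks, classic)
shifted_hits = match(peaks, shifted)
```

In this toy case the shifted ion series explains a peak that the classic series misses, which is the kind of otherwise ignored spectral information the abstract refers to.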
Affiliation(s)
- Michelle T Berger
- Chair of Analytical Food Chemistry, Technical University Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany
- Research Unit Analytical BioGeoChemistry (BGC), Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- Daniel Hemmler
- Chair of Analytical Food Chemistry, Technical University Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany
- Research Unit Analytical BioGeoChemistry (BGC), Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- Philippe Diederich
- Research Unit Analytical BioGeoChemistry (BGC), Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- Michael Rychlik
- Chair of Analytical Food Chemistry, Technical University Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany
- James W Marshall
- The Waltham Petcare Science Institute, Mars Petcare UK, Waltham-on-the-Wolds, Leicestershire LE14 4RT, United Kingdom
- Philippe Schmitt-Kopplin
- Chair of Analytical Food Chemistry, Technical University Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany
- Research Unit Analytical BioGeoChemistry (BGC), Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
2
To PKP, Wu L, Chan CM, Hoque A, Lam H. ClusterSheep: A Graphics Processing Unit-Accelerated Software Tool for Large-Scale Clustering of Tandem Mass Spectra from Shotgun Proteomics. J Proteome Res 2021; 20:5359-5367. [PMID: 34734728 DOI: 10.1021/acs.jproteome.1c00485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Modern shotgun proteomics experiments generate gigabytes of spectra every hour, only a fraction of which were utilized to form biological conclusions. Instead of being stored as flat files in public data repositories, this large amount of data can be better organized to facilitate data reuse. Clustering these spectra by similarity can be helpful in building high-quality spectral libraries, correcting identification errors, and highlighting frequently observed but unidentified spectra. However, large-scale clustering is time-consuming. Here, we present ClusterSheep, a method utilizing Graphics Processing Units (GPUs) to accelerate the process. Unlike previously proposed algorithms for this purpose, our method performs true pairwise comparison of all spectra within a precursor mass-to-charge ratio tolerance, thereby preserving the full cluster structures. ClusterSheep was benchmarked against previously reported clustering tools, MS-Cluster, MaRaCluster, and msCRUSH. The software tool also functions as an interactive visualization tool with a persistent state, enabling the user to explore the resulting clusters visually and retrieve the clustering results as desired.
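As a rough illustration of "true pairwise comparison of all spectra within a precursor mass-to-charge ratio tolerance", the following CPU-only Python sketch compares every pair of spectra whose precursor m/z values lie within a tolerance and merges sufficiently similar pairs into clusters. It is not ClusterSheep's GPU implementation: the 1.0 m/z binning, the cosine similarity, the 0.5 m/z tolerance, the 0.7 similarity threshold, and the connected-component grouping are all assumptions chosen for the example.

```python
from collections import defaultdict
from math import sqrt


def binned(peaks, width=1.0):
    """Sum peak intensities into fixed-width m/z bins (sparse vector)."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / width)] += intensity
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(spectra, mz_tol=0.5, min_sim=0.7):
    """Cluster spectra given as (precursor_mz, [(mz, intensity), ...])."""
    order = sorted(range(len(spectra)), key=lambda i: spectra[i][0])
    vecs = [binned(peaks) for _, peaks in spectra]
    parent = list(range(len(spectra)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            # Spectra are sorted by precursor m/z, so we can stop early.
            if spectra[j][0] - spectra[i][0] > mz_tol:
                break
            if cosine(vecs[i], vecs[j]) >= min_sim:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(spectra)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Made-up spectra: 0 and 1 are similar, 2 differs, 3 has a distant precursor.
spectra = [
    (500.0, [(100.2, 10.0), (200.3, 5.0)]),
    (500.1, [(100.4, 9.0), (200.1, 6.0)]),
    (500.2, [(300.5, 8.0)]),
    (600.0, [(100.2, 10.0), (200.3, 5.0)]),
]
clusters = cluster(spectra)
```

The nested loop is quadratic within each precursor window, which is the cost the abstract proposes to offload to GPUs.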
Affiliation(s)
- Paul Ka Po To
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Long Wu
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Chak Ming Chan
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Ayman Hoque
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
3
Permiakova O, Guibert R, Kraut A, Fortin T, Hesse AM, Burger T. CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis. BMC Bioinformatics 2021; 22:68. [PMID: 33579189 PMCID: PMC7881590 DOI: 10.1186/s12859-021-03969-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/03/2020] [Accepted: 01/14/2021] [Indexed: 11/16/2022]
Abstract
Background: The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expected number of clusters are too large to be processed by classical machine learning algorithms, so that most state-of-the-art approaches rely on single-pass linkage-based algorithms. Results: We propose a clustering algorithm that solves the powerful but computationally demanding kernel k-means objective function in a scalable way. As a result, it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, Nyström approximation and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles. Conclusions: Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters that differ from those resulting from state-of-the-art methods, while achieving similar performance. This highlights the complementarity of algorithms built on different principles for extracting the most from complex LC-MS data.
Affiliation(s)
- Olga Permiakova
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
- Romain Guibert
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
- Alexandra Kraut
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
- Thomas Fortin
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
- Anne-Marie Hesse
- Univ. Grenoble Alpes, CEA, Inserm, BGE U1038, 38000, Grenoble, France
- Thomas Burger
- Univ. Grenoble Alpes, CNRS, CEA, Inserm, BGE U1038, 38000, Grenoble, France
4
Abstract
Scoring functions that assess spectrum similarity play a crucial role in many computational mass spectrometry algorithms. These functions are used to compare an experimentally acquired fragmentation (MS/MS) spectrum against two different types of target MS/MS spectra: either against a theoretical MS/MS spectrum derived from a peptide from a sequence database, or against another, previously acquired MS/MS spectrum. The former is typically encountered in database searching, while the latter is used in spectrum clustering and spectral library searching. The comparison between acquired and theoretical MS/MS spectra is most commonly performed using cross-correlations or probability-derived scoring functions, while the comparison of two acquired MS/MS spectra typically makes use of a normalized dot product, especially in spectrum library search algorithms. In addition to these scoring functions, Pearson's or Spearman's correlation coefficients, mean squared error, or median absolute deviation scores can also be used for the same purpose. Here, we describe and evaluate these scoring functions with regard to their ability to assess spectrum similarity for theoretical versus acquired, and acquired versus acquired spectra.
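Three of the scoring functions named in the abstract can be written down in a few lines. The Python sketch below assumes that both spectra have already been aligned into equal-length intensity vectors (for example by m/z binning). The function names and the example vectors are illustrative and not taken from the paper.

```python
from math import sqrt


def normalized_dot(x, y):
    """Normalized dot product (cosine) of two aligned intensity vectors."""
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    if not nx or not ny:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def pearson(x, y):
    """Pearson correlation coefficient of two aligned intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mean_squared_error(x, y):
    """Mean squared intensity difference; lower means more similar."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Made-up aligned intensity vectors.
x = [1.0, 0.0, 2.0, 0.0]
y = [2.0, 0.0, 4.0, 0.0]  # same pattern as x at twice the intensity
z = [0.0, 3.0, 0.0, 1.0]  # no peaks in common with x

scores_xy = (normalized_dot(x, y), pearson(x, y), mean_squared_error(x, y))
scores_xz = (normalized_dot(x, z), pearson(x, z), mean_squared_error(x, z))
```

Note that the normalized dot product and Pearson correlation are insensitive to overall intensity scale, whereas the mean squared error is not, so spectra are usually normalized before an error-based score is applied.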
Affiliation(s)
- Şule Yilmaz
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
- Elien Vandermarliere
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
- Lennart Martens
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
5
The M, Käll L. MaRaCluster: A Fragment Rarity Metric for Clustering Fragment Spectra in Shotgun Proteomics. J Proteome Res 2016; 15:713-20. [PMID: 26653874 DOI: 10.1021/acs.jproteome.5b00749] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 11/28/2022]
Abstract
Shotgun proteomics experiments generate large amounts of fragment spectra as primary data, normally with high redundancy between and within experiments. Here, we have devised a clustering technique to identify fragment spectra stemming from the same species of peptide. This is a powerful alternative method to traditional search engines for analyzing spectra, specifically useful for larger scale mass spectrometry studies. As an aid in this process, we propose a distance calculation relying on the rarity of experimental fragment peaks, following the intuition that peaks shared by only a few spectra offer more evidence than peaks shared by a large number of spectra. We used this distance calculation and a complete-linkage scheme to cluster data from a recent large-scale mass spectrometry-based study. The clusterings produced by our method have up to 40% more identified peptides for their consensus spectra compared to those produced by the previous state-of-the-art method. We see that our method would advance the construction of spectral libraries as well as serve as a tool for mining large sets of fragment spectra. The source code and Ubuntu binary packages are available at https://github.com/statisticalbiotechnology/maracluster (under an Apache 2.0 license).
Affiliation(s)
- Matthew The
- Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH , Box 1031, 17121 Solna, Sweden
- Lukas Käll
- Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Box 1031, 17121 Solna, Sweden
6
Vaudel M, Verheggen K, Csordas A, Raeder H, Berven FS, Martens L, Vizcaíno JA, Barsnes H. Exploring the potential of public proteomics data. Proteomics 2016; 16:214-25. [PMID: 26449181 PMCID: PMC4738454 DOI: 10.1002/pmic.201500295] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Received: 07/13/2015] [Revised: 08/25/2015] [Accepted: 09/28/2015] [Indexed: 12/22/2022]
Abstract
In a global effort for scientific transparency, it has become feasible and good practice to share experimental data supporting novel findings. Consequently, the amount of publicly available MS-based proteomics data has grown substantially in recent years. With some notable exceptions, this extensive material has however largely been left untouched. The time has now come for the proteomics community to utilize this potential gold mine for new discoveries, and uncover its untapped potential. In this review, we provide a brief history of the sharing of proteomics data, showing ways in which publicly available proteomics data are already being (re-)used, and outline potential future opportunities based on four different usage types: use, reuse, reprocess, and repurpose. We thus aim to assist the proteomics community in stepping up to the challenge, and to make the most of the rapidly increasing amount of public proteomics data.
Affiliation(s)
- Marc Vaudel
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Kenneth Verheggen
- Medical Biotechnology Center, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Attila Csordas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Helge Raeder
- Department of Clinical Science, KG Jebsen Center for Diabetes Research, University of Bergen, Bergen, Norway
- Frode S Berven
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Department of Clinical Medicine, KG Jebsen Centre for Multiple Sclerosis Research, University of Bergen, Bergen, Norway
- Lennart Martens
- Medical Biotechnology Center, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Juan A Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Harald Barsnes
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Department of Clinical Science, KG Jebsen Center for Diabetes Research, University of Bergen, Bergen, Norway
7
Saeed F, Hoffert JD, Knepper MA. CAMS-RS: Clustering Algorithm for Large-Scale Mass Spectrometry Data Using Restricted Search Space and Intelligent Random Sampling. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:128-41. [PMID: 26355513 PMCID: PMC6143137 DOI: 10.1109/tcbb.2013.152] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 05/28/2023]
Abstract
High-throughput mass spectrometers can produce massive amounts of redundant data at an astonishing rate with many of them having poor signal-to-noise (S/N) ratio. These low S/N ratio spectra may not get interpreted using conventional spectra-to-database matching techniques. In this paper, we present an efficient algorithm, CAMS-RS (Clustering Algorithm for Mass Spectra using Restricted Space and Sampling) for clustering of raw mass spectrometry data. CAMS-RS utilizes a novel metric (called F-set) that exploits the temporal and spatial patterns to accurately assess similarity between two given spectra. The F-set similarity metric is independent of the retention time and allows clustering of mass spectrometry data from independent LC-MS/MS runs. A novel restricted search space strategy is devised to limit the comparisons of the number of spectra. An intelligent sampling method is executed on individual bins that allow merging of the results to make the final clusters. Our experiments, using experimentally generated data sets, show that the proposed algorithm is able to cluster spectra with high accuracy and is helpful in interpreting low S/N ratio spectra. The CAMS-RS algorithm is highly scalable with increasing number of spectra and our implementation allows clustering of up to a million spectra within minutes.
8
Muth T, Benndorf D, Reichl U, Rapp E, Martens L. Searching for a needle in a stack of needles: challenges in metaproteomics data analysis. Mol Biosyst 2013; 9:578-85. [PMID: 23238088 DOI: 10.1039/c2mb25415h] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 12/18/2022]
Abstract
In the past years the integral study of microbial communities of varying complexity has gained increasing research interest. Mass spectrometry-driven metaproteomics enables the analysis of such communities on the functional level, but this fledgling field still faces various technical and semantic challenges regarding experimental data analysis and interpretation. In the present review, we outline the hurdles involved and attempt to cover the most valuable methods and software implementations available to researchers in the field today. Beyond merely focusing on protein identification, we provide an overview on different data pre- and post-processing steps, such as metabolic pathway analysis, that can be useful in a typical metaproteomics workflow. Finally, we briefly discuss directions for future work.
Affiliation(s)
- Thilo Muth
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, Magdeburg, Germany
9
Armengaud J, Hartmann EM, Bland C. Proteogenomics for environmental microbiology. Proteomics 2013; 13:2731-42. [PMID: 23636904 DOI: 10.1002/pmic.201200576] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 12/19/2012] [Revised: 03/06/2013] [Accepted: 04/09/2013] [Indexed: 11/09/2022]
Abstract
Proteogenomics sensu stricto refers to the use of proteomic data to refine the annotation of genomes from model organisms. Because of the limitations of automatic annotation pipelines, a relatively high number of errors occur during the structural annotation of genes coding for proteins. Whether putative orphan sequences or short genes encoding low-molecular-weight proteins really exist is still frequently a mystery. Whether start codons are well defined is also an open debate. These problems are exacerbated for genomes of microorganisms belonging to poorly documented genera, as related sequences are not always available for homology-guided annotation. The functional annotation of a significant proportion of genes is also another well-known issue when annotating environmental microorganisms. High-throughput shotgun proteomics has recently greatly evolved, allowing the exploration of the proteome from any microorganism at an unprecedented depth. The structural and functional annotation process may be usefully complemented with experimental data. Indeed, proteogenomic mapping has been successfully performed for a wide variety of organisms. Specific approaches devoted to systematically establishing the N-termini of a large set of proteins are being developed. N-terminomics is giving rise to datasets of experimentally proven translational start codons as well as validated peptide signals for secreted proteins. By extension, combining genomic and proteomic data is becoming routine in many research projects. The proteomic analysis of organisms with unfinished genome sequences, the so-called composite proteomics, and the search for microbial biomarkers by bottom-up and top-down combined approaches are some examples of proteogenomic-flavored studies. They illustrate the advent of a new era of environmental microbiology where proteomics and genomics are intimately integrated to answer key biological questions.
Affiliation(s)
- Jean Armengaud
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze, France
10
Shao W, Lam H. Denoising Peptide Tandem Mass Spectra for Spectral Libraries: A Bayesian Approach. J Proteome Res 2013; 12:3223-32. [DOI: 10.1021/pr400080b] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
Affiliation(s)
- Wenguang Shao
- Division of Biomedical Engineering, and Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Henry Lam
- Division of Biomedical Engineering, and Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
11
Menschaert G, Hayakawa E, Schoofs L, Van Criekinge W, Baggerman G. Spectral Clustering in Peptidomics Studies Allows Homology Searching and Modification Profiling: HomClus, a Versatile Tool. J Proteome Res 2012; 11:2774-85. [DOI: 10.1021/pr201114m] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Affiliation(s)
- Gerben Menschaert
- Faculty of Bioscience Engineering, Laboratory for Bioinformatics and Computational Genomics, Ghent University, Ghent, Belgium
- Prometa, Interfaculty Center for Proteomics and Metabolomics, K.U. Leuven, Leuven, Belgium
- Eisuke Hayakawa
- Prometa, Interfaculty Center for Proteomics and Metabolomics, K.U. Leuven, Leuven, Belgium
- Research Group of Functional Genomics and Proteomics, K.U. Leuven, 3000 Leuven, Belgium
- Liliane Schoofs
- Research Group of Functional Genomics and Proteomics, K.U. Leuven, 3000 Leuven, Belgium
- Wim Van Criekinge
- Faculty of Bioscience Engineering, Laboratory for Bioinformatics and Computational Genomics, Ghent University, Ghent, Belgium
- Geert Baggerman
- VITO Nv, 2400 Mol, Belgium
- CFP, Center for Proteomics, 2020 Antwerpen, Belgium
12
Frank AM, Monroe ME, Shah AR, Carver JJ, Bandeira N, Moore RJ, Anderson GA, Smith RD, Pevzner PA. Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra. Nat Methods 2011; 8:587-91. [PMID: 21572408 PMCID: PMC3128193 DOI: 10.1038/nmeth.1609] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 04/05/2010] [Accepted: 04/13/2011] [Indexed: 11/09/2022]
Abstract
Tandem mass spectrometry (MS/MS) experiments yield multiple, nearly identical spectra of the same peptide in various laboratories, but proteomics researchers typically do not leverage the unidentified spectra produced in other labs to decode spectra they generate. We propose a spectral archives approach that clusters MS/MS datasets, representing similar spectra by a single consensus spectrum. Spectral archives extend spectral libraries by analyzing both identified and unidentified spectra in the same way and maintaining information about peptide spectra that are common across species and conditions. Thus archives offer both traditional library spectrum similarity-based search capabilities along with new ways to analyze the data. By developing a clustering tool, MS-Cluster, we generated a spectral archive from ∼1.18 billion spectra that greatly exceeds the size of existing spectral repositories. We advocate that publicly available data should be organized into spectral archives rather than be analyzed as disparate datasets, as is mostly the case today.
Affiliation(s)
- Ari M Frank
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, USA
13
Barsnes H, Eidhammer I, Martens L. A global analysis of peptide fragmentation variability. Proteomics 2011; 11:1181-8. [PMID: 21328539 DOI: 10.1002/pmic.201000640] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 10/07/2010] [Revised: 11/25/2010] [Accepted: 11/29/2010] [Indexed: 11/08/2022]
Abstract
Understanding the fragmentation process in MS/MS experiments is vital when trying to validate the results of such experiments, and one way of improving our understanding is to analyze existing data. We here present our findings from an analysis of a large and diverse data set of MS/MS-based peptide identifications, in which each peptide has been identified from multiple spectra, recorded on two commonly used types of electrospray instruments. By analyzing these data we were able to study fragmentation variability on three levels: (i) variation in detection rates and intensities for fragment ions from the same peptide sequence measured multiple times on a single instrument; (ii) consistency of rank-based fragmentation patterns; and (iii) a set of general observations on fragment ion occurrence in MS/MS experiments, regardless of sequence. Our results confirm that substantial variation can be found at all levels, even when high-quality identifications are used and the experimental conditions as well as the peptide sequences are kept constant. Finally, we discuss the observed variability in light of ongoing efforts to create spectral libraries and predictive software for target selection in targeted proteomics.
Affiliation(s)
- Harald Barsnes
- Department of Informatics, University of Bergen, Bergen, Norway
14
Degroeve S, Colaert N, Vandekerckhove J, Gevaert K, Martens L. A reproducibility-based evaluation procedure for quantifying the differences between MS/MS peak intensity normalization methods. Proteomics 2011; 11:1172-80. [DOI: 10.1002/pmic.201000605] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/24/2010] [Revised: 11/29/2010] [Accepted: 12/05/2010] [Indexed: 11/07/2022]
15
Barsnes H, Eidhammer I, Martens L. FragmentationAnalyzer: an open-source tool to analyze MS/MS fragmentation data. Proteomics 2010; 10:1087-90. [PMID: 20049869 DOI: 10.1002/pmic.200900681] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
A thorough understanding of the fragmentation processes in MS/MS can be a powerful tool in assessing the resulting peptide and protein identifications. We here present the freely available, open-source FragmentationAnalyzer tool (http://fragmentation-analyzer.googlecode.com) that makes it straightforward to analyze large MS/MS data sets for specific types of identified peptides, using a common set of peptide properties. This enables the detection of fragmentation pattern nuances related to specific instruments or due to the presence of post-translational modifications.
Affiliation(s)
- Harald Barsnes
- Department of Informatics, University of Bergen, Bergen, Norway
16
Vandenborre G, Van Damme EJM, Ghesquière B, Menschaert G, Hamshou M, Rao RN, Gevaert K, Smagghe G. Glycosylation Signatures in Drosophila: Fishing with Lectins. J Proteome Res 2010; 9:3235-42. [DOI: 10.1021/pr1001753] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 12/24/2022]
Affiliation(s)
- Gianni Vandenborre
- Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium, Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium, and Laboratory for Bioinformatics and
- Els J. M. Van Damme
- Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium, Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium, and Laboratory for Bioinformatics and
- Bart Ghesquière
- Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium, Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium, and Laboratory for Bioinformatics and
- Gerben Menschaert
- Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium, Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium, and Laboratory for Bioinformatics and
- Mohamad Hamshou
- Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium, Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium, and Laboratory for Bioinformatics and
- Rameshwaram Nagender Rao
- Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium, Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium, and Laboratory for Bioinformatics and
| | - Kris Gevaert
- Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium, Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium, and Laboratory for Bioinformatics and
| | - Guy Smagghe
- Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Laboratory of Biochemistry and Glycobiology, Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, B-9000 Ghent, Belgium, Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium, Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium, and Laboratory for Bioinformatics and
| |
Collapse
|
17
|
Tharakan R, Edwards N, Graham DRM. Data maximization by multipass analysis of protein mass spectra. Proteomics 2010; 10:1160-71. [DOI: 10.1002/pmic.200900433] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]
|
18
|
Salmi J, Nyman TA, Nevalainen OS, Aittokallio T. Filtering strategies for improving protein identification in high-throughput MS/MS studies. Proteomics 2009; 9:848-60. [PMID: 19160393 DOI: 10.1002/pmic.200800517] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
Despite the recent advances in streamlining high-throughput proteomic pipelines using tandem mass spectrometry (MS/MS), reliable identification of peptides and proteins on a larger scale has remained a challenging task, still involving a considerable degree of user interaction. Recently, a number of papers have proposed computational strategies both for distinguishing poor MS/MS spectra prior to database search (pre-filtering) as well as for verifying the peptide identifications made by the search programs (post-filtering). Both of these filtering approaches can be very beneficial to the overall protein identification pipeline, since they can remove a substantial part of the time consuming manual validation work and convert large sets of MS/MS spectra into more reliable and interpretable proteome information. The choice of the filtering method depends both on the properties of the data and on the goals of the experiment. This review discusses the different pre- and post-filtering strategies available to the researchers, together with their relative merits and potential pitfalls. We also highlight some additional research topics, such as spectral denoising and statistical assessment of the identification results, which aim at further improving the coverage and accuracy of high-throughput protein identification studies.
Affiliation(s)
- Jussi Salmi
- Department of Information Technology, University of Turku, Turku, Finland.
|
19
|
Lam H, Deutsch EW, Eddes JS, Eng JK, Stein SE, Aebersold R. Building consensus spectral libraries for peptide identification in proteomics. Nat Methods 2008; 5:873-5. [PMID: 18806791 PMCID: PMC2637392 DOI: 10.1038/nmeth.1254] [Citation(s) in RCA: 209] [Impact Index Per Article: 13.1] [Received: 07/23/2008] [Accepted: 08/26/2008] [Indexed: 11/09/2022]
Abstract
Spectral searching has drawn increasing interest as an alternative to sequence-database searching in proteomics. We developed and validated an open-source software toolkit, SpectraST, to enable proteomics researchers to build spectral libraries and to integrate this promising approach in their data-analysis pipeline. It allows individual researchers to condense raw data into spectral libraries, summarizing information about observed proteomes into a concise and retrievable format for future data analyses.
Affiliation(s)
- Henry Lam
- Institute for Systems Biology, Seattle, Washington 98103, USA.
|
20
|
Tharakan R, Martens L, Van Eyk JE, Graham DR. OMSSAGUI: An open-source user interface component to configure and run the OMSSA search engine. Proteomics 2008; 8:2376-8. [PMID: 18563730 DOI: 10.1002/pmic.200701126] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
We here present a user-friendly and extremely lightweight tool that can serve as a stand-alone front-end for the Open MS Search Algorithm (OMSSA) search engine, or that can directly be used as part of an informatics processing pipeline for MS driven proteomics. The OMSSA graphical user interface (OMSSAGUI) tool is written in Java, and is supported on Windows, Linux, and OSX platforms. It is open source under the Apache 2 license and can be downloaded from http://code.google.com/p/mass-spec-gui/.
Affiliation(s)
- Ravi Tharakan
- Department of Medicine, Bayview NHLBI Proteomics Center, Johns Hopkins School of Medicine Bayview Campus, Baltimore, MD, USA
|