201
|
Giese SH, Zickmann F, Renard BY. Detection of Unknown Amino Acid Substitutions Using Error-Tolerant Database Search. Methods Mol Biol 2016; 1362:247-264. [PMID: 26519182 DOI: 10.1007/978-1-4939-3106-4_16] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
Recent studies have demonstrated that mass spectrometry-based variant detection is feasible. Typically, either genomic variant databases or transcript data are used to construct customized target databases for the identification of single-amino acid variants (SAAVs) in mass spectrometry data. However, both approaches require additional data to perform the identification of SAAVs. Here, we discuss the application of an error-tolerant peptide search engine such as BICEPS for identifying variants exclusively based on standard UniProt databases. Thereby, unnecessary and redundant extensions of the search space are avoided. The workflow provides an unbiased view of the data; the search space is not limited to known variants and, at the same time, no additional data are required. In a subsequent step, a second identification search is performed to verify the initially identified variant peptides and aggregate information on the protein level.
Affiliation(s)
- Sven H Giese
- Research Group Bioinformatics (NG4), Robert Koch-Institute, Nordufer 20, 13353, Berlin, Germany
- Department of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355, Berlin, Germany
- Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JR, UK
| | - Franziska Zickmann
- Research Group Bioinformatics (NG4), Robert Koch-Institute, Nordufer 20, 13353, Berlin, Germany
| | - Bernhard Y Renard
- Research Group Bioinformatics (NG4), Robert Koch-Institute, Nordufer 20, 13353, Berlin, Germany.
|
202
|
Database Search Engines: Paradigms, Challenges and Solutions. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016; 919:147-156. [PMID: 27975215 DOI: 10.1007/978-3-319-41448-5_6] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/27/2022]
Abstract
The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.
|
203
|
Abstract
With the advancement in proteomics separation techniques and improvements in mass analyzers, the volume of data generated in a mass spectrometry-based proteomics experiment is rising exponentially. Such voluminous datasets necessitate automated computational tools for high-throughput data analysis and appropriate statistical control. The data are searched using one or more of the several popular database search algorithms. The matches assigned by these tools can include false positives, and statistical validation of the matches is necessary before making any biological interpretations. Without such procedures, the biological inferences may not hold true and may be outright misleading. The score distributions of true and false positives overlap considerably. To control the false positives among a set of accepted matches, a statistical estimate is needed that reflects the proportion of false positives present in the processed data. False discovery rate (FDR) is the metric for global confidence assessment of a large-scale proteomics dataset. This chapter covers the basics of FDR, its application in proteomics, and methods to estimate FDR.
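As an illustration of the kind of estimate the abstract refers to, the following minimal Python sketch computes an FDR at a score threshold. It assumes the target-decoy approach, a widely used estimator that the abstract itself does not name; the function name `estimate_fdr`, the `(score, is_decoy)` tuple layout, and the toy scores are all hypothetical.

```python
from typing import List, Tuple


def estimate_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR among accepted matches as (#decoys / #targets).

    psms: list of (score, is_decoy) pairs; higher score means a better match.
    threshold: matches with score >= threshold are accepted.
    Assumption: decoy hits model the false target hits one-to-one
    (simple target-decoy estimate, not taken from the chapter).
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted target matches: report 0.0 rather than divide by zero.
        return 0.0
    return decoys / targets


# Hypothetical toy data: (search score, is_decoy)
psms = [(9.1, False), (8.7, False), (8.2, True), (7.9, False),
        (7.5, False), (6.0, True), (5.5, False)]

print(estimate_fdr(psms, 7.0))  # 1 decoy / 4 targets
```

Lowering the threshold accepts more matches but typically raises the estimated FDR, which is the trade-off a global confidence filter has to balance.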
Affiliation(s)
- Suruchi Aggarwal
- Immunology Group, International Centre for Genetic Engineering and Biotechnology, ICGEB Campus, Aruna Asaf Ali Marg, New Delhi, 110067, India
| | - Amit Kumar Yadav
- Drug Discovery Research Center (DDRC), Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, 122001, Haryana, India.
|
204
|
Lereim RR, Oveland E, Berven FS, Vaudel M, Barsnes H. Visualization, Inspection and Interpretation of Shotgun Proteomics Identification Results. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016; 919:227-235. [DOI: 10.1007/978-3-319-41448-5_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022]
|
205
|
Luge T, Sauer S. Generating Sample-Specific Databases for Mass Spectrometry-Based Proteomic Analysis by Using RNA Sequencing. Methods Mol Biol 2016; 1394:219-232. [PMID: 26700052 DOI: 10.1007/978-1-4939-3341-9_16] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/23/2022]
Abstract
Mass spectrometry-based methods allow for the direct, comprehensive analysis of expressed proteins and their quantification among different conditions. However, protein identification generally relies on matching experimental mass spectra to theoretical spectra of peptide sequences derived from genomic databases of organisms. This conventional approach limits the applicability of proteomic methodologies to species for which a genome reference sequence is available. Recently, RNA sequencing (RNA-Seq) has become a valuable tool to overcome this limitation by de novo construction of databases for organisms for which no DNA sequence is available, or by refining existing genomic databases with transcriptomic data. Here we present a generic pipeline to make use of transcriptomic data for proteomics experiments. We show in particular how to efficiently fuel proteomic analysis workflows with sample-specific RNA-sequencing databases. This approach is useful for the proteomic analysis of as yet unsequenced organisms, complex microbial metatranscriptomes/metaproteomes (for example in the human body), and for refining current proteomics data analysis that relies solely on the genomic sequence and predicted gene expression but not on validated gene products. Finally, the approach used in the protocol presented here can help to improve the data quality of conventional proteomics experiments that can be influenced by genetic variation or splicing events.
Affiliation(s)
- Toni Luge
- Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195, Berlin, Germany
| | - Sascha Sauer
- Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195, Berlin, Germany.
|
206
|
Provan F, Nilsen MM, Larssen E, Uleberg KE, Sydnes MO, Lyng E, Øysæd KB, Baussant T. An evaluation of coral Lophelia pertusa mucus as an analytical matrix for environmental monitoring: A preliminary proteomic study. JOURNAL OF TOXICOLOGY AND ENVIRONMENTAL HEALTH. PART A 2016; 79:647-657. [PMID: 27484144 DOI: 10.1080/15287394.2016.1210494] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
For the environmental monitoring of coral, mucus appears to be an appropriate biological matrix due to its array of functions in coral biology and the non-intrusive manner in which it can be collected. The aim of the present study was to evaluate the feasibility of using mucus of the stony coral Lophelia pertusa (L. pertusa) as an analytical matrix for discovery of biomarkers used for environmental monitoring. More specifically, the aim was to assess whether a mass spectrometry-based proteomic approach can be applied to characterize the protein composition of coral mucus and changes related to petroleum discharges at the seafloor. Surface-enhanced laser desorption/ionization-time of flight mass spectrometry (SELDI-TOF MS) screening analyses of orange and white L. pertusa showed that the mucosal protein composition varies significantly with color phenotype, a pattern not reported prior to this study. Hence, to reduce variability from phenotype difference, only white L. pertusa individuals were selected to characterize in more detail the basal protein composition in mucus using liquid chromatography-tandem mass spectrometry (LC-MS/MS). In total, 297 proteins were identified in L. pertusa mucus of unexposed coral individuals. Individuals exposed to drill cuttings in the range 2 to 12 mg/L showed modifications in coral mucus protein composition compared to unexposed corals. Although the results were somewhat inconsistent between individuals and require further validation in both the lab and the field, this study demonstrated encouraging preliminary results for discovery of protein markers in coral mucus that might provide more comprehensive insight into potential consequences attributed to anthropogenic stressors and may be used in future monitoring of coral health.
Affiliation(s)
- Fiona Provan
- International Research Institute of Stavanger (IRIS), Biomiljø, Randaberg, Norway
| | - Mari Mæland Nilsen
- International Research Institute of Stavanger (IRIS), Biomiljø, Randaberg, Norway
| | - Eivind Larssen
- International Research Institute of Stavanger (IRIS), Biomiljø, Randaberg, Norway
| | - Kai-Erik Uleberg
- International Research Institute of Stavanger (IRIS), Biomiljø, Randaberg, Norway
| | - Magne O Sydnes
- International Research Institute of Stavanger (IRIS), Biomiljø, Randaberg, Norway
- Faculty of Science and Technology, Department of Mathematics and Natural Science, University of Stavanger, Stavanger, Norway
| | - Emily Lyng
- International Research Institute of Stavanger (IRIS), Biomiljø, Randaberg, Norway
| | - Kjell Birger Øysæd
- International Research Institute of Stavanger (IRIS), Biomiljø, Randaberg, Norway
| | - Thierry Baussant
- International Research Institute of Stavanger (IRIS), Biomiljø, Randaberg, Norway
|
207
|
Vaudel M, Barsnes H, Ræder H, Berven FS. Using Proteomics Bioinformatics Tools and Resources in Proteogenomic Studies. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016; 926:65-75. [DOI: 10.1007/978-3-319-42316-6_5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
|
208
|
Hook V, Bandeira N. Neuropeptidomics Mass Spectrometry Reveals Signaling Networks Generated by Distinct Protease Pathways in Human Systems. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2015; 26:1970-80. [PMID: 26483184 PMCID: PMC4749436 DOI: 10.1007/s13361-015-1251-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 06/30/2015] [Revised: 07/30/2015] [Accepted: 08/05/2015] [Indexed: 05/23/2023]
Abstract
Neuropeptides regulate intercellular signaling as neurotransmitters of the central and peripheral nervous systems, and as peptide hormones in the endocrine system. Diverse neuropeptides of distinct primary sequences of various lengths, often with post-translational modifications, coordinate and integrate regulation of physiological functions. Mass spectrometry-based analysis of the diverse neuropeptide structures in neuropeptidomics research is necessary to define the full complement of neuropeptide signaling molecules. Human neuropeptidomics has notable importance in defining normal and dysfunctional neuropeptide signaling in human health and disease. Neuropeptidomics has great potential for expansion in translational research opportunities for defining neuropeptide mechanisms of human diseases, providing novel neuropeptide drug targets for drug discovery, and monitoring neuropeptides as biomarkers of drug responses. In consideration of the high impact of human neuropeptidomics for health, an observed gap in this discipline is the few published articles in human neuropeptidomics compared with, for example, human proteomics and related mass spectrometry disciplines. Focus on human neuropeptidomics will advance new knowledge of the complex neuropeptide signaling networks participating in the fine control of neuroendocrine systems. This commentary review article discusses several human neuropeptidomics accomplishments that illustrate the rapidly expanding diversity of neuropeptides generated by protease processing of pro-neuropeptide precursors occurring within the secretory vesicle proteome. Of particular interest is the finding that human-specific cathepsin V participates in producing enkephalin and likely other neuropeptides, indicating unique proteolytic mechanisms for generating human neuropeptides. 
The field of human neuropeptidomics has great promise to uncover new mechanisms in disease conditions, leading to new drug targets and therapeutic agents for human diseases.
Affiliation(s)
- Vivian Hook
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92093-0719, USA.
- School of Medicine, Department of Neurosciences and Department of Pharmacology, University of California, San Diego, La Jolla, CA, 92093-0719, USA.
| | - Nuno Bandeira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92093-0719, USA
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, 92093-0719, USA
|
209
|
Shanmugam AK, Nesvizhskii AI. Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics. J Proteome Res 2015; 14:5169-78. [PMID: 26569054 DOI: 10.1021/acs.jproteome.5b00504] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022]
Abstract
In shotgun proteomics, peptides are typically identified using database searching, which involves scoring acquired tandem mass spectra against peptides derived from standard protein sequence databases such as UniProt, RefSeq, or Ensembl. In this strategy, the sensitivity of peptide identification is known to be affected by the size of the search space. Therefore, creating a targeted sequence database containing only peptides likely to be present in the analyzed sample can be a useful technique for improving the sensitivity of peptide identification. In this study, we describe how targeted peptide databases can be created based on the frequency of identification in the Global Proteome Machine Database (GPMDB), the largest publicly available repository of peptide and protein identification data. We demonstrate that targeted peptide databases can be easily integrated into existing proteome analysis workflows and describe a computational strategy for minimizing any loss of peptide identifications arising from potential search space incompleteness in the targeted search spaces. We demonstrate the performance of our workflow using several data sets of varying size and sample complexity.
Affiliation(s)
- Avinash K Shanmugam
- Department of Computational Medicine and Bioinformatics and Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Alexey I Nesvizhskii
- Department of Computational Medicine and Bioinformatics and Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
|
210
|
Mayne J, Ning Z, Zhang X, Starr AE, Chen R, Deeke S, Chiang CK, Xu B, Wen M, Cheng K, Seebun D, Star A, Moore JI, Figeys D. Bottom-Up Proteomics (2013-2015): Keeping up in the Era of Systems Biology. Anal Chem 2015; 88:95-121. [PMID: 26558748 DOI: 10.1021/acs.analchem.5b04230] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Indexed: 02/07/2023]
Affiliation(s)
- Janice Mayne
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Zhibin Ning
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Xu Zhang
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Amanda E Starr
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Rui Chen
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Shelley Deeke
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Cheng-Kang Chiang
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Bo Xu
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Ming Wen
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Kai Cheng
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Deeptee Seebun
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Alexandra Star
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Jasmine I Moore
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
| | - Daniel Figeys
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
|
211
|
Choong WK, Chang HY, Chen CT, Tsai CF, Hsu WL, Chen YJ, Sung TY. Informatics View on the Challenges of Identifying Missing Proteins from Shotgun Proteomics. J Proteome Res 2015; 14:5396-407. [DOI: 10.1021/acs.jproteome.5b00482] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 01/08/2023]
Affiliation(s)
- Wai-Kok Choong
- Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan
| | - Hui-Yin Chang
- Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 11529, Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan
| | - Ching-Tai Chen
- Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan
| | - Chia-Feng Tsai
- Institute of Chemistry, Academia Sinica, Taipei 11529, Taiwan
| | - Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan
| | - Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei 11529, Taiwan
| | - Ting-Yi Sung
- Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan
|
212
|
Liang X, Xia Z, Jian L, Niu X, Link A. An adaptive classification model for peptide identification. BMC Genomics 2015; 16 Suppl 11:S1. [PMID: 26578406 PMCID: PMC4652454 DOI: 10.1186/1471-2164-16-s11-s1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/14/2023] Open
Abstract
Background: Peptide sequence assignment is the central task in protein identification with MS/MS-based strategies. Although a number of post-database search algorithms for filtering target peptide spectrum matches (PSMs) have been developed, the discrepancy among the output PSMs is usually significant, leaving a few disputable PSMs. Current studies show that a number of target PSMs which are close to decoy PSMs can hardly be separated from those decoys by only using the discrimination function. Results: In this paper, we assign each target PSM a weight showing its possibility of being correct. We employ an SVM-based learning model to search for the optimal weight for each target PSM and develop a new score system, CRanker, to rank all target PSMs. Due to the large PSM datasets generated in routine database searches, we use the Cholesky factorization technique for storing a kernel matrix to reduce the memory requirement. Conclusions: Compared with PeptideProphet and Percolator, CRanker has identified more PSMs under similar false discovery rates over different datasets. CRanker has shown consistent performance on different test sets, validating the soundness of the proposed model.
|
213
|
Computational proteomics: Integrating mass spectral data into a biological context. J Proteomics 2015; 129:1-2. [PMID: 26521030 DOI: 10.1016/j.jprot.2015.10.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
214
|
Friso G, van Wijk KJ. Posttranslational Protein Modifications in Plant Metabolism. PLANT PHYSIOLOGY 2015; 169:1469-87. [PMID: 26338952 PMCID: PMC4634103 DOI: 10.1104/pp.15.01378] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Received: 09/02/2015] [Accepted: 09/02/2015] [Indexed: 05/18/2023]
Abstract
Posttranslational modifications (PTMs) of proteins greatly expand proteome diversity, increase functionality, and allow for rapid responses, all at relatively low costs for the cell. PTMs play key roles in plants through their impact on signaling, gene expression, protein stability and interactions, and enzyme kinetics. Following a brief discussion of the experimental and bioinformatics challenges of PTM identification, localization, and quantification (occupancy), a concise overview is provided of the major PTMs and their (potential) functional consequences in plants, with emphasis on plant metabolism. Classic examples that illustrate the regulation of plant metabolic enzymes and pathways by PTMs and their cross talk are summarized. Recent large-scale proteomics studies mapped many PTMs to a wide range of metabolic functions. Unraveling of the PTM code, i.e. a predictive understanding of the (combinatorial) consequences of PTMs, is needed to convert this growing wealth of data into an understanding of plant metabolic regulation.
Affiliation(s)
- Giulia Friso
- School for Integrative Plant Sciences, Section Plant Biology, Cornell University, Ithaca, New York 14853
| | - Klaas J van Wijk
- School for Integrative Plant Sciences, Section Plant Biology, Cornell University, Ithaca, New York 14853
|
215
|
Ezkurdia I, Calvo E, Del Pozo A, Vázquez J, Valencia A, Tress ML. The potential clinical impact of the release of two drafts of the human proteome. Expert Rev Proteomics 2015; 12:579-93. [PMID: 26496066 PMCID: PMC4732427 DOI: 10.1586/14789450.2015.1103186] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
The authors have carried out an investigation of the two "draft maps of the human proteome" published in 2014 in Nature. The findings include an abundance of poor spectra, low-scoring peptide-spectrum matches and incorrectly identified proteins in both these studies, highlighting clear issues with the application of false discovery rates. This noise means that the claims made by the two papers - the identification of high numbers of protein coding genes, the detection of novel coding regions and the draft tissue maps themselves - should be treated with considerable caution. The authors recommend that clinicians and researchers do not use the unfiltered data from these studies. Despite this, these studies will inspire further investigation into tissue-based proteomics. As long as this future work has proper quality controls, it could help produce a consensus map of the human proteome and improve our understanding of the processes that underlie health and disease.
Affiliation(s)
- Iakes Ezkurdia
- Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
| | - Enrique Calvo
- Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
| | - Angela Del Pozo
- Instituto de Genetica Medica y Molecular, Hospital Universitario La Paz, Madrid, Spain
| | - Jesús Vázquez
- Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Madrid, Spain
| | - Alfonso Valencia
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Michael L. Tress
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
216
|
Bilbao A, Zhang Y, Varesio E, Luban J, Strambio-De-Castillia C, Lisacek F, Hopfgartner G. Ranking Fragment Ions Based on Outlier Detection for Improved Label-Free Quantification in Data-Independent Acquisition LC-MS/MS. J Proteome Res 2015; 14:4581-93. [PMID: 26412574 DOI: 10.1021/acs.jproteome.5b00394] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/14/2022]
Abstract
Data-independent acquisition LC-MS/MS techniques complement supervised methods for peptide quantification. However, due to the wide precursor isolation windows, these techniques are prone to interference at the fragment ion level, which, in turn, is detrimental for accurate quantification. The nonoutlier fragment ion (NOFI) ranking algorithm has been developed to assign low priority to fragment ions affected by interference. By using the optimal subset of high-priority fragment ions, these interfered fragment ions are effectively excluded from quantification. NOFI represents each fragment ion as a vector of four dimensions related to chromatographic and MS fragmentation attributes and applies multivariate outlier detection techniques. Benchmarking conducted on a well-defined quantitative data set (i.e., the SWATH Gold Standard) indicates that NOFI on average is able to accurately quantify 11-25% more peptides than the commonly used Top-N library intensity ranking method. The sum of the area of the Top3-5 NOFIs produces similar coefficients of variation as compared to that with the library intensity method but with more accurate quantification results. On a biologically relevant human dendritic cell digest data set, NOFI properly assigns low-priority ranks to 85% of annotated interferences, resulting in sensitivity values between 0.92 and 0.80, against 0.76 for the Spectronaut interference detection algorithm.
Affiliation(s)
- Aivett Bilbao
- Life Sciences Mass Spectrometry, School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, CH-1211 Geneva 4, Switzerland; Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CH-1211 Geneva 4, Switzerland
| | - Ying Zhang
- Life Sciences Mass Spectrometry, School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, CH-1211 Geneva 4, Switzerland
| | - Emmanuel Varesio
- Life Sciences Mass Spectrometry, School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, CH-1211 Geneva 4, Switzerland
| | - Jeremy Luban
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
| | - Caterina Strambio-De-Castillia
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
| | - Frédérique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CH-1211 Geneva 4, Switzerland; Faculty of Sciences, University of Geneva, CH-1211 Geneva 4, Switzerland
| | - Gérard Hopfgartner
- Life Sciences Mass Spectrometry, School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, CH-1211 Geneva 4, Switzerland
|
217
|
Pettersen VK, Steinsland H, Wiker HG. Improving genome annotation of enterotoxigenic Escherichia coli TW10598 by a label-free quantitative MS/MS approach. Proteomics 2015; 15:3826-34. [DOI: 10.1002/pmic.201500278] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/06/2015] [Revised: 08/18/2015] [Accepted: 09/04/2015] [Indexed: 12/14/2022]
Affiliation(s)
- Veronika Kuchařová Pettersen
- The Gade Research Group for Infection and Immunity; Department of Clinical Science; University of Bergen; Bergen Norway
| | - Hans Steinsland
- Centre for International Health; Department of Global Public Health and Primary Care; University of Bergen; Bergen Norway
- Department of Biomedicine; University of Bergen; Bergen Norway
| | - Harald G. Wiker
- The Gade Research Group for Infection and Immunity; Department of Clinical Science; University of Bergen; Bergen Norway
|
218
|
|
219
|
Arsène-Ploetze F, Bertin PN, Carapito C. Proteomic tools to decipher microbial community structure and functioning. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2015; 22:13599-13612. [PMID: 25475614 PMCID: PMC4560766 DOI: 10.1007/s11356-014-3898-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 08/14/2014] [Accepted: 11/20/2014] [Indexed: 06/04/2023]
Abstract
Recent advances in microbial ecology allow studying microorganisms in their environment, without laboratory cultivation, in order to get access to the large uncultivable microbial community. With this aim, environmental proteomics has emerged as an appropriate complementary approach to metagenomics providing information on key players that carry out main metabolic functions and addressing the adaptation capacities of living organisms in situ. In this review, a wide range of proteomic approaches applied to investigate the structure and functioning of microbial communities as well as recent examples of such studies are presented.
Affiliation(s)
- Florence Arsène-Ploetze
  - Génétique moléculaire, Génomique et Microbiologie, Université de Strasbourg, UMR7156 CNRS, Strasbourg, France
|
220
|
Manes NP, Mann JM, Nita-Lazar A. Selected Reaction Monitoring Mass Spectrometry for Absolute Protein Quantification. J Vis Exp 2015:e52959. [PMID: 26325288 DOI: 10.3791/52959] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 01/08/2023] Open
Abstract
Absolute quantification of target proteins within complex biological samples is critical to a wide range of research and clinical applications. This protocol provides step-by-step instructions for the development and application of quantitative assays using selected reaction monitoring (SRM) mass spectrometry (MS). First, likely quantotypic target peptides are identified based on numerous criteria. This includes identifying proteotypic peptides, avoiding sites of posttranslational modification, and analyzing the uniqueness of the target peptide to the target protein. Next, crude external peptide standards are synthesized and used to develop SRM assays, and the resulting assays are used to perform qualitative analyses of the biological samples. Finally, purified, quantified, heavy isotope labeled internal peptide standards are prepared and used to perform isotope dilution series SRM assays. Analysis of all of the resulting MS data is presented. This protocol was used to accurately assay the absolute abundance of proteins of the chemotaxis signaling pathway within RAW 264.7 cells (a mouse monocyte/macrophage cell line). The quantification of Gi2 (a heterotrimeric G-protein α-subunit) is described in detail.
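The isotope-dilution idea in this abstract can be illustrated with a small calculation. The sketch below is not taken from the protocol. It assumes a single-point calculation in which the endogenous (light) peptide and the spiked heavy-labeled standard give equal signal per mole, so the light amount is the light/heavy peak-area ratio multiplied by the spiked heavy amount. The function name and all numbers are hypothetical.

```python
def light_amount_fmol(light_area: float, heavy_area: float,
                      heavy_spiked_fmol: float) -> float:
    """Estimate the endogenous peptide amount from one light/heavy area ratio.

    Assumption: equal response for light and heavy forms (single-point estimate).
    """
    if heavy_area <= 0:
        raise ValueError("heavy_area must be positive")
    return (light_area / heavy_area) * heavy_spiked_fmol


# Hypothetical peak areas and spike amount, for illustration only.
amount = light_amount_fmol(light_area=2.0e5, heavy_area=4.0e5,
                           heavy_spiked_fmol=50.0)
print(amount)  # 25.0
```

A dilution series, as described in the abstract, would replace this single ratio with a fitted calibration curve.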
Affiliation(s)
- Nathan P Manes
  - Cellular Networks Proteomics Unit, Laboratory of Systems Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health
- Jessica M Mann
  - Cellular Networks Proteomics Unit, Laboratory of Systems Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health
- Aleksandra Nita-Lazar
  - Cellular Networks Proteomics Unit, Laboratory of Systems Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health
|
221
|
Deutsch EW, Mendoza L, Shteynberg D, Slagel J, Sun Z, Moritz RL. Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics. Proteomics Clin Appl 2015; 9:745-54. [PMID: 25631240 PMCID: PMC4506239 DOI: 10.1002/prca.201400164] [Citation(s) in RCA: 243] [Impact Index Per Article: 27.0] [Received: 10/21/2014] [Revised: 12/19/2014] [Accepted: 01/27/2015] [Indexed: 11/11/2022]
Abstract
Democratization of genomics technologies has enabled the rapid determination of genotypes. More recently the democratization of comprehensive proteomics technologies is enabling the determination of the cellular phenotype and the molecular events that define its dynamic state. Core proteomic technologies include MS to define protein sequence, protein:protein interactions, and protein PTMs. Key enabling technologies for proteomics are bioinformatic pipelines to identify, quantitate, and summarize these events. The Trans-Proteomic Pipeline (TPP) is a robust open-source standardized data processing pipeline for large-scale reproducible quantitative MS proteomics. It supports all major operating systems and instrument vendors via open data formats. Here, we provide a review of the overall proteomics workflow supported by the TPP, its major tools, and how it can be used in its various modes from desktop to cloud computing. We describe new features for the TPP, including data visualization functionality. We conclude by describing some common perils that affect the analysis of MS/MS datasets, as well as some major upcoming features.
Affiliation(s)
- Zhi Sun
  - Institute for Systems Biology, Seattle, WA, USA
|
222
|
Opheim M, Šližytė R, Sterten H, Provan F, Larssen E, Kjos NP. Hydrolysis of Atlantic salmon (Salmo salar) rest raw materials—Effect of raw material and processing on composition, nutritional value, and potential bioactive peptides in the hydrolysates. Process Biochem 2015. [DOI: 10.1016/j.procbio.2015.04.017] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 01/04/2023]
|
223
|
Keich U, Kertesz-Farkas A, Noble WS. Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics. J Proteome Res 2015; 14:3148-61. [PMID: 26152888 PMCID: PMC4533616 DOI: 10.1021/acs.jproteome.5b00081] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Indexed: 12/26/2022]
Abstract
Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target PSMs does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.
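For orientation, a basic target-decoy competition estimate can be sketched in a few lines. This is not the mix-max procedure proposed in the paper, and it is not claimed to match either TDC variant the authors evaluate. It assumes a concatenated target-decoy search in which each spectrum keeps only its single best match, and that higher scores are better. The function name and the example scores are hypothetical.

```python
from typing import List, Tuple


def tdc_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR among target PSMs scoring at or above `threshold`.

    psms: (score, is_decoy) for the best match of each spectrum, taken from
    a concatenated target-decoy search (assumption of this sketch).
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    # Decoy hits above the threshold stand in for false target hits.
    return min(1.0, decoys / targets)


# Hypothetical scores, for illustration only.
psms = [(9.1, False), (8.7, False), (8.2, True),
        (7.9, False), (7.5, False), (6.0, True)]
print(tdc_fdr(psms, threshold=7.0))  # 0.25
```

Because each spectrum keeps only its best match, any spectrum whose decoy match wins the competition drops out of the target list. That loss is the cost the abstract attributes to target-decoy competition.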
Affiliation(s)
- Uri Keich
  - School of Mathematics and Statistics F07, University of Sydney, Sydney, New South Wales 2006, Australia
- Attila Kertesz-Farkas
  - Department of Genome Sciences, University of Washington, Foege Building S220B, 3720 15th Avenue North East, Seattle, Washington 98195-5065, United States
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Foege Building S220B, 3720 15th Avenue North East, Seattle, Washington 98195-5065, United States
  - Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-5065, United States
|
224
|
Akhtar MN, Southey BR, Andrén PE, Sweedler JV, Rodriguez-Zas SL. Identification of best indicators of peptide-spectrum match using a permutation resampling approach. J Bioinform Comput Biol 2015; 12:1440001. [PMID: 25362838 DOI: 10.1142/s0219720014400010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Various indicators of observed-theoretical spectrum matches were compared and the resulting statistical significance was characterized using permutation resampling. Novel decoy databases built by resampling the terminal positions of peptide sequences were evaluated to identify the conditions for accurate computation of peptide match significance levels. The methodology was tested on real and manually curated tandem mass spectra from peptides across a wide range of sizes. Spectra match indicators from complementary database search programs were profiled and optimal indicators were identified. The combination of the optimal indicator and permuted decoy databases improved the calculation of the peptide match significance compared to the approaches currently implemented in the database search programs that rely on distributional assumptions. Permutation tests using p-values obtained from software-dependent matching scores and E-values outperformed permutation tests using all other indicators. The higher overlap in matches between the database search programs when using end permutation compared to existing approaches confirmed the superiority of the end permutation method to identify peptides. The combination of effective match indicators and the end permutation method is recommended for accurate detection of peptides.
Affiliation(s)
- Malik N Akhtar
  - Department of Animal Sciences, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
|
225
|
Carapito C, Lane L, Benama M, Opsomer A, Mouton-Barbosa E, Garrigues L, Gonzalez de Peredo A, Burel A, Bruley C, Gateau A, Bouyssié D, Jaquinod M, Cianferani S, Burlet-Schiltz O, Van Dorsselaer A, Garin J, Vandenbrouck Y. Computational and Mass-Spectrometry-Based Workflow for the Discovery and Validation of Missing Human Proteins: Application to Chromosomes 2 and 14. J Proteome Res 2015; 14:3621-34. [PMID: 26132440 DOI: 10.1021/pr5010345] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Indexed: 12/26/2022]
Abstract
In the framework of the C-HPP, our Franco-Swiss consortium has adopted chromosomes 2 and 14, coding for a total of 382 missing proteins (proteins for which evidence is lacking at protein level). Over the last 4 years, the French proteomics infrastructure has collected high-quality data sets from 40 human samples, including a series of rarely studied cell lines, tissue types, and sample preparations. Here we described a step-by-step strategy based on the use of bioinformatics screening and subsequent mass spectrometry (MS)-based validation to identify what were up to now missing proteins in these data sets. Screening database search results (85,326 dat files) identified 58 of the missing proteins (36 on chromosome 2 and 22 on chromosome 14) by 83 unique peptides following the latest release of neXtProt (2014-09-19). PSMs corresponding to these peptides were thoroughly examined by applying two different MS-based criteria: peptide-level false discovery rate calculation and expert PSM quality assessment. Synthetic peptides were then produced and used to generate reference MS/MS spectra. A spectral similarity score was then calculated for each pair of reference-endogenous spectra and used as a third criterion for missing protein validation. Finally, LC-SRM assays were developed to target proteotypic peptides from four of the missing proteins detected in tissue/cell samples, which were still available and for which sample preparation could be reproduced. These LC-SRM assays unambiguously detected the endogenous unique peptide for three of the proteins. For two of these, identification was confirmed by additional proteotypic peptides. We concluded that of the initial set of 58 proteins detected by the bioinformatics screen, the consecutive MS-based validation criteria led to propose the identification of 13 of these proteins (8 on chromosome 2 and 5 on chromosome 14) that passed at least two of the three MS-based criteria. 
Thus, a rigorous step-by-step approach combining bioinformatics screening and MS-based validation assays is particularly suitable to obtain protein-level evidence for proteins previously considered as missing. All MS/MS data have been deposited in ProteomeXchange under identifier PXD002131.
Affiliation(s)
- Christine Carapito
  - Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
- Lydie Lane
  - CALIPHO Group, SIB-Swiss Institute of Bioinformatics, CMU, rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
  - Department of Human Protein Sciences, Faculty of Medicine, rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
- Mohamed Benama
  - CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, 17 rue des martyrs, Grenoble, F-38054, France
  - INSERM U1038, 17, rue des Martyrs, Grenoble F-38054, France
  - Université Grenoble, Grenoble F-38054, France
- Alisson Opsomer
  - Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
- Emmanuelle Mouton-Barbosa
  - CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 118 route de Narbonne, 31077 Toulouse, France
  - Université de Toulouse, 205, route de Narbonne, 31077 Toulouse, France
- Luc Garrigues
  - CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 118 route de Narbonne, 31077 Toulouse, France
  - Université de Toulouse, 205, route de Narbonne, 31077 Toulouse, France
- Anne Gonzalez de Peredo
  - CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 118 route de Narbonne, 31077 Toulouse, France
  - Université de Toulouse, 205, route de Narbonne, 31077 Toulouse, France
- Alexandre Burel
  - Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
- Christophe Bruley
  - CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, 17 rue des martyrs, Grenoble, F-38054, France
  - INSERM U1038, 17, rue des Martyrs, Grenoble F-38054, France
  - Université Grenoble, Grenoble F-38054, France
- Alain Gateau
  - CALIPHO Group, SIB-Swiss Institute of Bioinformatics, CMU, rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
  - Department of Human Protein Sciences, Faculty of Medicine, rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
- David Bouyssié
  - CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 118 route de Narbonne, 31077 Toulouse, France
  - Université de Toulouse, 205, route de Narbonne, 31077 Toulouse, France
- Michel Jaquinod
  - CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, 17 rue des martyrs, Grenoble, F-38054, France
  - INSERM U1038, 17, rue des Martyrs, Grenoble F-38054, France
  - Université Grenoble, Grenoble F-38054, France
- Sarah Cianferani
  - Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
- Odile Burlet-Schiltz
  - CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 118 route de Narbonne, 31077 Toulouse, France
  - Université de Toulouse, 205, route de Narbonne, 31077 Toulouse, France
- Alain Van Dorsselaer
  - Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
- Jérôme Garin
  - CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, 17 rue des martyrs, Grenoble, F-38054, France
  - INSERM U1038, 17, rue des Martyrs, Grenoble F-38054, France
  - Université Grenoble, Grenoble F-38054, France
- Yves Vandenbrouck
  - CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, 17 rue des martyrs, Grenoble, F-38054, France
  - INSERM U1038, 17, rue des Martyrs, Grenoble F-38054, France
  - Université Grenoble, Grenoble F-38054, France
|
226
|
Šlechtová T, Gilar M, Kalíková K, Tesařová E. Insight into Trypsin Miscleavage: Comparison of Kinetic Constants of Problematic Peptide Sequences. Anal Chem 2015; 87:7636-43. [PMID: 26158323 DOI: 10.1021/acs.analchem.5b00866] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Indexed: 02/08/2023]
Abstract
Trypsin, a high fidelity protease, is the most widely used enzyme for protein digestion in proteomic research. Optimal digestion conditions are well-known and so are the expected cleavage products. However, missed cleavage sites are frequently observed when acidic amino acids, aspartic and glutamic acids, are present near the cleavage site. Also, the sequence motifs with successive lysine and/or arginine residues represent a source of missed cleaved sites. In spite of an adverse role of missed cleaved peptides on proteomic research, the digestion kinetics of these problematic sequences is not well-known. In this work, synthetic peptides with various sequence motifs were used as trypsin substrates. Cleavage products were analyzed with reversed-phase high performance liquid chromatography, and the kinetic constants for selected missed cleavage sites were calculated. Relative digestion speed for lysine and arginine sites is compared, including the digestion motifs flanked with aspartic and glutamic acid. Our findings show that DK and DTR motifs are cleaved by trypsin with 3 orders of magnitude lower speed than the arginine site. These motifs are likely to produce missed cleavage peptides in protein tryptic digests even at prolonged digestion times.
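Missed cleavages of the kind studied here are usually handled in silico by enumerating peptides that span one or more uncut sites. The sketch below is an illustration, not the authors' method. It assumes the common simplification that trypsin cleaves after K or R unless the next residue is P (a rule not stated in the abstract), and the function name and example sequence are hypothetical.

```python
from typing import List


def tryptic_peptides(protein: str, max_missed: int = 1) -> List[str]:
    """Enumerate tryptic peptides allowing up to `max_missed` missed cleavages.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    """
    if not protein:
        return []
    # Positions (as slice boundaries) where trypsin is assumed to cut.
    sites = [i + 1 for i in range(len(protein) - 1)
             if protein[i] in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


# Hypothetical sequence containing a DK motif and an R followed by P.
print(tryptic_peptides("AADKGGRPLLK"))
# ['AADK', 'AADKGGRPLLK', 'GGRPLLK']
```

In this example the DK site is the only predicted cut. Given the slow kinetics reported for DK motifs, the uncut peptide `AADKGGRPLLK` is the form a search should allow for.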
Affiliation(s)
- Tereza Šlechtová
  - Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, Hlavova 8, 128 43, Prague, Czech Republic
- Martin Gilar
  - Waters Corporation, 34 Maple Street, Milford, Massachusetts 01757, United States
- Květa Kalíková
  - Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, Hlavova 8, 128 43, Prague, Czech Republic
- Eva Tesařová
  - Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, Hlavova 8, 128 43, Prague, Czech Republic
|
227
|
Szabo Z, Janaky T. Challenges and developments in protein identification using mass spectrometry. Trends Analyt Chem 2015. [DOI: 10.1016/j.trac.2015.03.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 01/13/2023]
|
228
|
|
229
|
Savitski MM, Wilhelm M, Hahne H, Kuster B, Bantscheff M. A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets. Mol Cell Proteomics 2015; 14:2394-404. [PMID: 25987413 DOI: 10.1074/mcp.m114.046995] [Citation(s) in RCA: 286] [Impact Index Per Article: 31.8] [Received: 11/30/2014] [Indexed: 02/06/2023] Open
Abstract
Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target-decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target-decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The "picked" protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The "picked" target-decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used "classic" protein FDR approach that causes overprediction of false-positive protein identification in large data sets. 
The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software.
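The "picked" strategy described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each decoy score is keyed by the identifier of the target protein it was derived from, that higher scores are better, and that ties go to the target. The function name and the example scores are hypothetical.

```python
from typing import Dict, List, Tuple


def picked_protein_fdr(target_scores: Dict[str, float],
                       decoy_scores: Dict[str, float],
                       threshold: float) -> float:
    """Estimate protein FDR after picking one winner per target/decoy pair."""
    picked: List[Tuple[float, bool]] = []  # (score, is_decoy)
    for protein in set(target_scores) | set(decoy_scores):
        t = target_scores.get(protein)
        d = decoy_scores.get(protein)
        # Keep only the higher-scoring member of each pair (ties: target).
        if d is None or (t is not None and t >= d):
            picked.append((t, False))
        else:
            picked.append((d, True))
    targets = sum(1 for score, is_decoy in picked
                  if not is_decoy and score >= threshold)
    decoys = sum(1 for score, is_decoy in picked
                 if is_decoy and score >= threshold)
    return decoys / targets if targets else 0.0


# Hypothetical protein scores, for illustration only.
targets = {"P1": 10.0, "P2": 4.0, "P3": 7.0}
decoys = {"P1": 3.0, "P2": 6.0, "P4": 5.0}
print(picked_protein_fdr(targets, decoys, threshold=5.5))  # 0.5
```

Treating each pair as one unit means the decoy of a confidently identified protein (here `P1`) no longer counts toward the decoy total.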
Affiliation(s)
- Mathias Wilhelm
  - Chair for Proteomics and Bioanalytics, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354 Freising, Germany
  - SAP SE, Dietmar-Hopp-Allee 16, 69190 Walldorf, Germany
- Hannes Hahne
  - Chair for Proteomics and Bioanalytics, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354 Freising, Germany
- Bernhard Kuster
  - Chair for Proteomics and Bioanalytics, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354 Freising, Germany
  - Center for Integrated Protein Science Munich, Emil Erlenmeyer Forum 5, 85354 Freising, Germany
- Marcus Bantscheff
  - Cellzome GmbH, Meyerhofstrasse 1, 69117 Heidelberg, Germany
|
230
|
Deutsch EW, Albar JP, Binz PA, Eisenacher M, Jones AR, Mayer G, Omenn GS, Orchard S, Vizcaíno JA, Hermjakob H. Development of data representation standards by the human proteome organization proteomics standards initiative. J Am Med Inform Assoc 2015; 22:495-506. [PMID: 25726569 PMCID: PMC4457114 DOI: 10.1093/jamia/ocv001] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 07/14/2014] [Revised: 09/29/2014] [Accepted: 01/05/2015] [Indexed: 11/22/2022] Open
Abstract
OBJECTIVE To describe the goals of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization, the methods that the PSI has employed to create data standards, the resulting output of the PSI, lessons learned from the PSI's evolution, and future directions and synergies for the group. MATERIALS AND METHODS The PSI has 5 categories of deliverables that have guided the group. These are minimum information guidelines, data formats, controlled vocabularies, resources and software tools, and dissemination activities. These deliverables are produced via the leadership and working group organization of the initiative, driven by frequent workshops and ongoing communication within the working groups. Official standards are subjected to a rigorous document process that includes several levels of peer review prior to release. RESULTS We have produced and published minimum information guidelines describing what information should be provided when making data public, either via public repositories or other means. The PSI has produced a series of standard formats covering mass spectrometer input, mass spectrometer output, results of informatics analysis (both qualitative and quantitative analyses), reports of molecular interaction data, and gel electrophoresis analyses. We have produced controlled vocabularies that ensure that concepts are uniformly annotated in the formats and engaged in extensive software development and dissemination efforts so that the standards can efficiently be used by the community. CONCLUSION In its first dozen years of operation, the PSI has produced many standards that have accelerated the field of proteomics by facilitating data exchange and deposition to data repositories. We look to the future to continue developing standards for new proteomics technologies and workflows and mechanisms for integration with other omics data types. Our products facilitate the translation of genomics and proteomics findings to clinical and biological phenotypes. The PSI website can be accessed at http://www.psidev.info.
Affiliation(s)
- Juan Pablo Albar
  - Died July 18, 2014
  - Proteomics Facility, Centro Nacional de Biotecnología - CSIC, Madrid, Spain
  - ProteoRed Consortium, Spanish National Institute of Proteomics, Madrid, Spain
- Pierre-Alain Binz
  - CHUV Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Martin Eisenacher
  - Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
- Andrew R Jones
  - Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Gerhard Mayer
  - Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
- Gilbert S Omenn
  - Institute for Systems Biology, Seattle, USA
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, USA
- Sandra Orchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
231
|
Guthals A, Boucher C, Bandeira N. The generating function approach for Peptide identification in spectral networks. J Comput Biol 2015; 22:353-66. [PMID: 25423621 PMCID: PMC4425220 DOI: 10.1089/cmb.2014.0165] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022] Open
Abstract
Tandem mass (MS/MS) spectrometry has become the method of choice for protein identification and has launched a quest for the identification of every translated protein and peptide. However, computational developments have lagged behind the pace of modern data acquisition protocols and have become a major bottleneck in proteomics analysis of complex samples. As it stands today, attempts to identify MS/MS spectra against large databases (e.g., the human microbiome or 6-frame translation of the human genome) face a search space that is 10-100 times larger than the human proteome, where it becomes increasingly challenging to separate between true and false peptide matches. As a result, the sensitivity of current state-of-the-art database search methods drops by nearly 38% to such low identification rates that almost 90% of all MS/MS spectra are left as unidentified. We address this problem by extending the generating function approach to rigorously compute the joint spectral probability of multiple spectra being matched to peptides with overlapping sequences, thus enabling the confident assignment of higher significance to overlapping peptide-spectrum matches (PSMs). We find that these joint spectral probabilities can be several orders of magnitude more significant than individual PSMs, even in the ideal case when perfect separation between signal and noise peaks could be achieved per individual MS/MS spectrum. After benchmarking this approach on a typical lysate MS/MS dataset, we show that the proposed intersecting spectral probabilities for spectra from overlapping peptides improve peptide identification by 30-62%.
Affiliation(s)
- Adrian Guthals
  - Department of Computer Science and Engineering, University of California–San Diego, La Jolla, California
- Christina Boucher
  - Department of Computer Science, Colorado State University, Fort Collins, Colorado
- Nuno Bandeira
  - Department of Computer Science and Engineering, University of California–San Diego, La Jolla, California
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California–San Diego, La Jolla, California
|
232
|
Muth T, Kolmeder CA, Salojärvi J, Keskitalo S, Varjosalo M, Verdam FJ, Rensen SS, Reichl U, de Vos WM, Rapp E, Martens L. Navigating through metaproteomics data: a logbook of database searching. Proteomics 2015; 15:3439-53. [PMID: 25778831 DOI: 10.1002/pmic.201400560] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 11/28/2014] [Revised: 02/13/2015] [Accepted: 03/06/2015] [Indexed: 11/12/2022]
Abstract
Metaproteomic research involves various computational challenges during the identification of fragmentation spectra acquired from the proteome of a complex microbiome. These issues are manifold and range from the construction of customized sequence databases, the optimal setting of search parameters to limitations in the identification search algorithms themselves. In order to assess the importance of these individual factors, we studied the effect of strategies to combine different search algorithms, explored the influence of chosen database search settings, and investigated the impact of the size of the protein sequence database used for identification. Furthermore, we applied de novo sequencing as a complementary approach to classic database searching. All evaluations were performed on a human intestinal metaproteome dataset. Pyrococcus furiosus proteome data were used to contrast database searching of metaproteomic data to a classic proteomic experiment. Searching against subsets of metaproteome databases and the use of multiple search engines increased the number of identifications. The integration of P. furiosus sequences in a metaproteomic sequence database showcased the limitation of the target-decoy-controlled false discovery rate approach in combination with large sequence databases. The selection of varying search engine parameters and the application of de novo sequencing represented useful methods to increase the reliability of the results. Based on our findings, we provide recommendations for the data analysis that help researchers to establish or improve analysis workflows in metaproteomics.
Affiliation(s)
- Thilo Muth
  - Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Carolin A Kolmeder
  - Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Jarkko Salojärvi
  - Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Salla Keskitalo
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Markku Varjosalo
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Froukje J Verdam
  - Department of General Surgery, NUTRIM, Maastricht University Medical Center, Maastricht, The Netherlands
- Sander S Rensen
  - Department of General Surgery, NUTRIM, Maastricht University Medical Center, Maastricht, The Netherlands
- Udo Reichl
  - Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
  - Otto-von-Guericke University, Bioprocess Engineering, Magdeburg, Germany
- Willem M de Vos
  - Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
  - Department of Bacteriology and Immunology, University of Helsinki, Helsinki, Finland
  - Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands
- Erdmann Rapp
  - Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Lennart Martens
  - Department of Biochemistry, Ghent University, Ghent, Belgium
  - Department of Medical Protein Research, VIB, Ghent, Belgium
|
233
|
Tsou CC, Avtonomov D, Larsen B, Tucholska M, Choi H, Gingras AC, Nesvizhskii AI. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods 2015; 12:258-64, 7 p following 264. [PMID: 25599550 PMCID: PMC4399776 DOI: 10.1038/nmeth.3255] [Citation(s) in RCA: 419] [Impact Index Per Article: 46.6] [Received: 07/11/2014] [Accepted: 11/17/2014] [Indexed: 12/26/2022]
Abstract
As a result of recent improvements in mass spectrometry (MS), there is increased interest in data-independent acquisition (DIA) strategies in which all peptides are systematically fragmented using wide mass-isolation windows ('multiplex fragmentation'). DIA-Umpire (http://diaumpire.sourceforge.net/), a comprehensive computational workflow and open-source software for DIA data, detects precursor and fragment chromatographic features and assembles them into pseudo-tandem MS spectra. These spectra can be identified with conventional database-searching and protein-inference tools, allowing sensitive, untargeted analysis of DIA data without the need for a spectral library. Quantification is done with both precursor- and fragment-ion intensities. Furthermore, DIA-Umpire enables targeted extraction of quantitative information based on peptides initially identified in only a subset of the samples, resulting in more consistent quantification across multiple samples. We demonstrated the performance of the method with control samples of varying complexity and publicly available glycoproteomics and affinity purification-MS data.
Affiliation(s)
- Chih-Chiang Tsou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
| | - Dmitry Avtonomov
- Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
| | - Brett Larsen
- Lunenfeld-Tanenbaum Research Institute, Toronto, Canada
| | | | - Hyungwon Choi
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
| | - Anne-Claude Gingras
- Lunenfeld-Tanenbaum Research Institute, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
| | - Alexey I. Nesvizhskii
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
| |
|
234
|
Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods 2014; 11:1114-25. [PMID: 25357241 DOI: 10.1038/nmeth.3144] [Citation(s) in RCA: 505] [Impact Index Per Article: 56.1] [Received: 05/21/2014] [Accepted: 09/22/2014] [Indexed: 12/19/2022]
Abstract
Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry-based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of new sequencing technologies such as RNA-seq and dramatic improvements in the depth and throughput of mass spectrometry-based proteomics, the pace of proteogenomic research has greatly accelerated. Here I review the current state of proteogenomic methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false positive identifications in proteogenomics and provide guidelines for analyzing the data and reporting the results of proteogenomic studies.
Affiliation(s)
- Alexey I Nesvizhskii
- [1] Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA. [2] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| |
|
235
|
Sauer S, Luge T. Nutriproteomics: Facts, concepts, and perspectives. Proteomics 2015; 15:997-1013. [DOI: 10.1002/pmic.201400383] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/08/2014] [Revised: 11/03/2014] [Accepted: 11/27/2014] [Indexed: 12/19/2022]
Affiliation(s)
- Sascha Sauer
- Otto Warburg Laboratory; Max Planck Institute for Molecular Genetics; Berlin Germany
| | - Toni Luge
- Otto Warburg Laboratory; Max Planck Institute for Molecular Genetics; Berlin Germany
| |
|
236
|
PhosphoHunter: An Efficient Software Tool for Phosphopeptide Identification. Adv Bioinformatics 2015; 2015:382869. [PMID: 25653679 PMCID: PMC4309027 DOI: 10.1155/2015/382869] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/15/2014] [Revised: 12/14/2014] [Accepted: 12/15/2014] [Indexed: 12/31/2022] Open
Abstract
Phosphorylation is a protein posttranslational modification. It is responsible for the activation/inactivation of disease-related pathways, thanks to its role as a “molecular switch.” The study of phosphorylated proteins is therefore a key point for proteomic analyses focused on the identification of diagnostic/therapeutic targets. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the most widely used analytical approach. Although unmodified peptides are automatically identified by consolidated algorithms, phosphopeptides still require automated tools to avoid time-consuming manual interpretation. To improve phosphopeptide identification efficiency, a novel procedure was developed and implemented in a Perl/C tool called PhosphoHunter, which is proposed and evaluated here. It includes a preliminary heuristic step for filtering out the MS/MS spectra produced by nonphosphorylated peptides before sequence identification. A method to assess the statistical significance of identified phosphopeptides was also formulated. PhosphoHunter performance was tested on a dataset of 1500 MS/MS spectra and compared with two other tools: Mascot and Inspect. Comparisons demonstrated that a strong point of PhosphoHunter is sensitivity, suggesting that it is able to identify real phosphopeptides with superior performance. Performance indexes depend on a single parameter (intensity threshold) that users can tune according to the study aim. All three tools localized >90% of phosphosites.
|
237
|
Bilbao A, Varesio E, Luban J, Strambio-De-Castillia C, Hopfgartner G, Müller M, Lisacek F. Processing strategies and software solutions for data-independent acquisition in mass spectrometry. Proteomics 2015; 15:964-80. [DOI: 10.1002/pmic.201400323] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Received: 07/14/2014] [Revised: 10/08/2014] [Accepted: 11/24/2014] [Indexed: 11/10/2022]
Affiliation(s)
- Aivett Bilbao
- Proteome Informatics Group; SIB Swiss Institute of Bioinformatics; Geneva Switzerland
- Life Sciences Mass Spectrometry; School of Pharmaceutical Sciences; University of Geneva; University of Lausanne; Geneva Switzerland
| | - Emmanuel Varesio
- Life Sciences Mass Spectrometry; School of Pharmaceutical Sciences; University of Geneva; University of Lausanne; Geneva Switzerland
| | - Jeremy Luban
- Program in Molecular Medicine; University of Massachusetts Medical School; Worcester MA USA
| | | | - Gérard Hopfgartner
- Life Sciences Mass Spectrometry; School of Pharmaceutical Sciences; University of Geneva; University of Lausanne; Geneva Switzerland
| | - Markus Müller
- Proteome Informatics Group; SIB Swiss Institute of Bioinformatics; Geneva Switzerland
- Faculty of Sciences; University of Geneva; Geneva Switzerland
| | - Frédérique Lisacek
- Proteome Informatics Group; SIB Swiss Institute of Bioinformatics; Geneva Switzerland
- Faculty of Sciences; University of Geneva; Geneva Switzerland
| |
|
238
|
Lima DB, de Lima TB, Balbuena TS, Neves-Ferreira AGC, Barbosa VC, Gozzo FC, Carvalho PC. SIM-XL: A powerful and user-friendly tool for peptide cross-linking analysis. J Proteomics 2015; 129:51-55. [PMID: 25638023 DOI: 10.1016/j.jprot.2015.01.013] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 12/17/2014] [Revised: 01/15/2015] [Accepted: 01/21/2015] [Indexed: 12/21/2022]
Abstract
Chemical cross-linking has emerged as a powerful approach for the structural characterization of proteins and protein complexes. However, the correct identification of covalently linked (cross-linked or XL) peptides analyzed by tandem mass spectrometry is still an open challenge. Here we present SIM-XL, a software tool that can analyze data generated through commonly used cross-linkers (e.g., BS3/DSS). Our software introduces a new paradigm for search-space reduction, which ultimately accounts for its increase in speed and sensitivity. Moreover, our search engine is the first to capitalize on reporter ions for selecting tandem mass spectra derived from cross-linked peptides. It also makes available a 2D interaction map and a spectrum-annotation tool unmatched by any of its kind. We show SIM-XL to be more sensitive and faster than a competing tool when analyzing a data set obtained from the human HSP90. The software is freely available for academic use at http://patternlabforproteomics.org/sim-xl. A video demonstrating the tool is available at http://patternlabforproteomics.org/sim-xl/video. SIM-XL is the first tool to support XL data in the mzIdentML format; all data are thus available from the ProteomeXchange consortium (identifier PXD001677). This article is part of a Special Issue entitled: Computational Proteomics.
Affiliation(s)
- Diogo B Lima
- Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil.
| | - Tatiani B de Lima
- Dalton Mass Spectrometry Laboratory, University of Campinas, São Paulo, Brazil
| | - Tiago S Balbuena
- College of Agricultural and Veterinary Sciences, State University of São Paulo, Jaboticabal, São Paulo, Brazil
| | | | - Valmir C Barbosa
- Systems Engineering and Computer Science Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Fábio C Gozzo
- Dalton Mass Spectrometry Laboratory, University of Campinas, São Paulo, Brazil.
| | - Paulo C Carvalho
- Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil.
| |
|
239
|
Gross R, Fouxon I, Lancet D, Markovitch O. Quasispecies in population of compositional assemblies. BMC Evol Biol 2014; 14:265. [PMID: 25547629 PMCID: PMC4357159 DOI: 10.1186/s12862-014-0265-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 07/03/2014] [Accepted: 12/11/2014] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: The quasispecies model refers to information carriers that undergo self-replication with errors. A quasispecies is a steady-state population of biopolymer sequence variants generated by mutations from a master sequence. A quasispecies error threshold is a minimal replication accuracy below which the population structure breaks down. Theory and experimentation of this model often refer to biopolymers, e.g. RNA molecules or viral genomes, while its prebiotic context is often associated with an RNA world scenario. Here, we study the possibility that compositional entities which code for compositional information, intrinsically different from biopolymers coding for sequential information, could show quasispecies dynamics. RESULTS: We employed a chemistry-based model, graded autocatalysis replication domain (GARD), which simulates the network dynamics within compositional molecular assemblies. In GARD, a compotype represents a population of similar assemblies that constitute a quasi-stationary state in compositional space. A compotype's center-of-mass is found to be analogous to a master sequence for a sequential quasispecies. Using single-cycle GARD dynamics, we measured the quasispecies transition matrix (Q) for the probabilities of transition from one center-of-mass Euclidean distance to another. Similarly, the quasispecies' growth rate vector (A) was obtained. This allowed computing a steady state distribution of distances to the center of mass, as derived from the quasispecies equation. In parallel, a steady state distribution was obtained via the GARD equation kinetics. Rewardingly, a significant correlation was observed between the distributions obtained by these two methods. This was only seen for distances to the compotype center-of-mass, and not to randomly selected compositions. A similar correspondence was found when comparing the quasispecies time dependent dynamics towards steady state. Further, changing the error rate by modifying basal assembly joining rate of GARD kinetics was found to display an error catastrophe, similar to the standard quasispecies model. Additional augmentation of compositional mutations leads to the complete disappearance of the master-like composition. CONCLUSIONS: Our results show that compositional assemblies, as simulated by the GARD formalism, portray significant attributes of quasispecies dynamics. This expands the applicability of the quasispecies model beyond sequence-based entities, and potentially enhances validity of GARD as a model for prebiotic evolution.
Affiliation(s)
- Renan Gross
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel.
| | - Itzhak Fouxon
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel.
| | - Doron Lancet
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel.
| | - Omer Markovitch
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel.
- Interdisciplinary Computing and Complex Bio-Systems research group, School of Computing Science, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK.
| |
|
240
|
Bereman MS. Tools for monitoring system suitability in LC MS/MS centric proteomic experiments. Proteomics 2015; 15:891-902. [DOI: 10.1002/pmic.201400373] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 08/01/2014] [Revised: 09/12/2014] [Accepted: 10/13/2014] [Indexed: 11/06/2022]
Affiliation(s)
- Michael S. Bereman
- Department of Biological Sciences, Center for Human Health and the Environment; North Carolina State University; Raleigh NC USA
| |
|
241
|
Bonzon-Kulichenko E, Garcia-Marques F, Trevisan-Herraz M, Vázquez J. Revisiting peptide identification by high-accuracy mass spectrometry: problems associated with the use of narrow mass precursor windows. J Proteome Res 2015; 14:700-10. [PMID: 25494653 DOI: 10.1021/pr5007284] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 01/14/2023]
Abstract
Peptide identification is increasingly achieved through database searches in which mass precursor tolerance is set in the ppm range. This trend is driven by the high resolution and accuracy of modern mass spectrometers and the belief that the quality of peptide identification is fully controlled by estimating the false discovery rate (FDR) using the decoy-target approach. However, narrowing mass tolerance decreases the number of sequence candidates, and several authors have raised concerns that these search conditions can introduce inaccuracies. Here, we demonstrate that when scores that only depend on one sequence candidate are used, decoy-based estimates of the number of false positive identifications are accurate even with an average number of candidates of just 200, to the point that remarkably accurate FDR predictions can be made in completely different search conditions. However, when scores that are constructed taking information from additional sequence candidates are used together with low precursor mass tolerances, the proportion of peptides incorrectly identified may become significantly higher than the FDR estimated by the target-decoy approach. Our results suggest that with this kind of score the high mass accuracy of modern mass spectrometers should be exploited by using wide mass windows followed by postscoring mass filtering algorithms.
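The target-decoy estimate and the "wide window, then postscoring mass filter" strategy mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `(score, is_decoy, precursor_error_ppm)` record layout, the assumption that higher scores are better, and the simple decoys-over-targets estimator are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical PSM record: (score, is_decoy, precursor_error_ppm).
PSM = Tuple[float, bool, float]


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def filter_by_ppm(psms: List[PSM], max_abs_ppm: float) -> List[PSM]:
    """Postscoring mass filter: keep PSMs whose precursor error is within tolerance."""
    return [p for p in psms if abs(p[2]) <= max_abs_ppm]


def target_decoy_fdr(psms: List[PSM], score_threshold: float) -> float:
    """Estimate FDR as (#decoys / #targets) among PSMs at or above the threshold."""
    accepted = [p for p in psms if p[0] >= score_threshold]
    decoys = sum(1 for p in accepted if p[1])
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    # Toy PSMs from a wide-window search; all values are made up.
    psms = [
        (9.1, False, 1.2), (8.7, False, -0.8), (8.2, True, 45.0),
        (7.9, False, 2.5), (7.5, False, -3.1), (6.0, True, 4.0),
    ]
    print("FDR before mass filter:", target_decoy_fdr(psms, 7.0))
    print("FDR after 5 ppm filter:", target_decoy_fdr(filter_by_ppm(psms, 5.0), 7.0))
```

In this sketch the search itself is assumed to have been run with a wide precursor window, so the mass filter is applied only after scoring, which is the ordering the abstract recommends for scores that depend on more than one candidate.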
Affiliation(s)
- Elena Bonzon-Kulichenko
- Laboratory of Cardiovascular Proteomics, Centro Nacional de Investigaciones Cardiovasculares (CNIC) , Melchor Fernández Almagro, 3, 28029 Madrid, Spain
| | | | | | | |
|
242
|
Kucharova V, Wiker HG. Proteogenomics in microbiology: taking the right turn at the junction of genomics and proteomics. Proteomics 2014; 14:2660-75. [PMID: 25263021 DOI: 10.1002/pmic.201400168] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/28/2014] [Revised: 08/18/2014] [Accepted: 09/23/2014] [Indexed: 12/14/2022]
Abstract
High-accuracy and high-throughput proteomic methods have completely changed the way we can identify and characterize proteins. MS-based proteomics can now provide a unique supplement to genomic data and add a new level of information to the interpretation of genomic sequences. Proteomics-driven genome annotation has become especially relevant in microbiology where genomes are sequenced on a daily basis and limitations of an in silico driven annotation process are well recognized. In this review paper, we outline different strategies on how one can design a proteogenomic experiment, for example on genome-sequenced (synonymous proteogenomics) versus unsequenced organisms (ortho-proteogenomics) or with the aid of other "omic" data such as RNA-seq. We touch upon many challenges that are encountered during a typical proteogenomic study, mostly concerning bioinformatics methods and downstream data analysis, but also related to creation and use of sequence databases. A large list of proteogenomic case studies of different microorganisms is provided to illustrate the mapping of MS/MS-derived peptide spectra to genomic DNA sequences. These investigations have led to accurate determination of translational initiation sites, pointed out eventual read-throughs or programmed frameshifts, detected signal peptide processing or other protein maturation events, removed questionable annotation assignments, and provided evidence for predicted hypothetical proteins.
Affiliation(s)
- Veronika Kucharova
- Department of Clinical Science, The Gade Research Group for Infection and Immunity, University of Bergen, Norway
| | | |
|
243
|
Morris JH, Knudsen GM, Verschueren E, Johnson JR, Cimermancic P, Greninger AL, Pico AR. Affinity purification-mass spectrometry and network analysis to understand protein-protein interactions. Nat Protoc 2014; 9:2539-54. [PMID: 25275790 PMCID: PMC4332878 DOI: 10.1038/nprot.2014.164] [Citation(s) in RCA: 127] [Impact Index Per Article: 12.7] [Indexed: 12/15/2022]
Abstract
By determining protein-protein interactions in normal, diseased and infected cells, we can improve our understanding of cellular systems and their reaction to various perturbations. In this protocol, we discuss how to use data obtained in affinity purification-mass spectrometry (AP-MS) experiments to generate meaningful interaction networks and effective figures. We begin with an overview of common epitope tagging, expression and AP practices, followed by liquid chromatography-MS (LC-MS) data collection. We then provide a detailed procedure covering a pipeline approach to (i) pre-processing the data by filtering against contaminant lists such as the Contaminant Repository for Affinity Purification (CRAPome) and normalization using the spectral index (SIN) or normalized spectral abundance factor (NSAF); (ii) scoring via methods such as MiST, SAInt and CompPASS; and (iii) testing the resulting scores. Data formats familiar to MS practitioners are then transformed to those most useful for network-based analyses. The protocol also explores methods available in Cytoscape to visualize and analyze these types of interaction data. The scoring pipeline can take anywhere from 1 d to 1 week, depending on one's familiarity with the tools and data peculiarities. Similarly, the network analysis and visualization protocol in Cytoscape takes 2-4 h to complete with the provided sample data, but we recommend taking days or even weeks to explore one's data and find the right questions.
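Step (i) of the pipeline above (contaminant filtering followed by NSAF normalization) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary-based inputs, and the example protein identifiers are hypothetical, and the contaminant set stands in for a list that would in practice be derived from a resource such as the CRAPome. The NSAF used here is the usual definition, each protein's spectral count divided by its length, normalized by the sum of that ratio over all proteins in the sample.

```python
from typing import Dict, Set


def remove_contaminants(counts: Dict[str, int], contaminants: Set[str]) -> Dict[str, int]:
    """Drop proteins that appear on a user-supplied contaminant list."""
    return {protein: c for protein, c in counts.items() if protein not in contaminants}


def nsaf(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor: (SpC/L) / sum over proteins of (SpC/L)."""
    saf = {protein: c / lengths[protein] for protein, c in counts.items()}
    total = sum(saf.values())
    if total == 0:
        return {protein: 0.0 for protein in saf}
    return {protein: value / total for protein, value in saf.items()}


if __name__ == "__main__":
    # Toy spectral counts and protein lengths; identifiers are placeholders.
    counts = {"BAIT": 10, "PREY1": 20, "KRT1": 50}
    lengths = {"BAIT": 100, "PREY1": 400, "KRT1": 644}
    cleaned = remove_contaminants(counts, {"KRT1"})
    for protein, value in nsaf(cleaned, lengths).items():
        print(protein, round(value, 3))
```

Filtering before normalizing matters in this sketch: because NSAF values sum to one within a sample, leaving an abundant contaminant in the table would shrink every other protein's value.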
Affiliation(s)
- John H Morris
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, USA
| | - Giselle M Knudsen
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, USA
| | - Erik Verschueren
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, California, USA
| | - Jeffrey R Johnson
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, California, USA
| | - Peter Cimermancic
- [1] Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, California, USA. [2] Graduate Group in Bioinformatics, University of California, San Francisco, San Francisco, California, USA
| | - Alexander L Greninger
- School of Medicine, University of California, San Francisco, San Francisco, California, USA
| | - Alexander R Pico
- Gladstone Institutes, University of California, San Francisco, San Francisco, California, USA
| |
|
244
|
Zhang B, Pirmoradian M, Chernobrovkin A, Zubarev RA. DeMix workflow for efficient identification of cofragmented peptides in high resolution data-dependent tandem mass spectrometry. Mol Cell Proteomics 2014; 13:3211-23. [PMID: 25100859 PMCID: PMC4223503 DOI: 10.1074/mcp.o114.038877] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 02/25/2014] [Revised: 07/11/2014] [Indexed: 01/14/2023] Open
Abstract
Based on the conventional data-dependent acquisition strategy of shotgun proteomics, we present a new workflow, DeMix, which significantly increases the efficiency of peptide identification for in-depth shotgun analysis of complex proteomes. Capitalizing on the high resolution and mass accuracy of Orbitrap-based tandem mass spectrometry, we developed a simple deconvolution method of "cloning" chimeric tandem spectra for cofragmented peptides. In addition to a database search, a simple rescoring scheme utilizes mass accuracy and converts the unwanted cofragmenting events into a surprising advantage of multiplexing. With the combination of cloning and rescoring, we obtained on average nine peptide-spectrum matches per second on a Q-Exactive workbench, whereas the actual MS/MS acquisition rate was close to seven spectra per second. This efficiency boost to 1.24 identified peptides per MS/MS spectrum enabled analysis of over 5000 human proteins in single-dimensional LC-MS/MS shotgun experiments with only a two-hour gradient. These findings suggest a change in the dominant "one MS/MS spectrum - one peptide" paradigm for data acquisition and analysis in shotgun data-dependent proteomics. DeMix also demonstrated higher robustness than conventional approaches in terms of lower variation among the results of consecutive LC-MS/MS runs.
Affiliation(s)
- Bo Zhang
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-17177 Stockholm, Sweden
| | - Mohammad Pirmoradian
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-17177 Stockholm, Sweden; Biomotif AB, Stockholm SE-182 12, Sweden
| | - Alexey Chernobrovkin
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-17177 Stockholm, Sweden
| | - Roman A Zubarev
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-17177 Stockholm, Sweden
| |
|
245
|
Accurate assignment of significance to neuropeptide identifications using Monte Carlo k-permuted decoy databases. PLoS One 2014; 9:e111112. [PMID: 25329667 PMCID: PMC4201571 DOI: 10.1371/journal.pone.0111112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2014] [Accepted: 09/26/2014] [Indexed: 12/18/2022] Open
Abstract
In support of accurate neuropeptide identification in mass spectrometry experiments, novel Monte Carlo permutation testing was used to compute significance values. Testing was based on k-permuted decoy databases, where k denotes the number of permutations. These databases were integrated with a range of peptide identification indicators from three popular open-source database search software (OMSSA, Crux, and X! Tandem) to assess the statistical significance of neuropeptide spectra matches. Significance p-values were computed as the fraction of the sequences in the database with match indicator value better than or equal to the true target spectra. When applied to a test-bed of all known manually annotated mouse neuropeptides, permutation tests with k-permuted decoy databases identified up to 100% of the neuropeptides at p-value < 10^-5. The permutation test p-values using hyperscore (X! Tandem), E-value (OMSSA) and Sp score (Crux) match indicators outperformed all other match indicators. The robust performance to detect peptides of the intuitive indicator "number of matched ions between the experimental and theoretical spectra" highlights the importance of considering this indicator when the p-value was borderline significant. Our findings suggest permutation decoy databases of size 1×10^5 are adequate to accurately detect neuropeptides and this can be exploited to increase the speed of the search. The straightforward Monte Carlo permutation testing (comparable to a zero order Markov model) can be easily combined with existing peptide identification software to enable accurate and effective neuropeptide detection. The source code is available at http://stagbeetle.animal.uiuc.edu/pepshop/MSMSpermutationtesting.
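The permutation p-value described in this abstract (the fraction of decoy sequences whose match indicator is at least as good as the target's) can be sketched as follows. This is an illustrative sketch, not the published source code: the function names are hypothetical, decoys are generated by simply shuffling the residues of the target peptide, and the match indicator is assumed to be one where larger values are better (for an indicator such as an E-value the comparison would be reversed).

```python
import random
from typing import List, Sequence


def permuted_decoys(peptide: str, k: int, seed: int = 0) -> List[str]:
    """Generate k decoy sequences by randomly permuting the residues of a peptide."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    decoys = []
    for _ in range(k):
        residues = list(peptide)
        rng.shuffle(residues)
        decoys.append("".join(residues))
    return decoys


def permutation_p_value(target_score: float, decoy_scores: Sequence[float]) -> float:
    """Fraction of decoy scores that are better than or equal to the target score."""
    if not decoy_scores:
        raise ValueError("at least one decoy score is required")
    hits = sum(1 for score in decoy_scores if score >= target_score)
    return hits / len(decoy_scores)


if __name__ == "__main__":
    decoys = permuted_decoys("PEPTIDE", k=5)
    print(decoys)
    # Toy indicator values, e.g. counts of matched ions; numbers are made up.
    print(permutation_p_value(12, [3, 5, 12, 4, 6, 2, 7, 13, 1, 0]))
```

With this plain-fraction estimator, k permutations cannot resolve a nonzero p-value smaller than 1/k, which is consistent with the abstract's use of databases of about 1×10^5 permutations when testing at the 10^-5 level.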
|
246
|
Penzlin A, Lindner MS, Doellinger J, Dabrowski PW, Nitsche A, Renard BY. Pipasic: similarity and expression correction for strain-level identification and quantification in metaproteomics. Bioinformatics 2014; 30:i149-56. [PMID: 24931978 PMCID: PMC4058918 DOI: 10.1093/bioinformatics/btu267] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 01/06/2023]
Abstract
MOTIVATION: Metaproteomic analysis allows studying the interplay of organisms or functional groups and has become increasingly popular also for diagnostic purposes. However, difficulties arise owing to the high sequence similarity between related organisms. Further, the state of conservation of proteins between species can be correlated with their expression level, which can lead to significant bias in results and interpretation. These challenges are similar but not identical to the challenges arising in the analysis of metagenomic samples and require specific solutions. RESULTS: We introduce Pipasic (peptide intensity-weighted proteome abundance similarity correction) as a tool that corrects identification and spectral counting-based quantification results using peptide similarity estimation and expression level weighting within a non-negative lasso framework. Pipasic has distinct advantages over approaches only regarding unique peptides or aggregating results to the lowest common ancestor, as demonstrated on examples of viral diagnostics and an acid mine drainage dataset. AVAILABILITY AND IMPLEMENTATION: Pipasic source code is freely available from https://sourceforge.net/projects/pipasic/. CONTACT: RenardB@rki.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anke Penzlin
- Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
| | - Martin S Lindner
- Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
| | - Joerg Doellinger
- Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
| | - Piotr Wojtek Dabrowski
- Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
| | - Andreas Nitsche
- Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
| | - Bernhard Y Renard
- Research Group Bioinformatics (NG4), Centre for Biological Threats and Special Pathogens 1 (ZBS 1), Centre for Biological Threats and Special Pathogens 6 (ZBS 6) and Central Administration 4 (IT), Robert Koch Institute, 13353 Berlin, Germany
| |
|
247
|
Wang J, Bourne PE, Bandeira N. MixGF: spectral probabilities for mixture spectra from more than one peptide. Mol Cell Proteomics 2014; 13:3688-97. [PMID: 25225354 DOI: 10.1074/mcp.o113.037218] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022] Open
Abstract
In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications, but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30-390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra.
Affiliation(s)
- Jian Wang
- Bioinformatics Program, University of California, San Diego, La Jolla, California
| | - Philip E Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California
| | - Nuno Bandeira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California; Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, California; Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92092
| |
|
248
|
Chocu S, Evrard B, Lavigne R, Rolland AD, Aubry F, Jégou B, Chalmel F, Pineau C. Forty-four novel protein-coding loci discovered using a proteomics informed by transcriptomics (PIT) approach in rat male germ cells. Biol Reprod 2014; 91:123. [PMID: 25210130 DOI: 10.1095/biolreprod.114.122416] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023]
Abstract
Spermatogenesis is a complex process, dependent upon the successive activation and/or repression of thousands of gene products, and ends with the production of haploid male gametes. RNA sequencing of male germ cells in the rat identified thousands of novel testicular unannotated transcripts (TUTs). Although such RNAs are usually annotated as long noncoding RNAs (lncRNAs), it is possible that some of these TUTs code for protein. To test this possibility, we used a "proteomics informed by transcriptomics" (PIT) strategy combining RNA sequencing data with shotgun proteomics analyses of spermatocytes and spermatids in the rat. Among 3559 TUTs and 506 lncRNAs found in meiotic and postmeiotic germ cells, 44 encoded at least one peptide. We showed that these novel high-confidence protein-coding loci exhibit several genomic features intermediate between those of lncRNAs and mRNAs. We experimentally validated the testicular expression pattern of two of these novel protein-coding gene candidates, both highly conserved in mammals: one for a vesicle-associated membrane protein we named VAMP-9, and the other for an enolase domain-containing protein. This study confirms the potential of PIT approaches for the discovery of protein-coding transcripts initially thought to be untranslated or unknown transcripts. Our results contribute to the understanding of spermatogenesis by characterizing two novel proteins, implicated by their strong expression in germ cells. The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium under the data set identifier PXD000872.
Affiliation(s)
- Sophie Chocu
- Proteomics Core Facility Biogenouest, Inserm U1085, IRSET, Campus de Beaulieu, Rennes, France; Inserm U1085, IRSET, Université de Rennes 1, Rennes, France
| | | | - Régis Lavigne
- Proteomics Core Facility Biogenouest, Inserm U1085, IRSET, Campus de Beaulieu, Rennes, France; Inserm U1085, IRSET, Université de Rennes 1, Rennes, France
| | | | - Florence Aubry
- Inserm U1085, IRSET, Université de Rennes 1, Rennes, France
| | - Bernard Jégou
- Inserm U1085, IRSET, Université de Rennes 1, Rennes, France
| | | | - Charles Pineau
- Proteomics Core Facility Biogenouest, Inserm U1085, IRSET, Campus de Beaulieu, Rennes, France; Inserm U1085, IRSET, Université de Rennes 1, Rennes, France
| |
|
249
|
Zhu Z, Su X, Go EP, Desaire H. New glycoproteomics software, GlycoPep Evaluator, generates decoy glycopeptides de novo and enables accurate false discovery rate analysis for small data sets. Anal Chem 2014; 86:9212-9. [PMID: 25137014 PMCID: PMC4165450 DOI: 10.1021/ac502176n] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 12/20/2022]
Abstract
Glycoproteins are biologically significant large molecules that participate in numerous cellular activities. In order to obtain site-specific protein glycosylation information, intact glycopeptides, with the glycan attached to the peptide sequence, are characterized by tandem mass spectrometry (MS/MS) methods such as collision-induced dissociation (CID) and electron transfer dissociation (ETD). While several emerging automated tools are developed, no consensus is present in the field about the best way to determine the reliability of the tools and/or provide the false discovery rate (FDR). A common approach to calculate FDRs for glycopeptide analysis, adopted from the target-decoy strategy in proteomics, employs a decoy database that is created based on the target protein sequence database. Nonetheless, this approach is not optimal in measuring the confidence of N-linked glycopeptide matches, because the glycopeptide data set is considerably smaller compared to that of peptides, and the requirement of a consensus sequence for N-glycosylation further limits the number of possible decoy glycopeptides tested in a database search. To address the need to accurately determine FDRs for automated glycopeptide assignments, we developed GlycoPep Evaluator (GPE), a tool that helps to measure FDRs in identifying glycopeptides without using a decoy database. GPE generates decoy glycopeptides de novo for every target glycopeptide, in a 1:20 target-to-decoy ratio. The decoys, along with target glycopeptides, are scored against the ETD data, from which FDRs can be calculated accurately based on the number of decoy matches and the ratio of the number of targets to decoys, for small data sets. GPE is freely accessible for download and can work with any search engine that interprets ETD data of N-linked glycopeptides. The software is provided at https://desairegroup.ku.edu/research.
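The abstract above states that the FDR follows from the number of decoy matches and the target-to-decoy ratio, but gives no formula. The sketch below illustrates one common way to scale decoy counts when decoys outnumber targets; it is an assumption for illustration, not necessarily the exact computation GPE performs. The function name `estimate_fdr` and its parameters are hypothetical.

```python
def estimate_fdr(target_hits: int, decoy_hits: int, decoys_per_target: int = 20) -> float:
    """Estimate the FDR among target hits from the number of decoy hits.

    Assumption (not taken from the GPE paper): a random match is equally
    likely to land on any candidate, so with `decoys_per_target` decoys
    per target, the expected number of false target hits is
    decoy_hits / decoys_per_target.
    """
    if target_hits <= 0:
        raise ValueError("target_hits must be positive")
    if decoy_hits < 0 or decoys_per_target <= 0:
        raise ValueError("decoy_hits must be >= 0 and decoys_per_target > 0")
    # Scale the decoy count down to the size of the target space.
    expected_false_targets = decoy_hits / decoys_per_target
    # An FDR is a proportion, so cap the estimate at 1.0.
    return min(1.0, expected_false_targets / target_hits)


if __name__ == "__main__":
    # Hypothetical counts: 50 spectra matched a target, 20 matched a decoy.
    print(f"Estimated FDR: {estimate_fdr(50, 20):.3f}")
```

Under this assumption, a larger decoy pool gives a finer-grained estimate for small data sets, because a single decoy hit contributes only 1/20 of an expected false target hit.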
Affiliation(s)
- Zhikai Zhu
- Department of Chemistry, University of Kansas , Lawrence, Kansas 66047, United States
| | | | | | | |
|
250
|
Nardiello D, Conte A, Natale A, Lucera A, Palermo C, Centonze D, Del Nobile M. Effects of different packaging systems on microbiological, sensory and peptide profile in fiordilatte cheese. Food Res Int 2014. [DOI: 10.1016/j.foodres.2014.03.053] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]
|