1
Prieto G, Vázquez J. Protein Probability Model for High-Throughput Protein Identification by Mass Spectrometry-Based Proteomics. J Proteome Res 2020; 19:1285-1297. [PMID: 32037837 DOI: 10.1021/acs.jproteome.9b00819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Shotgun proteomics is the method of choice for high-throughput protein identification; however, robust statistical methods are essential to automatize this task while minimizing the number of false identifications. The standard method for estimating the false discovery rate (FDR) of individual identifications and keeping it below a threshold (typically 1%) is the target-decoy approach. However, numerous works have shown that FDR at the protein level may become much larger than FDR at the peptide level. The development of an appropriate scoring model to identify proteins from their peptides using high-throughput shotgun proteomics is highly needed. In this study, we present a novel protein-level scoring algorithm that uses the scores of the identified peptides and maintains all of the properties expected for a true protein probability. We also present a refinement of the picked method to calculate FDR at the protein level. These algorithms can be used together as a robust identification workflow suitable for large-scale proteomics, and we show that the identification performance of this workflow is superior to that of other widely used methods in several samples and using different search engines. Our protein probability model offers the scientific community an algorithm that is easy to integrate into protein identification workflows for the automated analysis of shotgun proteomics data.
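The target-decoy estimate mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' protein probability model or their refined picked method; it only shows the basic idea of estimating FDR as the ratio of decoy hits to target hits above a score threshold. The function names and the input layout (parallel lists of scores and decoy flags) are assumptions made for illustration.

```python
def target_decoy_fdr(scores, is_decoy, threshold):
    """Estimate FDR among hits scoring >= threshold as (#decoys / #targets)."""
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    # With no target hits there is nothing to report, so return 0.0.
    return decoys / targets if targets else 0.0


def score_cutoff_for_fdr(scores, is_decoy, max_fdr=0.01):
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    Returns None if no threshold satisfies the limit.
    """
    best = None
    # Walk thresholds from strictest to most permissive; the last one that
    # still meets the limit is the lowest acceptable cutoff.
    for t in sorted(set(scores), reverse=True):
        if target_decoy_fdr(scores, is_decoy, t) <= max_fdr:
            best = t
    return best
```

A 1% limit corresponds to `max_fdr=0.01`, the typical threshold cited in the abstract. The same counting can be applied to peptide-level or protein-level scores, which is where the two FDR values can diverge.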
Affiliation(s)
- Gorka Prieto
  - Department of Communications Engineering, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain
- Jesús Vázquez
  - Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28049 Madrid, Spain
2
Jurick WM, Peng H, Beard HS, Garrett WM, Lichtner FJ, Luciano-Rosario D, Macarisin O, Liu Y, Peter KA, Gaskins VL, Yang T, Mowery J, Bauchan G, Keller NP, Cooper B. Blistering1 Modulates Penicillium expansum Virulence Via Vesicle-mediated Protein Secretion. Mol Cell Proteomics 2020; 19:344-361. [PMID: 31871254 PMCID: PMC7000123 DOI: 10.1074/mcp.ra119.001831] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/18/2019] [Revised: 11/15/2019] [Indexed: 11/06/2022]
Abstract
The blue mold fungus, Penicillium expansum, is a postharvest apple pathogen that contributes to food waste by rotting fruit and by producing harmful mycotoxins (e.g. patulin). To identify genes controlling pathogen virulence, a random T-DNA insertional library was created from wild-type P. expansum strain R19. One transformant, T625, had reduced virulence in apples, blistered mycelial hyphae, and a T-DNA insertion that abolished transcription of the single copy locus in which it was inserted. The gene, Blistering1, encodes a protein with a DnaJ domain, but otherwise has little homology outside the Aspergillaceae, a family of fungi known for producing antibiotics, mycotoxins, and cheese. Because protein secretion is critical for these processes and for host infection, mass spectrometry was used to monitor proteins secreted into liquid media during fungal growth. T625 failed to secrete a set of enzymes that degrade plant cell walls, along with ones that synthesize the three final biosynthetic steps of patulin. Consequently, the culture broth of T625 had significantly reduced capacity to degrade apple tissue and contained 30 times less patulin. Quantitative mass spectrometry of 3,282 mycelial proteins revealed that T625 had altered cellular networks controlling protein processing in the endoplasmic reticulum, protein export, vesicle-mediated transport, and endocytosis. T625 also had reduced proteins controlling mRNA surveillance and RNA processing. Transmission electron microscopy of hyphal cross sections confirmed that T625 formed abnormally enlarged endosomes or vacuoles. These data reveal that Blistering1 affects internal and external protein processing involving vesicle-mediated transport in a family of fungi with medical, commercial, and agricultural importance.
Affiliation(s)
- Wayne M Jurick
  - USDA-ARS, Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Hui Peng
  - USDA-ARS, Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Hunter S Beard
  - USDA-ARS, Soybean Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Wesley M Garrett
  - USDA-ARS, Animal Biosciences and Biotechnology Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Franz J Lichtner
  - USDA-ARS, Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland; Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee 37830
- Dianiris Luciano-Rosario
  - University of Wisconsin, Department of Medical Microbiology and Immunology and Bacteriology, Madison, Wisconsin
- Otilia Macarisin
  - USDA-ARS, Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Yingjian Liu
  - USDA-ARS, Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Kari A Peter
  - Penn State University, Department of Plant Pathology and Environmental Microbiology, Fruit Research and Extension Center, Biglerville, Pennsylvania
- Verneta L Gaskins
  - USDA-ARS, Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Tianbao Yang
  - USDA-ARS, Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Joseph Mowery
  - USDA-ARS, Soybean Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Gary Bauchan
  - USDA-ARS, Soybean Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
- Nancy P Keller
  - University of Wisconsin, Department of Medical Microbiology and Immunology and Bacteriology, Madison, Wisconsin
- Bret Cooper
  - USDA-ARS, Soybean Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Beltsville, Maryland
3
Abstract
In proteomics, identification of proteins from complex mixtures of proteins extracted from biological samples is an important problem. Among the experimental technologies, Mass-Spectrometry (MS) is the most popular one. Protein identification from MS data typically relies on a "two-step" procedure of identifying the peptide first followed by the separate protein identification procedure next. In this setup, the interdependence of peptides and proteins are neglected resulting in relatively inaccurate protein identification. In this article, we propose a Markov chain Monte Carlo (MCMC) based Bayesian hierarchical model, a first of its kind in protein identification, which integrates the two steps and performs joint analysis of proteins and peptides using posterior probabilities. We remove the assumption of independence of proteins by using clustering group priors to the proteins based on the assumption that proteins sharing the same biological pathway are likely to be present or absent together and are correlated. The complete conditionals of the proposed joint model being tractable, we propose and implement a Gibbs sampling scheme for full posterior inference that provides the estimation and statistical uncertainties of all relevant parameters. The model has better operational characteristics compared to two existing "one-step" procedures on a range of simulation settings as well as on two well-studied datasets.
Affiliation(s)
- Riten Mitra
  - Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202
- Ryan Gill
  - Department of Mathematics, University of Louisville, Louisville, KY 40292
- Sinjini Sikdar
  - Department of Biostatistics, University of Florida, Gainesville, FL 32611
- Susmita Datta
  - Department of Biostatistics, University of Florida, Gainesville, FL 32611
4
Zhong J, Wang J, Ding X, Zhang Z, Li M, Wu FX, Pan Y. Protein Inference from the Integration of Tandem MS Data and Interactome Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1399-1409. [PMID: 28113634 DOI: 10.1109/tcbb.2016.2601618] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, some other information is exploited to infer proteins. Different from the methods which first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should have a high probability in the same sample, too. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by incorporating the bipartite graph with interactome network information, where the bipartite graph is built with the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated by using the Bayes' law based on the neighbor protein support scores of each protein which are mapped by the shared peptides. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that AUC scores yielded by TMSIN are 0.742 and 0.874 in yeast dataset and human dataset, respectively, and TMSIN yields the maximum number of true positives when q-value less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.
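The Bayes step described in this abstract can be sketched as a simple normalization. This is not the TMSIN implementation: the prior values, the neighbor support scores, and the function name below are hypothetical, and the abstract does not specify how the support scores are computed from the interactome network. The sketch only shows how a prior and a network-derived support term could combine into a posterior that a shared peptide belongs to each candidate protein.

```python
def shared_peptide_posteriors(priors, support):
    """Posterior that a shared peptide belongs to each candidate protein.

    priors  : dict protein -> prior probability of emitting the peptide
    support : dict protein -> neighbor support score (hypothetical weight
              standing in for evidence from the interactome network)
    """
    # Bayes' rule up to a constant: posterior is proportional to prior * support.
    weighted = {p: priors[p] * support[p] for p in priors}
    total = sum(weighted.values())
    if total == 0:
        # No evidence at all: fall back to a uniform split.
        return {p: 1.0 / len(priors) for p in priors}
    return {p: w / total for p, w in weighted.items()}
```

In an iterative workflow such as the one the abstract describes, the posteriors from one pass would feed the protein scores used to recompute support in the next pass; that loop is omitted here.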
5
Different Cellular Origins and Functions of Extracellular Proteins from Escherichia coli O157:H7 and O104:H4 as Determined by Comparative Proteomic Analysis. Appl Environ Microbiol 2016; 82:4371-4378. [PMID: 27208096 DOI: 10.1128/aem.00977-16] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/30/2016] [Accepted: 05/04/2016] [Indexed: 12/20/2022]
Abstract
Extracellular proteins play important roles in bacterial interactions with the environmental matrices. In this study, we examined the extracellular proteins from Escherichia coli O157:H7 and O104:H4 by tandem mass spectrometry. We identified 500 and 859 proteins from the growth media of E. coli O157:H7 and O104:H4, respectively, including 371 proteins common to both strains. Among proteins that were considered specific to E. coli O157:H7 or present at higher relative abundances in O157:H7 medium, most (57 of 65) had secretion signal sequences in their encoding genes. Noticeably, the proteins included locus of enterocyte effacement (LEE) virulence factors, proteins required for peptidyl-lipoprotein accumulation, and proteins involved in iron scavenging. In contrast, a much smaller proportion of proteins (37 of 150) that were considered specific to O104:H4 or presented at higher relative abundances in O104:H4 medium had signals targeting them for secretion. These proteins included Shiga toxin 2 subunit B and O104:H4 signature proteins, including AAF/1 major fimbrial subunit and serine protease autotransporters. Most of the abundant proteins from the growth medium of E. coli O104:H4 were annotated as having functions in the cytoplasm. We provide evidence that the extensive presence of cytoplasmic proteins in E. coli O104:H4 growth medium was due to biological processes independent of cell lysis, indicating alternative mechanisms for this potent pathogen releasing cytoplasmic contents into the growth milieu, which could play a role in interaction with the environmental matrices, such as pathogenesis and biofilm formation.
IMPORTANCE In this study, we compared the extracellular proteins from two of the most prominent foodborne pathogenic E. coli organisms that have caused severe outbreaks in the United States and in Europe. E. coli O157:H7 is a well-studied Shiga toxigenic foodborne pathogen of the enterohemorrhagic pathotype that has caused numerous outbreaks associated with various contaminated foods worldwide. E. coli O104:H4 is a newly emerged Shiga toxigenic foodborne pathogen of the enteroaggregative pathotype that gained notoriety for causing one of the most deadly foodborne outbreaks in Europe in 2011. Comparison of proteins in the growth medium revealed significant differences in the compositions of the extracellular proteins for these two pathogens. These differences may provide valuable information regarding the cellular responses of these pathogens to their environment, including cell survival and pathogenesis.
6
Cooper B, Campbell KB, Beard HS, Garrett WM, Islam N. Putative Rust Fungal Effector Proteins in Infected Bean and Soybean Leaves. Phytopathology 2016; 106:491-9. [PMID: 26780434 DOI: 10.1094/phyto-11-15-0310-r] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 05/11/2023]
Abstract
The plant-pathogenic fungi Uromyces appendiculatus and Phakopsora pachyrhizi cause debilitating rust diseases on common bean and soybean. These rust fungi secrete effector proteins that allow them to infect plants, but their effector repertoires are not understood. The discovery of rust fungus effectors may eventually help guide decisions and actions that mitigate crop production loss. Therefore, we used mass spectrometry to identify thousands of proteins in infected beans and soybeans and in germinated fungal spores. The comparative analysis between the two helped differentiate a set of 24 U. appendiculatus proteins targeted for secretion that were specifically found in infected beans and a set of 34 U. appendiculatus proteins targeted for secretion that were found in germinated spores and infected beans. The proteins specific to infected beans included family 26 and family 76 glycoside hydrolases that may contribute to degrading plant cell walls. There were also several types of proteins with structural motifs that may aid in stabilizing the specialized fungal haustorium cell that interfaces the plant cell membrane during infection. There were 16 P. pachyrhizi proteins targeted for secretion that were found in infected soybeans, and many of these proteins resembled the U. appendiculatus proteins found in infected beans, which implies that these proteins are important to rust fungal pathology in general. This data set provides insight to the biochemical mechanisms that rust fungi use to overcome plant immune systems and to parasitize cells.
Affiliation(s)
- Bret Cooper
  - Soybean Genomics and Improvement Laboratory, U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), Beltsville, MD 20705
- Kimberly B Campbell
  - Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD 20705
- Hunter S Beard
  - Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD 20705
- Wesley M Garrett
  - Animal Biosciences and Biotechnology Laboratory, USDA-ARS, Beltsville, MD 20705
- Nazrul Islam
  - Department of Nutrition and Food Science, University of Maryland, College Park 20742
7
Alves G, Wang G, Ogurtsov AY, Drake SK, Gucek M, Suffredini AF, Sacks DB, Yu YK. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. Journal of the American Society for Mass Spectrometry 2016; 27:194-210. [PMID: 26510657 PMCID: PMC4723618 DOI: 10.1007/s13361-015-1271-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/29/2015] [Revised: 09/04/2015] [Accepted: 09/05/2015] [Indexed: 05/13/2023]
Abstract
Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Affiliation(s)
- Gelio Alves
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Guanghui Wang
  - Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Aleksey Y Ogurtsov
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Steven K Drake
  - Critical Care Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Marjan Gucek
  - Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Anthony F Suffredini
  - Critical Care Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- David B Sacks
  - Department of Laboratory Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Yi-Kuo Yu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
8
Zhang Y, Xu T, Shan B, Hart J, Aslanian A, Han X, Zong N, Li H, Choi H, Wang D, Acharya L, Du L, Vogt PK, Ping P, Yates JR. ProteinInferencer: Confident protein identification and multiple experiment comparison for large scale proteomics projects. J Proteomics 2015. [PMID: 26196237 DOI: 10.1016/j.jprot.2015.07.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/14/2022]
Abstract
Shotgun proteomics generates valuable information from large-scale and target protein characterizations, including protein expression, protein quantification, protein post-translational modifications (PTMs), protein localization, and protein-protein interactions. Typically, peptides derived from proteolytic digestion, rather than intact proteins, are analyzed by mass spectrometers because peptides are more readily separated, ionized and fragmented. The amino acid sequences of peptides can be interpreted by matching the observed tandem mass spectra to theoretical spectra derived from a protein sequence database. Identified peptides serve as surrogates for their proteins and are often used to establish what proteins were present in the original mixture and to quantify protein abundance. Two major issues exist for assigning peptides to their originating protein. The first issue is maintaining a desired false discovery rate (FDR) when comparing or combining multiple large datasets generated by shotgun analysis and the second issue is properly assigning peptides to proteins when homologous proteins are present in the database. Herein we demonstrate a new computational tool, ProteinInferencer, which can be used for protein inference with both small- or large-scale data sets to produce a well-controlled protein FDR. In addition, ProteinInferencer introduces confidence scoring for individual proteins, which makes protein identifications evaluable. This article is part of a Special Issue entitled: Computational Proteomics.
Affiliation(s)
- Yaoyang Zhang
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA; Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, China.
- Tao Xu
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA; Dow AgroSciences LLC, Indianapolis, IN 46268, USA.
- Bing Shan
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA; Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, China.
- Jonathan Hart
  - Department of Molecular & Experimental Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA.
- Aaron Aslanian
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA.
- Xuemei Han
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA.
- Nobel Zong
  - NHLBI Proteomics Center at UCLA, Departments of Physiology and Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA.
- Haomin Li
  - NHLBI Proteomics Center at UCLA, Departments of Physiology and Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA.
- Howard Choi
  - NHLBI Proteomics Center at UCLA, Departments of Physiology and Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA.
- Dong Wang
  - Vanderbilt University Medical Center, Nashville, TN 37232, USA.
- Lipi Acharya
  - Dow AgroSciences LLC, Indianapolis, IN 46268, USA.
- Lisa Du
  - Department of Molecular & Experimental Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA.
- Peter K Vogt
  - Department of Molecular & Experimental Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA.
- Peipei Ping
  - NHLBI Proteomics Center at UCLA, Departments of Physiology and Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA.
- John R Yates
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA.
9
Sikdar S, Gill R, Datta S. Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Brief Bioinform 2015; 17:262-9. [PMID: 26141827 DOI: 10.1093/bib/bbv043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/10/2015] [Indexed: 01/28/2023]
Abstract
MOTIVATION Many approaches have been proposed for the protein identification problem based on tandem mass spectrometry (MS/MS) data. In these experiments, proteins are digested into peptides and the resulting peptide mixture is subjected to mass spectrometry. Some interesting putative peptide features (peaks) are selected from the mass spectra. Following that, the precursor ions undergo fragmentation and are analyzed by MS/MS. The process of identification of peptides from the mass spectra and the constituent proteins in the sample is called protein identification from MS/MS data. There are many two-step protein identification procedures, reviewed in the literature, which first attempt to identify the peptides in a separate process and then use these results to infer the proteins. However, in recent years, there have been attempts to provide a one-step solution to protein identification, which simultaneously identifies the proteins and the peptides in the sample. RESULTS In this review, we briefly introduce the most popular two-step protein identification procedure, PeptideProphet coupled with ProteinProphet. Following that, we describe the difficulties with two-step procedures and review some recently introduced one-step protein/peptide identification procedures that do not suffer from these issues. The focus of this review is on one-step procedures that are based on statistical likelihood-based models, but some discussion of other one-step procedures is also included. We report comparative performances of one-step and two-step methods, which support the overall superiorities of one-step procedures. We also cover some recent efforts to improve protein identification by incorporating other molecular data along with MS/MS data.
10
Islam N, Li G, Garrett WM, Lin R, Sriram G, Cooper B, Coleman GD. Proteomics of Nitrogen Remobilization in Poplar Bark. J Proteome Res 2014; 14:1112-26. [DOI: 10.1021/pr501090p] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/03/2023]
Affiliation(s)
- Nazrul Islam
  - Department of Plant Sciences and Landscape Architecture, University of Maryland, College Park, Maryland 20742, United States
- Gen Li
  - Department of Plant Sciences and Landscape Architecture, University of Maryland, College Park, Maryland 20742, United States
- Wesley M. Garrett
  - Animal Biosciences and Biotechnology Laboratory, USDA-ARS, Beltsville, Maryland 20705, United States
- Rongshuang Lin
  - Department of Plant Sciences and Landscape Architecture, University of Maryland, College Park, Maryland 20742, United States
- Ganesh Sriram
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, Maryland 20742, United States
- Bret Cooper
  - Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland 20705, United States
- Gary D. Coleman
  - Department of Plant Sciences and Landscape Architecture, University of Maryland, College Park, Maryland 20742, United States
11
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn solving a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever growing amounts, machine learning is fast becoming a very popular tool in the field. We here therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
  - Department of Medical Protein Research, VIB, Ghent, Belgium; Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium; Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium
12
Zhang Y, Fonslow BR, Shan B, Baek MC, Yates JR. Protein analysis by shotgun/bottom-up proteomics. Chem Rev 2013; 113:2343-94. [PMID: 23438204 PMCID: PMC3751594 DOI: 10.1021/cr3003533] [Citation(s) in RCA: 970] [Impact Index Per Article: 88.2] [Indexed: 02/07/2023]
Affiliation(s)
- Yaoyang Zhang
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Bryan R. Fonslow
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Bing Shan
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Moon-Chang Baek
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
  - Department of Molecular Medicine, Cell and Matrix Biology Research Institute, School of Medicine, Kyungpook National University, Daegu 700-422, Republic of Korea
- John R. Yates
  - Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
13
|
Huang T, Gong H, Yang C, He Z. ProteinLasso: A Lasso regression approach to protein inference problem in shotgun proteomics. Comput Biol Chem 2013; 43:46-54. [PMID: 23385215 DOI: 10.1016/j.compbiolchem.2012.12.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/26/2012] [Revised: 12/30/2012] [Accepted: 12/30/2012] [Indexed: 11/28/2022]
Abstract
Protein inference is an important issue in proteomics research. Its main objective is to select a proper subset of candidate proteins that best explain the observed peptides. Although many methods have been proposed for solving this problem, several issues such as peptide degeneracy and one-hit wonders still remain unsolved. Therefore, the accurate identification of proteins that are truly present in the sample continues to be a challenging task. Based on the concept of peptide detectability, we formulate the protein inference problem as a constrained Lasso regression problem, which can be solved very efficiently through a coordinate descent procedure. The new inference algorithm is named as ProteinLasso, which explores an ensemble learning strategy to address the sparsity parameter selection problem in Lasso model. We test the performance of ProteinLasso on three datasets. As shown in the experimental results, ProteinLasso outperforms those state-of-the-art protein inference algorithms in terms of both identification accuracy and running efficiency. In addition, we show that ProteinLasso is stable under different parameter specifications. The source code of our algorithm is available at: http://sourceforge.net/projects/proteinlasso.
Affiliation(s)
- Ting Huang
- School of Software, Dalian University of Technology, China
14
Abstract
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.
Affiliation(s)
- Yong Fuga Li
- School of Informatics and Computing, Indiana University, Bloomington 150 S, Woodlawn Avenue, Bloomington, Indiana 47405, USA
15
Cooper B. The problem with peptide presumption and the downfall of target-decoy false discovery rates. Anal Chem 2012; 84:9663-7. [PMID: 23106481 DOI: 10.1021/ac303051s] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
In proteomics, peptide-tandem mass spectrum match scores and target-decoy database derived false discovery rates (FDR) are confidence indicators describing the quality of individual and sets of tandem mass spectrum matches. A user can impose a standard by prescribing a limit to these values, equivalent to drawing a line that separates better from poorer quality matches. As a result of setting narrower parent ion mass tolerances to reflect the better resolution of modern mass spectrometers, target-decoy derived FDRs can diminish. FDRs lowered this way consequently drive down the lower-limit for peptide-spectrum match score acceptance. Hence, data quality confidence appears to improve even while fragmentation evidence for some spectra remains weak. One negative outcome can be the presumed identification of peptides that do not exist. The options researchers have to improve proteomics data confidence are not panaceas, and there may be no satisfying solution as long as peptides are identified from a circumscribed list of proteins scientists wish to find.
16
Huang T, He Z. A linear programming model for protein inference problem in shotgun proteomics. Bioinformatics 2012; 28:2956-62. [PMID: 22954624 DOI: 10.1093/bioinformatics/bts540] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Abstract
MOTIVATION: Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. RESULTS: In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named as ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. AVAILABILITY: The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. CONTACT: zyhe@dlut.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Online.
Affiliation(s)
- Ting Huang
- School of Software, Dalian University of Technology, Dalian 116621, China
17
18
Cooper B, Chen R, Garrett WM, Murphy C, Chang C, Tucker ML, Bhagwat AA. Proteomic Pleiotropy of OpgGH, an Operon Necessary for Efficient Growth of Salmonella enterica serovar Typhimurium under Low-Osmotic Conditions. J Proteome Res 2012; 11:1720-7. [DOI: 10.1021/pr200933d] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/02/2023]
Affiliation(s)
- Ruiqiang Chen
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, United States
- Caren Chang
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, United States
19
Bern MW, Kil YJ. Two-dimensional target decoy strategy for shotgun proteomics. J Proteome Res 2011; 10:5296-301. [PMID: 22010998 DOI: 10.1021/pr200780j] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 11/28/2022]
Abstract
The target-decoy approach to estimating and controlling false discovery rate (FDR) has become a de facto standard in shotgun proteomics, and it has been applied at both the peptide-to-spectrum match (PSM) and protein levels. Current bioinformatics methods control either the PSM- or the protein-level FDR, but not both. In order to obtain the most reliable information from their data, users must employ one method when the number of tandem mass spectra exceeds the number of proteins in the database and another method when the reverse is true. Here we propose a simple variation of the standard target-decoy strategy that estimates and controls PSM and protein FDRs simultaneously, regardless of the relative numbers of spectra and proteins. We demonstrate that even if the final goal is a list of PSMs with a fixed low FDR and not a list of protein identifications, the proposed two-dimensional strategy offers advantages over a pure PSM-level strategy.
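For orientation, the standard one-dimensional target-decoy estimate that this strategy varies can be sketched as below. This is a minimal illustration only, not the two-dimensional method of the paper: the function name, the `(score, is_decoy)` input format, and the simple decoys/targets estimator (which presumes equally sized target and decoy databases) are all assumptions made for this example.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    `psms` is a sequence of (score, is_decoy) pairs (hypothetical format).
    The estimate is decoys / targets, capped at 1.0.
    """
    psms = list(psms)
    # Count accepted matches separately for target and decoy sequences.
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)


# Hypothetical scores: four targets and one decoy pass a threshold of 7.0.
example_psms = [(9.1, False), (8.7, False), (8.2, True), (7.9, False),
                (7.5, False), (6.0, True), (5.5, False)]
print(target_decoy_fdr(example_psms, 7.0))
```

In practice the threshold would be lowered until the estimate reaches the desired limit (for example 1%); the same counting can in principle be repeated on protein-level lists.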
Affiliation(s)
- Marshall W Bern
- Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, California 94304, United States.
20
Eng JK, Searle BC, Clauser KR, Tabb DL. A face in the crowd: recognizing peptides through database search. Mol Cell Proteomics 2011; 10:R111.009522. [PMID: 21876205 PMCID: PMC3226415 DOI: 10.1074/mcp.r111.009522] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.1] [Received: 03/09/2011] [Revised: 07/19/2011] [Indexed: 12/31/2022] [Open Access]
Abstract
Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software. Although many different tandem mass spectrometry database search tools are currently available from both academic and commercial sources, these algorithms share similar core elements while maintaining distinctive features. This review revisits the mechanism of sequence database searching and discusses how various parameter settings impact the underlying search.
Affiliation(s)
- Jimmy K Eng
- University of Washington, Department of Genome Sciences, Seattle, WA 98195, USA.
21
Lee J, Koh HJ. A label-free quantitative shotgun proteomics analysis of rice grain development. Proteome Sci 2011; 9:61. [PMID: 21957990 PMCID: PMC3190340 DOI: 10.1186/1477-5956-9-61] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 03/23/2011] [Accepted: 09/30/2011] [Indexed: 11/25/2022] [Open Access]
Abstract
Background: Although a great deal of rice proteomic research has been conducted, there are relatively few studies specifically addressing the rice grain proteome. The existing rice grain proteomic researches have focused on the identification of differentially expressed proteins or monitoring protein expression patterns during grain filling stages. Results: Proteins were extracted from rice grains 10, 20, and 30 days after flowering, as well as from fully mature grains. By merging all of the identified proteins in this study, we identified 4,172 non-redundant proteins with a wide range of molecular weights (from 5.2 kDa to 611 kDa) and pI values (from pH 2.9 to pH 12.6). A Genome Ontology category enrichment analysis for the 4,172 proteins revealed that 52 categories were enriched, including the carbohydrate metabolic process, transport, localization, lipid metabolic process, and secondary metabolic process. The relative abundances of the 1,784 reproducibly identified proteins were compared to detect 484 differentially expressed proteins during rice grain development. Clustering analysis and Genome Ontology category enrichment analysis revealed that proteins involved in the metabolic process were enriched through all stages of development, suggesting that proteome changes occurred even in the desiccation phase. Interestingly, enrichments of proteins involved in protein folding were detected in the desiccation phase and in fully mature grain. Conclusion: This is the first report conducting comprehensive identification of rice grain proteins. With a label free shotgun proteomic approach, we identified large number of rice grain proteins and compared the expression patterns of reproducibly identified proteins during rice grain development. Clustering analysis, Genome Ontology category enrichment analysis, and the analysis of composite expression profiles revealed dynamic changes of metabolisms during rice grain development. Interestingly, we detected that proteins involved in glycolysis, TCA-cycle, lipid metabolism, and proteolysis accumulated at higher levels in fully mature grain compared to grain developing stages, suggesting that the accumulation of these proteins during the desiccation stage may be associated with the preparation of proteins required in germination.
Affiliation(s)
- Joohyun Lee
- Department of Plant Science, Plant Genomics and Breeding Institute, and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul 151-742, Korea.
22
Chen R, Binder BM, Garrett WM, Tucker ML, Chang C, Cooper B. Proteomic responses in Arabidopsis thaliana seedlings treated with ethylene. Mol Biosyst 2011; 7:2637-50. [PMID: 21713283 DOI: 10.1039/c1mb05159h] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Indexed: 01/08/2023]
Abstract
Ethylene (ET) is a volatile hormone that modulates fruit ripening, plant growth, development and stress responses. Key components of the ET-signaling pathway identified by genetic dissection in Arabidopsis thaliana include five ET receptors, the negative regulator CTR1 and the positive regulator EIN2, all of which localize to the endoplasmic reticulum. Mechanisms of signaling among these proteins are still unresolved and targets of ET responses are not fully known. So, we used mass spectrometry to identify proteins in microsomal membrane preparations from etiolated A. thaliana seedlings maintained in ambient air or treated with ET for 3 h. We compared 3814 proteins from ET-exposed seedlings and controls and identified 304 proteins with significant accumulation changes. The proteins with increased accumulation were involved in ET biosynthesis, cell morphogenesis, oxidative stress and vesicle secretion while those with decreased accumulation were ribosomal proteins and proteins positively regulated by brassinosteroid, another hormone involved in cell elongation. Several proteins, including EIN2, appeared to be differentially phosphorylated upon ET treatment, which suggests that the activity or stability of these proteins may be controlled by phosphorylation. TUA3, a component of microtubules that contributes to cellular morphological change, exhibited both increased accumulation and differential phosphorylation upon ET treatment. To verify the role of TUA3 in the ET response, tua3 mutants were evaluated. Mutant seedlings had altered ET-associated growth movements. The data indicate that ET perception leads to rapid proteomic change and that these changes are an important part of signaling and development. The data serve as a foundation for exploring ET signaling through systems biology.
Affiliation(s)
- Ruiqiang Chen
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
23
Meyer-Arendt K, Old WM, Houel S, Renganathan K, Eichelberger B, Resing KA, Ahn NG. IsoformResolver: A peptide-centric algorithm for protein inference. J Proteome Res 2011; 10:3060-75. [PMID: 21599010 PMCID: PMC3167374 DOI: 10.1021/pr200039p] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 11/30/2022]
Abstract
When analyzing proteins in complex samples using tandem mass spectrometry of peptides generated by proteolysis, the inference of proteins can be ambiguous, even with well-validated peptides. Unresolved questions include whether to show all possible proteins vs a minimal list, what to do when proteins are inferred ambiguously, and how to quantify peptides that bridge multiple proteins, each with distinguishing evidence. Here we describe IsoformResolver, a peptide-centric protein inference algorithm that clusters proteins in two ways, one based on peptides experimentally identified from MS/MS spectra, and the other based on peptides derived from an in silico digest of the protein database. MS/MS-derived protein groups report minimal list proteins in the context of all possible proteins, without redundantly listing peptides. In silico-derived protein groups pull together functionally related proteins, providing stable identifiers. The peptide-centric grouping strategy used by IsoformResolver allows proteins to be displayed together when they share peptides in common, providing a comprehensive yet concise way to organize protein profiles. It also summarizes information on spectral counts and is especially useful for comparing results from multiple LC–MS/MS experiments. Finally, we examine the relatedness of proteins within IsoformResolver groups and compare its performance to other protein inference software. IsoformResolver addresses problems in protein inference using a peptide-centric protein inference strategy. Inferred proteins are reported in the context of two types of protein groups, based on peptides observed from MS/MS spectra, and from an in silico digest of the protein database. This allows for complete and concise output, without replicated peptides, and counteracting volatility caused by protein inference. IsoformResolver algorithms and compare profile output are presented.
Affiliation(s)
- Karen Meyer-Arendt
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309-0215, USA
24
Koskinen VR, Emery PA, Creasy DM, Cottrell JS. Hierarchical clustering of shotgun proteomics data. Mol Cell Proteomics 2011; 10:M110.003822. [PMID: 21447708 DOI: 10.1074/mcp.m110.003822] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 11/06/2022] [Open Access]
Abstract
A new result report for Mascot search results is described. A greedy set cover algorithm is used to create a minimal set of proteins, which is then grouped into families on the basis of shared peptide matches. Protein families with multiple members are represented by dendrograms, generated by hierarchical clustering using the score of the nonshared peptide matches as a distance metric. The peptide matches to the proteins in a family can be compared side by side to assess the experimental evidence for each protein. If the evidence for a particular family member is considered inadequate, the dendrogram can be cut to reduce the number of distinct family members.
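The greedy set cover step mentioned above can be sketched as follows. This is a generic textbook greedy cover under assumed inputs, not Mascot's actual implementation: the function name, the dictionary format mapping protein accessions to sets of matched peptides, and the alphabetical tie-breaking rule are all hypothetical.

```python
def greedy_protein_cover(protein_to_peptides):
    """Pick proteins greedily until every observed peptide is explained.

    `protein_to_peptides` maps a protein name to the set of peptides
    matched to it (hypothetical input format).
    """
    # Start with every peptide observed for any protein still unexplained.
    uncovered = set()
    for peptides in protein_to_peptides.values():
        uncovered |= set(peptides)

    selected = []
    while uncovered:
        # Choose the protein explaining the most still-uncovered peptides;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(uncovered & set(protein_to_peptides[name])),
        )
        selected.append(best)
        uncovered -= set(protein_to_peptides[best])
    return selected


# Hypothetical peptide matches: P2 and P4 are fully explained by other proteins.
matches = {"P1": {"a", "b", "c"}, "P2": {"b", "c"}, "P3": {"c", "d"}, "P4": {"d"}}
print(greedy_protein_cover(matches))
```

The greedy choice gives a small covering set but not necessarily the smallest possible one; the family grouping and dendrogram steps described in the abstract are not covered by this sketch.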
25
Cooper B, Campbell KB, Feng J, Garrett WM, Frederick R. Nuclear proteomic changes linked to soybean rust resistance. Mol Biosyst 2011; 7:773-83. [PMID: 21132161 DOI: 10.1039/c0mb00171f] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 12/16/2023]
Abstract
Soybean rust, caused by the fungus Phakopsora pachyrhizi, is an emerging threat to the US soybean crop. In an effort to identify proteins that contribute to disease resistance in soybean we compared a susceptible Williams 82 cultivar to a resistant Williams 82 inbred isoline harboring the Rpp1 resistance gene (R-gene). Approximately 4975 proteins from nuclear preparations of leaves were detected using a high-throughput liquid chromatography-mass spectrometry method. Many of these proteins have predicted nuclear localization signals, have homology to transcription factors and other nuclear regulatory proteins, and are phosphorylated. Statistics of summed spectral counts revealed sets of proteins with differential accumulation changes between susceptible and resistant plants. These protein accumulation changes were compared to previously reported gene expression changes and very little overlap was found. Thus, it appears that numerous proteins are post-translationally affected in the nucleus after infection. To our knowledge, this is the first indication of large-scale proteomic change in a plant nucleus after infection. Furthermore, the data reveal distinct proteins under control of Rpp1 and show that this disease resistance gene regulates nuclear protein accumulation. These regulated proteins likely influence broader defense responses, and these data may facilitate the development of plants with improved resistance.
Affiliation(s)
- Bret Cooper
- Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD 20705, USA.
26
Spirin V, Shpunt A, Seebacher J, Gentzel M, Shevchenko A, Gygi S, Sunyaev S. Assigning spectrum-specific P-values to protein identifications by mass spectrometry. Bioinformatics 2011; 27:1128-34. [PMID: 21349864 DOI: 10.1093/bioinformatics/btr089] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Although many methods and statistical approaches have been developed for protein identification by mass spectrometry, the problem of accurate assessment of statistical significance of protein identifications remains an open question. The main issues are as follows: (i) statistical significance of inferring peptide from experimental mass spectra must be platform independent and spectrum specific and (ii) individual spectrum matches at the peptide level must be combined into a single statistical measure at the protein level. RESULTS: We present a method and software to assign statistical significance to protein identifications from search engines for mass spectrometric data. The approach is based on asymptotic theory of order statistics. The parameters of the asymptotic distributions of identification scores are estimated for each spectrum individually. The method relies on new unbiased estimators for parameters of extreme value distribution. The estimated parameters are used to assign a spectrum-specific P-value to each peptide-spectrum match. The protein-level confidence measure combines P-values of peptide-to-spectrum matches. CONCLUSION: We extensively tested the method using triplicate mouse and yeast high-throughput proteomic experiments. The proposed statistical approach improves the sensitivity of protein identifications without compromising specificity. While the method was primarily designed to work with Mascot, it is platform-independent and is applicable to any search engine which outputs a single score for a peptide-spectrum match. We demonstrate this by testing the method in conjunction with X!Tandem. AVAILABILITY: The software is available for download at ftp://genetics.bwh.harvard.edu/SSPV/. CONTACT: ssunyaev@rics.bwh.harvard.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
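The abstract does not say how the peptide-spectrum match P-values are combined at the protein level. Purely as an illustration of what such a combination can look like, the sketch below uses Fisher's method, which is a swapped-in, assumed choice and not necessarily what the authors' software does. It also assumes the P-values are independent and lie in (0, 1]; the function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method (an assumed choice)."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    # Fisher statistic X = -2 * sum(ln p_i); under the null hypothesis it
    # follows a chi-square distribution with 2k degrees of freedom.
    half_x = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square tail has a closed form:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


# Two hypothetical PSMs, each with P = 0.05, supporting the same protein.
print(fisher_combined_p([0.05, 0.05]))
```

With a single P-value the function returns that value unchanged, which is a convenient sanity check for the closed-form tail.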
Affiliation(s)
- Victor Spirin
- Division of Genetics, Brigham and Women's Hospital, Department of Cell Biology, Harvard Medical School, 240 Longwood Avenue, Boston, MA 02115, USA
27
Neilson KA, Ali NA, Muralidharan S, Mirzaei M, Mariani M, Assadourian G, Lee A, van Sluyter SC, Haynes PA. Less label, more free: approaches in label-free quantitative mass spectrometry. Proteomics 2011; 11:535-53. [PMID: 21243637 DOI: 10.1002/pmic.201000553] [Citation(s) in RCA: 507] [Impact Index Per Article: 39.0] [Received: 09/01/2010] [Revised: 10/21/2010] [Accepted: 11/02/2010] [Indexed: 01/09/2023]
Abstract
In this review we examine techniques, software, and statistical analyses used in label-free quantitative proteomics studies for area under the curve and spectral counting approaches. Recent advances in the field are discussed in an order that reflects a logical workflow design. Examples of studies that follow this design are presented to highlight the requirement for statistical assessment and further experiments to validate results from label-free quantitation. Limitations of label-free approaches are considered, label-free approaches are compared with labelling techniques, and forward-looking applications for label-free quantitative data are presented. We conclude that label-free quantitative proteomics is a reliable, versatile, and cost-effective alternative to labelled quantitation.
Affiliation(s)
- Karlie A Neilson
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW, Australia
28
Lee J, Jiang W, Qiao Y, Cho YI, Woo MO, Chin JH, Kwon SW, Hong SS, Choi IY, Koh HJ. Shotgun proteomic analysis for detecting differentially expressed proteins in the reduced culm number rice. Proteomics 2011; 11:455-68. [DOI: 10.1002/pmic.201000077] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 02/03/2010] [Revised: 11/15/2010] [Accepted: 11/17/2010] [Indexed: 11/06/2022]
29
Serang O, MacCoss MJ, Noble WS. Efficient marginalization to compute protein posterior probabilities from shotgun mass spectrometry data. J Proteome Res 2010; 9:5346-57. [PMID: 20712337 DOI: 10.1021/pr100594k] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Indexed: 11/30/2022]
Abstract
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, "degenerate" peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein's presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or estimated the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors.
Affiliation(s)
- Oliver Serang
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
30
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
31
Cooper B, Feng J, Garrett WM. Relative, label-free protein quantitation: spectral counting error statistics from nine replicate MudPIT samples. J Am Soc Mass Spectrom 2010; 21:1534-46. [PMID: 20541435 DOI: 10.1016/j.jasms.2010.05.001] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 01/29/2010] [Revised: 04/30/2010] [Accepted: 05/03/2010] [Indexed: 05/03/2023]
Abstract
Nine replicate samples of peptides from soybean leaves, each spiked with a different concentration of bovine apotransferrin peptides, were analyzed on a mass spectrometer using multidimensional protein identification technology (MudPIT). Proteins were detected from the peptide tandem mass spectra, and the numbers of spectra were statistically evaluated for variation between samples. The results corroborate prior knowledge that combining spectra from replicate samples increases the number of identifiable proteins and that a summed spectral count for a protein increases linearly with increasing molar amounts of protein. Furthermore, statistical analysis of spectral counts for proteins in two- and three-way comparisons between replicates and combined replicates revealed little significant variation arising from run-to-run differences or data-dependent instrument ion sampling that might falsely suggest differential protein accumulation. In these experiments, spectral counting was enabled by PANORAMICS, probability-based software that predicts proteins detected by sets of observed peptides. Three alternative approaches to counting spectra were also evaluated by comparison. As the counting thresholds were changed from weaker to more stringent, the accuracy of ratio determination also changed. These results suggest that thresholds for counting can be empirically set to improve relative quantitation. All together, the data confirm the accuracy and reliability of label-free spectral counting in the relative, quantitative analysis of proteins between samples.
Affiliation(s)
- Bret Cooper
- Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland 20705, USA.
|
32
|
Ahrné E, Müller M, Lisacek F. Unrestricted identification of modified proteins using MS/MS. Proteomics 2010; 10:671-86. [PMID: 20029840 DOI: 10.1002/pmic.200900502] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Indexed: 01/11/2023]
Abstract
Proteins undergo PTM, which modulates their structure and regulates their function. Estimates of the PTM occurrence vary but it is safe to assume that there is an important gap between what is currently known and what remains to be discovered. The highest throughput and most comprehensive efforts to catalogue protein mixtures have so far been using MS-based shotgun proteomics. The standard approach to analyse MS/MS data is to use Peptide Fragment Fingerprinting tools such as Sequest, MASCOT or Phenyx. These tools commonly identify 5-30% of the spectra in an MS/MS data set while only a limited list of predefined protein modifications can be screened. An important part of the unidentified spectra is likely to be spectra of peptides carrying modifications not considered in the search. Bioinformatics for PTM discovery is an active area of research. In this review we focus on software solutions developed for unrestricted identification of modifications in MS/MS data, here referred to as open modification search tools. We give an overview of the conceptually different algorithmic solutions to evaluate the large number of candidate peptides per spectrum when accounting for modifications of unrestricted size and demonstrate the value of results of large-scale open modification search studies. Efficient and easy-to-use tools for protein modification discovery should prove valuable in the quest for mapping the dynamics of proteomes.
Affiliation(s)
- Erik Ahrné
- Swiss Institute of Bioinformatics, Proteome Informatics Group, Geneva, Switzerland.
|
33
|
Protein and gene model inference based on statistical modeling in k-partite graphs. Proc Natl Acad Sci U S A 2010; 107:12101-6. [PMID: 20562346 DOI: 10.1073/pnas.0907654107] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022] Open
Abstract
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
|
34
|
Li Q, MacCoss MJ, Stephens M. A nested mixture model for protein identification using mass spectrometry. Ann Appl Stat 2010. [DOI: 10.1214/09-aoas316] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
|
35
|
Shi J, Wu FX. Assigning Probabilities to Mascot Peptide Identification Using Logistic Regression. Advances in Experimental Medicine and Biology 2010; 680:229-36. [DOI: 10.1007/978-1-4419-5913-3_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
36
|
Gupta N, Pevzner PA. False discovery rates of protein identifications: a strike against the two-peptide rule. J Proteome Res 2009; 8:4173-81. [PMID: 19627159 DOI: 10.1021/pr9004794] [Citation(s) in RCA: 144] [Impact Index Per Article: 9.6] [Indexed: 01/22/2023]
Abstract
Most proteomics studies attempt to maximize the number of peptide identifications and subsequently infer proteins containing two or more peptides as reliable protein identifications. In this study, we evaluate the effect of this "two-peptide" rule on protein identifications, using multiple search tools and data sets. Contrary to the intuition, the "two-peptide" rule reduces the number of protein identifications in the target database more significantly than in the decoy database and results in increased false discovery rates, compared to the case when single-hit proteins are not discarded. We therefore recommend that the "two-peptide" rule should be abandoned, and instead, protein identifications should be subject to the estimation of error rates, as is the case with peptide identifications. We further extend the generating function approach (originally proposed for evaluating matches between a peptide and a single spectrum) to evaluating matches between a protein and an entire spectral data set.
Affiliation(s)
- Nitin Gupta
- Bioinformatics Program and Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA.
|
37
|
Li N, Wu SF, Zhu YP, Yang XM. Progress of Protein Quality Control Methods in Shotgun Proteomics. Prog Biochem Biophys 2009. [DOI: 10.3724/sp.j.1206.2008.00404] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
|
38
|
Feng J, Garrett WM, Naiman DQ, Cooper B. Correlation of Multiple Peptide Mass Spectra for Phosphoprotein Identification. J Proteome Res 2009; 8:5396-405. [DOI: 10.1021/pr900596u] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/07/2023]
Affiliation(s)
- Jian Feng
- Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland 21218, Animal Biosciences and Biotechnology Laboratory, USDA-ARS, Beltsville, Maryland 20705, and Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland 20705
- Wesley M. Garrett
- Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland 21218, Animal Biosciences and Biotechnology Laboratory, USDA-ARS, Beltsville, Maryland 20705, and Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland 20705
- Daniel Q. Naiman
- Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland 21218, Animal Biosciences and Biotechnology Laboratory, USDA-ARS, Beltsville, Maryland 20705, and Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland 20705
- Bret Cooper
- Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland 21218, Animal Biosciences and Biotechnology Laboratory, USDA-ARS, Beltsville, Maryland 20705, and Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland 20705
|
39
|
Ding J, Shi J, Poirier GG, Wu FX. A novel approach to denoising ion trap tandem mass spectra. Proteome Sci 2009; 7:9. [PMID: 19292921 PMCID: PMC2670284 DOI: 10.1186/1477-5956-7-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 11/24/2008] [Accepted: 03/17/2009] [Indexed: 12/04/2022] Open
Abstract
Background: Mass spectrometers can produce a large number of tandem mass spectra, but these spectra are contaminated by noise. Noise can degrade the quality of tandem mass spectra and thus increase the false positives and false negatives in peptide identification. Therefore, it is appealing to develop an approach to denoising tandem mass spectra. Results: We propose a novel approach to denoising tandem mass spectra. The proposed approach consists of two modules: spectral peak intensity adjustment and intensity local maximum extraction. In the spectral peak intensity adjustment module, we introduce five features to describe the quality of each peak. Based on these features, a score is calculated for each peak and is used to adjust its intensity. As a result, the intensity will be adjusted to a local maximum if a peak is a signal peak, and it will be decreased if the peak is a noisy one. The second module uses a morphological reconstruction filter to remove the peaks whose intensities are not the local maxima of the spectrum. Experiments have been conducted on two ion trap tandem mass spectral datasets: ISB and TOV. Experimental results show that our algorithm can remove about 69% of the peaks of a spectrum. At the same time, the number of spectra that can be identified by the Mascot algorithm increases by 31.23% and 14.12% for the two tandem mass spectra datasets, respectively. Conclusion: The proposed denoising algorithm can be integrated into current popular peptide identification algorithms such as Mascot to improve the reliability of assigning peptides to spectra. Availability of the software: The software created from this work is available upon request.
Affiliation(s)
- Jiarui Ding
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada.
|
40
|
Shen C, Sheng Q, Dai J, Li Y, Zeng R, Tang H. On the estimation of false positives in peptide identifications using decoy search strategy. Proteomics 2009; 9:194-204. [PMID: 19053142 DOI: 10.1002/pmic.200800330] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
False positive control and estimation in peptide identifications by MS is of critical importance for reliable inference at the protein level and downstream bioinformatics analysis. Approaches based on searching against decoy databases have become popular for their conceptual simplicity and easy implementation. Although various decoy search strategies have been proposed, few studies have investigated their difference in performance. With datasets collected on a mixture of model proteins, we demonstrate that a single search against the target database coupled with its reversed version offers a good balance between performance and simplicity. In particular, both the accuracy of the estimate of the number of false positives and the sensitivity are at least comparable to other procedures examined in this study. It is also shown that scrambling while preserving the frequency of amino acid words can potentially improve the accuracy of the false positive estimate, though more studies are needed to investigate the optimal scrambling procedure for specific conditions and the variation of the estimate across repeated scrambling.
Affiliation(s)
- Changyu Shen
- Division of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
|
41
|
Brechenmacher L, Lee J, Sachdev S, Song Z, Nguyen THN, Joshi T, Oehrle N, Libault M, Mooney B, Xu D, Cooper B, Stacey G. Establishment of a protein reference map for soybean root hair cells. Plant Physiology 2009; 149:670-82. [PMID: 19036831 PMCID: PMC2633823 DOI: 10.1104/pp.108.131649] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Received: 10/24/2008] [Accepted: 11/24/2008] [Indexed: 05/19/2023]
Abstract
Root hairs are single tubular cells formed from the differentiation of epidermal cells on roots. They are involved in water and nutrient uptake and represent the infection site on leguminous roots by rhizobia, soil bacteria that establish a nitrogen-fixing symbiosis. Root hairs develop by polar cell expansion or tip growth, a unique mode of plant growth shared only with pollen tubes. A more complete characterization of root hair cell biology will lead to a better understanding of tip growth, the rhizobial infection process, and also lead to improvements in plant water and nutrient uptake. We analyzed the proteome of isolated soybean (Glycine max) root hair cells using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and shotgun proteomics (1D-PAGE-liquid chromatography and multidimensional protein identification technology) approaches. Soybean was selected for this study due to its agronomic importance and its root size. The resulting soybean root hair proteome reference map identified 1,492 different proteins. 2D-PAGE followed by mass spectrometry identified 527 proteins from total cell contents. A complementary shotgun analysis identified 1,134 total proteins, including 443 proteins that were specific to the microsomal fraction. Only 169 proteins were identified by the 2D-PAGE and shotgun methods, which highlights the advantage of using both methods. The proteins identified are involved not only in basic cell metabolism but also in functions more specific to the single root hair cell, including water and nutrient uptake, vesicle trafficking, and hormone and secondary metabolism. The data presented provide useful insight into the metabolic activities of a single, differentiated plant cell type.
Affiliation(s)
- Laurent Brechenmacher
- National Center for Soybean Biotechnology, Division of Plant Sciences, University of Missouri, Columbia, Missouri 65211, USA
|
42
|
Chitteti BR, Tan F, Mujahid H, Magee BG, Bridges SM, Peng Z. Comparative analysis of proteome differential regulation during cell dedifferentiation in Arabidopsis. Proteomics 2009; 8:4303-16. [PMID: 18814325 DOI: 10.1002/pmic.200701149] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
Cell dedifferentiation is a cell fate switching process in which differentiated cells undergo genome reprogramming to regain the competency of cell division and organ regeneration. The molecular mechanism underlying the cell dedifferentiation process remains obscure. In this report, we investigate the cell dedifferentiation process in Arabidopsis using a shotgun proteomics approach. A total of 758 proteins are identified by two or more matched peptides. Comparative analyses at four time points using two label-free methods reveal that 193 proteins display up-regulation and 183 proteins display down-regulation within 48 h. While the results of the two label-free quantification methods match well with each other, comparison with previously published 2-DE gel results reveals that label-free quantification results differ substantially from those of the 2-DE method for proteins with peptides common to multiple proteins, suggesting a limitation of the label-free methods in quantifying proteins with closely related family members in complex samples. Our results show that the shotgun approach and the traditional 2-DE gel approach complement each other in both protein identification and quantification. An interesting observation is that core histones and histone variants are subjected to extensive down-regulation, indicating that there is a dramatic change in the chromatin during cell differentiation.
Affiliation(s)
- Brahmananda Reddy Chitteti
- Department of Biochemistry and Molecular Biology, Mississippi State University, Mississippi State, MS 39762, USA
|
43
|
Bern M, Goldberg D. Improved ranking functions for protein and modification-site identifications. J Comput Biol 2008; 15:705-19. [PMID: 18651800 DOI: 10.1089/cmb.2007.0119] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022] Open
Abstract
There are a number of computational tools for assigning identifications to peptide tandem mass spectra, but only a few tools, most notably ProteinProphet, for the crucial next step of integrating peptide identifications into higher-level identifications, such as proteins or modification sites. Here we describe a new program called ComByne for scoring and ranking higher-level identifications. Unlike other identification integration tools, ComByne corrects for protein lengths; it also makes use of more information, such as retention times and spectrum-to-spectrum corroborations. We compare ComByne to existing algorithms on several complex biological samples, including a sample of mouse blood plasma spiked with known concentrations of human proteins. On our samples, the combination of ComByne with our database search tool ByOnic is more sensitive than the combinations of Mascot with ProteinProphet and SEQUEST with DTASelect, with over 40% more proteins identified at 1% false discovery rate. A Web interface to our software is at http://bio.parc.xerox.com.
Affiliation(s)
- Marshall Bern
- Palo Alto Research Center, Palo Alto, California, USA.
|
44
|
Jiang X, Dong X, Ye M, Zou H. Instance Based Algorithm for Posterior Probability Calculation by Target-Decoy Strategy to Improve Protein Identifications. Anal Chem 2008; 80:9326-35. [DOI: 10.1021/ac8017229] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Affiliation(s)
- Xinning Jiang
- National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China, Graduate School of Chinese Academy of Sciences, Beijing 100049, China, and Department of Chemistry, Xixi Campus, Zhejiang University, Hangzhou 310028, China
- Xiaoli Dong
- National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China, Graduate School of Chinese Academy of Sciences, Beijing 100049, China, and Department of Chemistry, Xixi Campus, Zhejiang University, Hangzhou 310028, China
- Mingliang Ye
- National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China, Graduate School of Chinese Academy of Sciences, Beijing 100049, China, and Department of Chemistry, Xixi Campus, Zhejiang University, Hangzhou 310028, China
- Hanfa Zou
- National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China, Graduate School of Chinese Academy of Sciences, Beijing 100049, China, and Department of Chemistry, Xixi Campus, Zhejiang University, Hangzhou 310028, China
|
45
|
Lee J, Feng J, Campbell KB, Scheffler BE, Garrett WM, Thibivilliers S, Stacey G, Naiman DQ, Tucker ML, Pastor-Corrales MA, Cooper B. Quantitative proteomic analysis of bean plants infected by a virulent and avirulent obligate rust fungus. Mol Cell Proteomics 2008; 8:19-31. [PMID: 18755735 DOI: 10.1074/mcp.m800156-mcp200] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 12/28/2022] Open
Abstract
Plants appear to have two types of active defenses, a broad-spectrum basal system and a system controlled by R-genes providing stronger resistance to some pathogens that break the basal defense. However, it is unknown if the systems are separate entities. Therefore, we analyzed proteins from leaves of the dry bean crop plant Phaseolus vulgaris using a high-throughput liquid chromatography tandem mass spectrometry method. By statistically comparing the amounts of proteins detected in a single plant variety that is susceptible or resistant to infection, depending on the strains of a rust fungus introduced, we defined basal and R-gene-mediated plant defenses at the proteomic level. The data reveal that some basal defense proteins are potential regulators of a strong defense weakened by the fungus and that the R-gene modulates proteins similar to those in the basal system. The results satisfy a new model whereby R-genes are part of the basal system and repair disabled defenses to reinstate strong resistance.
Affiliation(s)
- Joohyun Lee
- Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, Maryland 20705, USA
|
46
|
Feng J, Naiman DQ, Cooper B. Combined Dynamic Arrays for Storing and Searching Semi-Ordered Tandem Mass Spectrometry Data. J Comput Biol 2008; 15:457-68. [DOI: 10.1089/cmb.2008.0011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Affiliation(s)
- Jian Feng
- Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland
- Daniel Q. Naiman
- Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland
- Bret Cooper
- Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland
|
47
|
Padliya ND, Garrett WM, Campbell KB, Tabb DL, Cooper B. Tandem mass spectrometry for the detection of plant pathogenic fungi and the effects of database composition on protein inferences. Proteomics 2008; 7:3932-42. [PMID: 17922518 DOI: 10.1002/pmic.200700419] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Abstract
LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.
Affiliation(s)
- Neerav D Padliya
- Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD 20705, USA
|
48
|
Searle BC, Turner M, Nesvizhskii AI. Improving Sensitivity by Probabilistically Combining Results from Multiple MS/MS Search Methodologies. J Proteome Res 2008; 7:245-53. [DOI: 10.1021/pr070540w] [Citation(s) in RCA: 142] [Impact Index Per Article: 8.9] [Indexed: 11/29/2022]
|
49
|
Shen C, Wang Z, Shankar G, Zhang X, Li L. A hierarchical statistical model to assess the confidence of peptides and proteins inferred from tandem mass spectrometry. Bioinformatics 2007; 24:202-8. [PMID: 18024968 DOI: 10.1093/bioinformatics/btm555] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
Motivation: Statistical evaluation of the confidence of peptide and protein identifications made by tandem mass spectrometry is a critical component for appropriately interpreting the experimental data and conducting downstream analysis. Although many approaches have been developed to assign confidence measures from different perspectives, a unified statistical framework that integrates the uncertainty of peptides and proteins is still missing. Results: We developed a hierarchical statistical model (HSM) that jointly models the uncertainty of the identified peptides and proteins and can be applied to any scoring system. With data sets of a standard mixture and the yeast proteome, we demonstrate that the HSM offers a reliable or at least conservative false discovery rate (FDR) estimate for peptide and protein identifications. The probability measure of HSM also offers a powerful discriminating score for peptide identification. Availability: The algorithm is available upon request from the authors.
Affiliation(s)
- Changyu Shen
- Division of Biostatistics, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
|
50
|
Abstract
Two shotgun tandem MS proteomics approaches, multidimensional protein identification technology (MudPIT) and 1-D gel-LC-MS/MS, were used to identify Arabidopsis thaliana leaf proteins. These methods utilize different protein/peptide separation strategies. Detergents not compatible with MudPIT were used with 1-D gel-LC-MS/MS to help enrich for the detection of membrane-spanning and hydrophobic proteins. By combining the data from all MudPIT and 1-D gel-LC-MS/MS experiments, 2342 nonredundant proteins spanning a broad range of molecular weights and pI values were detected. With the exception of unknown proteins, the distribution of gene ontology (GO) classifications for the detected proteins was similar to that encoded by the genome, which shows that these extraction and separation procedures are useful for a broad proteomic survey of plant cells. Unknown proteins will likely have to be targeted by using additional methods, some of which should be compatible with separation strategies taken here.
Affiliation(s)
- Joohyun Lee
- USDA-ARS, Soybean Genomics and Improvement Laboratory, Beltsville, MD 20705, USA.
|