1
White FM, Wolf-Yadlin A. Methods for the Analysis of Protein Phosphorylation-Mediated Cellular Signaling Networks. ANNUAL REVIEW OF ANALYTICAL CHEMISTRY (PALO ALTO, CALIF.) 2016; 9:295-315. [PMID: 27049636 DOI: 10.1146/annurev-anchem-071015-041542] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 06/05/2023]
Abstract
Protein phosphorylation-mediated cellular signaling networks regulate almost all aspects of cell biology, including the responses to cellular stimulation and environmental alterations. These networks are highly complex and comprise hundreds of proteins and potentially thousands of phosphorylation sites. Multiple analytical methods have been developed over the past several decades to identify proteins and protein phosphorylation sites regulating cellular signaling, and to quantify the dynamic response of these sites to different cellular stimulation. Here we provide an overview of these methods, including the fundamental principles governing each method, their relative strengths and weaknesses, and some examples of how each method has been applied to the analysis of complex signaling networks. When applied correctly, each of these techniques can provide insight into the topology, dynamics, and regulation of protein phosphorylation signaling networks.
Affiliation(s)
- Forest M White
- Department of Biological Engineering and David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
2
Akhtar MN, Southey BR, Andrén PE, Sweedler JV, Rodriguez-Zas SL. Identification of best indicators of peptide-spectrum match using a permutation resampling approach. J Bioinform Comput Biol 2015; 12:1440001. [PMID: 25362838 DOI: 10.1142/s0219720014400010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Various indicators of observed-theoretical spectrum matches were compared and the resulting statistical significance was characterized using permutation resampling. Novel decoy databases built by resampling the terminal positions of peptide sequences were evaluated to identify the conditions for accurate computation of peptide match significance levels. The methodology was tested on real and manually curated tandem mass spectra from peptides across a wide range of sizes. Spectra match indicators from complementary database search programs were profiled and optimal indicators were identified. The combination of the optimal indicator and permuted decoy databases improved the calculation of the peptide match significance compared to the approaches currently implemented in the database search programs that rely on distributional assumptions. Permutation tests using p-values obtained from software-dependent matching scores and E-values outperformed permutation tests using all other indicators. The higher overlap in matches between the database search programs when using end permutation compared to existing approaches confirmed the superiority of the end permutation method to identify peptides. The combination of effective match indicators and the end permutation method is recommended for accurate detection of peptides.
Affiliation(s)
- Malik N Akhtar
- Department of Animal Sciences, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
3
Lee DCH, Jones AR, Hubbard SJ. Computational phosphoproteomics: from identification to localization. Proteomics 2015; 15:950-63. [PMID: 25475148 PMCID: PMC4384807 DOI: 10.1002/pmic.201400372] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 08/01/2014] [Revised: 10/31/2014] [Accepted: 11/26/2014] [Indexed: 01/08/2023]
Abstract
Analysis of the phosphoproteome by MS has become a key technology for the characterization of dynamic regulatory processes in the cell, since kinase and phosphatase action underlie many major biological functions. However, the addition of a phosphate group to a suitable side chain often confounds informatic analysis by generating product ion spectra that are more difficult to interpret (and consequently identify) relative to unmodified peptides. Collectively, these challenges have motivated bioinformaticians to create novel software tools and pipelines to assist in the identification of phosphopeptides in proteomic mixtures, and help pinpoint or "localize" the most likely site of modification in cases where there is ambiguity. Here we review the challenges to be met and the informatics solutions available to address them for phosphoproteomic analysis, as well as highlighting the difficulties associated with using them and the implications for data standards.
Affiliation(s)
- Dave C H Lee
- Faculty of Life Sciences, University of Manchester, Manchester, UK
- Andrew R Jones
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Simon J Hubbard
- Faculty of Life Sciences, University of Manchester, Manchester, UK
4
Accurate assignment of significance to neuropeptide identifications using Monte Carlo k-permuted decoy databases. PLoS One 2014; 9:e111112. [PMID: 25329667 PMCID: PMC4201571 DOI: 10.1371/journal.pone.0111112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2014] [Accepted: 09/26/2014] [Indexed: 12/18/2022]
Abstract
In support of accurate neuropeptide identification in mass spectrometry experiments, novel Monte Carlo permutation testing was used to compute significance values. Testing was based on k-permuted decoy databases, where k denotes the number of permutations. These databases were integrated with a range of peptide identification indicators from three popular open-source database search programs (OMSSA, Crux, and X! Tandem) to assess the statistical significance of neuropeptide spectra matches. Significance p-values were computed as the fraction of the sequences in the database with a match indicator value better than or equal to that of the true target spectrum. When applied to a test-bed of all known manually annotated mouse neuropeptides, permutation tests with k-permuted decoy databases identified up to 100% of the neuropeptides at p-value < 10^-5. The permutation test p-values using hyperscore (X! Tandem), E-value (OMSSA) and Sp score (Crux) match indicators outperformed all other match indicators. The robust performance to detect peptides of the intuitive indicator "number of matched ions between the experimental and theoretical spectra" highlights the importance of considering this indicator when the p-value was borderline significant. Our findings suggest permutation decoy databases of size 1×10^5 are adequate to accurately detect neuropeptides and this can be exploited to increase the speed of the search. The straightforward Monte Carlo permutation testing (comparable to a zero order Markov model) can be easily combined with existing peptide identification software to enable accurate and effective neuropeptide detection. The source code is available at http://stagbeetle.animal.uiuc.edu/pepshop/MSMSpermutationtesting.
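The p-value definition in this abstract (the fraction of decoy sequences whose match indicator is at least as good as the target's) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `permuted_decoys` and `permutation_p_value` are hypothetical, the decoys are plain random shuffles of the residues, and the sketch assumes a higher indicator value means a better match (true for a hyperscore, not for an E-value).

```python
import random


def permuted_decoys(peptide, k, seed=0):
    """Return k random permutations of the peptide's residues.

    Assumption: a simple full shuffle; the paper's exact scheme may differ.
    """
    rng = random.Random(seed)
    residues = list(peptide)
    decoys = []
    for _ in range(k):
        rng.shuffle(residues)
        decoys.append("".join(residues))
    return decoys


def permutation_p_value(target_score, decoy_scores):
    """Fraction of decoy scores better than or equal to the target score.

    Assumes higher scores are better.
    """
    if not decoy_scores:
        raise ValueError("need at least one decoy score")
    as_good = sum(1 for s in decoy_scores if s >= target_score)
    return as_good / len(decoy_scores)
```

With k decoys, the smallest non-zero p-value this estimator can return is 1/k, so the decoy database size limits how small a p-value can be resolved.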
5
Wu L, Han DK. Overcoming the dynamic range problem in mass spectrometry-based shotgun proteomics. Expert Rev Proteomics 2014; 3:611-9. [PMID: 17181475 DOI: 10.1586/14789450.3.6.611] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Indexed: 02/05/2023]
Abstract
Protein profiling using mass spectrometry technology has emerged as a powerful method for analyzing large-scale protein-expression patterns in cells and tissues. However, a number of challenges are present in proteomics research, one of the greatest being the high degree of protein complexity and huge dynamic range of proteins expressed in the complex biological mixtures, which exceeds six orders of magnitude in cells and ten orders of magnitude in body fluids. Since many important signaling proteins have low expression levels, methods to detect the low-abundance proteins in a complex sample are required. This review will focus on the fundamental fractionation and mass spectrometry techniques currently used for large-scale shotgun proteomics research.
Affiliation(s)
- Linfeng Wu
- University of Connecticut, School of Medicine, Department of Cell Biology, Farmington, Connecticut, CT 06030, USA.
6
Henry LG, Aruni W, Sandberg L, Fletcher HM. Protective role of the PG1036-PG1037-PG1038 operon in oxidative stress in Porphyromonas gingivalis W83. PLoS One 2013; 8:e69645. [PMID: 23990885 PMCID: PMC3747172 DOI: 10.1371/journal.pone.0069645] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/18/2013] [Accepted: 06/13/2013] [Indexed: 12/15/2022]
Abstract
As an anaerobe, Porphyromonas gingivalis is significantly affected by the harsh inflammatory environment of the periodontal pocket during initial colonization and active periodontal disease. We reported previously that the repair of oxidative stress-induced DNA damage involving 8-oxo-7,8-dihydroguanine (8-oxoG) may occur by an undescribed mechanism in P. gingivalis. DNA affinity fractionation identified PG1037, a conserved hypothetical protein, among other proteins that were bound to the 8-oxoG lesion. PG1037 is part of the uvrA-PG1037-pcrA operon in P. gingivalis, which is known to be upregulated under H2O2-induced stress. A PCR-based linear transformation method was used to inactivate the uvrA and pcrA genes by allelic exchange mutagenesis. Several attempts to inactivate PG1037 were unsuccessful. Similar to the wild-type when plated on Brucella blood agar, the uvrA and pcrA-defective mutants were black-pigmented and beta-hemolytic. These isogenic mutants also had reduced gingipain activities and were more sensitive to H2O2 and UV irradiation compared to the parent strain. Additionally, glycosylase assays revealed that 8-oxoG repair activities were similar in both wild-type and mutant P. gingivalis strains. Several proteins, some of which are known to have oxidoreductase activity, were shown to interact with PG1037. The purified recombinant PG1037 protein could protect DNA from H2O2-induced damage. Collectively, these findings suggest that the uvrA-PG1037-pcrA operon may play an important role in hydrogen peroxide stress-induced resistance in P. gingivalis.
Affiliation(s)
- Leroy G. Henry
- Division of Microbiology and Molecular Genetics, School of Medicine, Loma Linda University, Loma Linda, California, United States of America
- Wilson Aruni
- Division of Microbiology and Molecular Genetics, School of Medicine, Loma Linda University, Loma Linda, California, United States of America
- Lawrence Sandberg
- Division of Biochemistry, School of Medicine, Loma Linda University, Loma Linda, California, United States of America
- Hansel M. Fletcher
- Division of Microbiology and Molecular Genetics, School of Medicine, Loma Linda University, Loma Linda, California, United States of America
7
Higdon R, Haynes W, Stanberry L, Stewart E, Yandl G, Howard C, Broomall W, Kolker N, Kolker E. Unraveling the Complexities of Life Sciences Data. BIG DATA 2013; 1:42-50. [PMID: 27447037 DOI: 10.1089/big.2012.1505] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 06/06/2023]
Abstract
The life sciences have entered into the realm of big data and data-enabled science, where data can either empower or overwhelm. These data bring the challenges of the 5 Vs of big data: volume, veracity, velocity, variety, and value. Both independently and through our involvement with DELSA Global (Data-Enabled Life Sciences Alliance, DELSAglobal.org), the Kolker Lab (kolkerlab.org) is creating partnerships that identify data challenges and solve community needs. We specialize in solutions to complex biological data challenges, as exemplified by the community resource of MOPED (Model Organism Protein Expression Database, MOPED.proteinspire.org) and the analysis pipeline of SPIRE (Systematic Protein Investigative Research Environment, PROTEINSPIRE.org). Our collaborative work extends into the computationally intensive tasks of analysis and visualization of millions of protein sequences through innovative implementations of sequence alignment algorithms and creation of the Protein Sequence Universe tool (PSU). Pushing into the future together with our collaborators, our lab is pursuing integration of multi-omics data and exploration of biological pathways, as well as assigning function to proteins and porting solutions to the cloud. Big data have come to the life sciences; discovering the knowledge in the data will bring breakthroughs and benefits.
Affiliation(s)
- Roger Higdon
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Winston Haynes
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Larissa Stanberry
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Elizabeth Stewart
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Gregory Yandl
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Chris Howard
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- 5 Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- William Broomall
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Natali Kolker
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Eugene Kolker
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- 6 Departments of Biomedical Informatics & Medical Education and Pediatrics, University of Washington, Seattle, Washington
8
Alfaro MP, Deskins DL, Wallus M, DasGupta J, Davidson JM, Nanney LB, Guney MA, Gannon M, Young PP. A physiological role for connective tissue growth factor in early wound healing. J Transl Med 2013; 93:81-95. [PMID: 23212098 PMCID: PMC3720136 DOI: 10.1038/labinvest.2012.162] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Indexed: 01/31/2023]
Abstract
Mesenchymal stem cells (MSCs) that overexpress secreted frizzled-related protein 2 (sFRP2) exhibit an enhanced reparative phenotype. The secretomes of sFRP2-overexpressing MSCs and vector control-MSCs were compared through liquid chromatography tandem mass spectrometry. Proteomic profiling revealed that connective tissue growth factor (CTGF; CCN2) was overrepresented in the conditioned media of sFRP2-overexpressing MSCs and MSC-derived CTGF could thus be an important paracrine effector. Subcutaneously implanted, MSC-loaded polyvinyl alcohol (PVA) sponges and stented excisional wounds were used as wound models to study the dynamics of CTGF expression. Granulation tissue generated within the sponges and full-thickness skin wounds showed transient upregulation of CTGF expression by MSCs and fibroblasts, implying a role for this molecule in early tissue repair. Although collagen and COL1A2 mRNA were not increased when recombinant CTGF was administered to sponges during the early phase (day 1-6) of tissue repair, prolonged administration (>15 days) of exogenous CTGF into PVA sponges resulted in fibroblast proliferation and increased deposition of collagen within the experimental granulation tissue. In support of its physiological role, CTGF immunoinhibition during early repair (days 0-7) reduced the quantity, organizational quality and vascularity of experimental granulation tissue in the sponge model. However, CTGF haploinsufficiency was not enough to reduce collagen deposition in excisional wounds. Similar to acute murine wound models, CTGF was transiently present in the early phase of human acute burn wound healing. Together, these results further support a physiological role for CTGF in wound repair and demonstrate that when CTGF expression is confined to early tissue repair, it serves a pro-reparative role. These data also further illustrate the potential of MSC-derived paracrine modulators to enhance tissue repair.
Affiliation(s)
- Maria P Alfaro
- Departments of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Desirae L Deskins
- Departments of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Meredith Wallus
- Departments of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Jayasri DasGupta
- Departments of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Jeffrey M Davidson
- Departments of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- The Department of Veterans Affairs Medical Center, Nashville, TN, USA
- Lillian B Nanney
- Departments of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Michelle A Guney
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, USA
- Maureen Gannon
- The Department of Veterans Affairs Medical Center, Nashville, TN, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Cell and Developmental Biology, Vanderbilt University Medical Center, Nashville, TN, USA
- Pampee P Young
- Departments of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- The Department of Veterans Affairs Medical Center, Nashville, TN, USA
- Department of Internal Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
9
Hines HB. Microbial proteomics using mass spectrometry. Methods Mol Biol 2012; 881:159-86. [PMID: 22639214 DOI: 10.1007/978-1-61779-827-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
Abstract
Proteomic analyses involve a series of intricate, interdependent steps involving approaches and technical issues that must be fully coordinated to obtain the optimal amount of required information about the test subject. Fortunately, many of these steps are common to most test subjects, requiring only modifications to or, in some cases, substitution of some of the steps to ensure they are relevant to the desired objective of a study. This fortunate occurrence creates an essential core of proteomic approaches and techniques that are consistently available for most studies, regardless of test subject. In this chapter, an overview of some of these core approaches, techniques, and mass spectrometric instrumentation is given, while indicating how such steps are useful for and applied to bacterial investigations. To exemplify how such proteomic concepts and techniques are applicable to bacterial investigations, a practical, quantitative method useful for bacterial proteomic analysis is presented with a discussion of possibilities, pitfalls, and some emerging technology to provide a compilation of information from the diverse literature that is intermingled with experimental experience.
Affiliation(s)
- Harry B Hines
- Integrated Toxicology Division, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD, USA.
10
Paramecium bursaria chlorella virus 1 proteome reveals novel architectural and regulatory features of a giant virus. J Virol 2012; 86:8821-34. [PMID: 22696644 DOI: 10.1128/jvi.00907-12] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 11/20/2022]
Abstract
The 331-kbp chlorovirus Paramecium bursaria chlorella virus 1 (PBCV-1) genome was resequenced and annotated to correct errors in the original 15-year-old sequence; 40 codons was considered the minimum protein size of an open reading frame. PBCV-1 has 416 predicted protein-encoding sequences and 11 tRNAs. A proteome analysis was also conducted on highly purified PBCV-1 virions using two mass spectrometry-based protocols. The mass spectrometry-derived data were compared to PBCV-1 and its host Chlorella variabilis NC64A predicted proteomes. Combined, these analyses revealed 148 unique virus-encoded proteins associated with the virion (about 35% of the coding capacity of the virus) and 1 host protein. Some of these proteins appear to be structural/architectural, whereas others have enzymatic, chromatin modification, and signal transduction functions. Most (106) of the proteins have no known function or homologs in the existing gene databases except as orthologs with proteins of other chloroviruses, phycodnaviruses, and nuclear-cytoplasmic large DNA viruses. The genes encoding these proteins are dispersed throughout the virus genome, and most are transcribed late or early-late in the infection cycle, which is consistent with virion morphogenesis.
11
Trötschel C, Albaum SP, Wolff D, Schröder S, Goesmann A, Nattkemper TW, Poetsch A. Protein turnover quantification in a multilabeling approach: from data calculation to evaluation. Mol Cell Proteomics 2012; 11:512-26. [PMID: 22493176 DOI: 10.1074/mcp.m111.014134] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
Liquid chromatography coupled to tandem mass spectrometry in combination with stable-isotope labeling is an established and widely used method to measure gene expression on the protein level. However, it is often not considered that two opposing processes are responsible for the amount of a protein in a cell: synthesis and degradation. With this work, we provide an integrative, high-throughput method, from the experimental setup to the bioinformatics analysis, to measure synthesis and degradation rates of an organism's proteome. Applicability of the approach is demonstrated with an investigation of heat shock response, a well-understood regulatory mechanism in bacteria, on the biotechnologically relevant Corynebacterium glutamicum. Utilizing a multilabeling approach with both heavy stable nitrogen and carbon isotopes, cells are metabolically labeled in a pulse-chase experiment to trace the labels' incorporation into newly synthesized proteins and their loss during protein degradation. Our work aims not only at the calculation of protein turnover rates but also at their statistical evaluation, including variance and hierarchical cluster analysis using the rich internet application QuPE.
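The abstract does not state the kinetic model behind its turnover rates. As a rough illustration only, the Python sketch below assumes first-order loss of label during the chase, f(t) = f0 · exp(−k·t), and estimates k by least squares on ln f(t). The function names and the first-order assumption are illustrative and are not taken from the paper or from QuPE.

```python
import math


def first_order_rate(times, labeled_fractions):
    """Estimate k in f(t) = f0 * exp(-k * t) from a chase time course.

    Assumption: first-order kinetics; all fractions must be > 0.
    """
    if len(times) != len(labeled_fractions) or len(times) < 2:
        raise ValueError("need at least two matching time points")
    logs = [math.log(f) for f in labeled_fractions]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    # The slope of ln f(t) versus t is -k.
    return -sxy / sxx


def half_life(rate):
    """Half-life implied by a first-order rate constant."""
    return math.log(2) / rate
```

Under the same assumption, a synthesis rate could be estimated analogously from the rise of the newly incorporated label during the pulse.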
Affiliation(s)
- Christian Trötschel
- Department of Plant Biochemistry, Ruhr-University Bochum, 44780 Bochum, Germany.
12
Kolker E, Higdon R, Haynes W, Welch D, Broomall W, Lancet D, Stanberry L, Kolker N. MOPED: Model Organism Protein Expression Database. Nucleic Acids Res 2012; 40:D1093-9. [PMID: 22139914 PMCID: PMC3245040 DOI: 10.1093/nar/gkr1177] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Received: 10/18/2011] [Revised: 11/10/2011] [Accepted: 11/11/2011] [Indexed: 01/14/2023]
Abstract
Large numbers of mass spectrometry proteomics studies are being conducted to understand all types of biological processes. The size and complexity of proteomics data hinder efforts to easily share, integrate, query and compare the studies. The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is a new and expanding proteomics resource that enables rapid browsing of protein expression information from publicly available studies on humans and model organisms. MOPED is designed to simplify the comparison and sharing of proteomics data for the greater research community. MOPED uniquely provides protein level expression data, meta-analysis capabilities and quantitative data from standardized analysis. Data can be queried for specific proteins, browsed based on organism, tissue, localization and condition and sorted by false discovery rate and expression. MOPED empowers users to visualize their own expression data and compare it with existing studies. Further, MOPED links to various protein and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED contains over 43,000 proteins with at least one spectral match and more than 11 million high certainty spectra.
Affiliation(s)
- Eugene Kolker
- Bioinformatics and High-throughput Analysis Laboratory, High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Predictive Analytics, Seattle Children's Hospital, Seattle, WA 98105, USA.
13
Thakur D, Rejtar T, Wang D, Bones J, Cha S, Clodfelder-Miller B, Richardson E, Binns S, Dahiya S, Sgroi D, Karger BL. Microproteomic analysis of 10,000 laser captured microdissected breast tumor cells using short-range sodium dodecyl sulfate-polyacrylamide gel electrophoresis and porous layer open tubular liquid chromatography tandem mass spectrometry. J Chromatogr A 2011; 1218:8168-74. [PMID: 21982995 PMCID: PMC3205921 DOI: 10.1016/j.chroma.2011.09.022] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 08/04/2011] [Revised: 09/07/2011] [Accepted: 09/08/2011] [Indexed: 01/04/2023]
Abstract
Precise proteomic profiling of limited levels of disease tissue represents an extremely challenging task. Here, we present an effective and reproducible microproteomic workflow for sample sizes of only 10,000 cells that integrates selective sample procurement via laser capture microdissection (LCM), sample clean-up and protein level fractionation using short-range SDS-PAGE, followed by ultrasensitive LC-MS/MS analysis using a 10 μm i.d. porous layer open tubular (PLOT) column. With 10,000 LCM captured mouse hepatocytes for method development and performance assessment, only 10% of the in-gel digest, equivalent to ∼1000 cells, was needed per LC-MS/MS analysis. The optimized workflow was applied to the differential proteomic analysis of 10,000 LCM collected primary and metastatic breast cancer cells from the same patient. More than 1100 proteins were identified from each injection with >1700 proteins identified from three LCM samples of 10,000 cells from the same patient (1123 with at least two unique peptides). Label free quantitation (spectral counting) was performed to identify differential protein expression between the primary and metastatic cell populations. Informatics analysis of the resulting data indicated that vesicular transport and extracellular remodeling processes were significantly altered between the two cell types. The ability to extract meaningful biological information from limited, but highly informative cell populations demonstrates the significant benefits of the described microproteomic workflow.
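Label-free quantitation by spectral counting, as mentioned in this abstract, compares how many identified spectra map to each protein in two samples. The following Python sketch shows one common form of the idea under stated assumptions: counts are normalized by each sample's total, and a pseudocount (an illustrative choice, not from the paper) avoids division by zero. The function name and protein labels are hypothetical.

```python
import math
from collections import Counter


def log2_fold_changes(psms_a, psms_b, pseudocount=1.0):
    """Per-protein log2(B/A) ratio of normalized spectral counts.

    psms_a and psms_b list one protein identifier per identified spectrum.
    """
    counts_a = Counter(psms_a)
    counts_b = Counter(psms_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one identified spectrum")
    result = {}
    for protein in set(counts_a) | set(counts_b):
        # Counter returns 0 for proteins missing from a sample.
        norm_a = (counts_a[protein] + pseudocount) / total_a
        norm_b = (counts_b[protein] + pseudocount) / total_b
        result[protein] = math.log2(norm_b / norm_a)
    return result
```

Proteins with very few spectra give unstable ratios, which is one reason identification thresholds such as "at least two unique peptides" are often applied before comparing samples.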
Affiliation(s)
- Dipak Thakur
- Barnett Institute and Dept. of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115
- Tomas Rejtar
- Barnett Institute and Dept. of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115
- Dongdong Wang
- Barnett Institute and Dept. of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115
- Jonathan Bones
- Barnett Institute and Dept. of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115
- Sangwon Cha
- Barnett Institute and Dept. of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115
- Shemeica Binns
- Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, MA 02129
- Sonika Dahiya
- Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, MA 02129
- Dennis Sgroi
- Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, MA 02129
- Barry L. Karger
- Barnett Institute and Dept. of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115
14
Makawita S, Smith C, Batruch I, Zheng Y, Rückert F, Grützmann R, Pilarsky C, Gallinger S, Diamandis EP. Integrated proteomic profiling of cell line conditioned media and pancreatic juice for the identification of pancreatic cancer biomarkers. Mol Cell Proteomics 2011; 10:M111.008599. [PMID: 21653254 PMCID: PMC3205865 DOI: 10.1074/mcp.m111.008599] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Received: 02/12/2011] [Revised: 05/19/2011] [Indexed: 12/13/2022]
Abstract
Pancreatic cancer is one of the leading causes of cancer-related deaths, for which serological biomarkers are urgently needed. Most discovery-phase studies focus on the use of one biological source for analysis. The present study details the combined mining of pancreatic cancer-related cell line conditioned media and pancreatic juice for identification of putative diagnostic leads. Using strong cation exchange chromatography, followed by LC-MS/MS on an LTQ-Orbitrap mass spectrometer, we extensively characterized the proteomes of conditioned media from six pancreatic cancer cell lines (BxPc3, MIA-PaCa2, PANC1, CAPAN1, CFPAC1, and SU.86.86), the normal human pancreatic ductal epithelial cell line HPDE, and two pools of six pancreatic juice samples from ductal adenocarcinoma patients. All samples were analyzed in triplicate. Between 1261 and 2171 proteins were identified with two or more peptides in each of the cell lines, and an average of 521 proteins were identified in the pancreatic juice pools. In total, 3479 nonredundant proteins were identified with high confidence, of which ∼40% were extracellular or cell membrane-bound based on Gene Ontology classifications. Three strategies were employed for identification of candidate biomarkers: (1) examination of differential protein expression between the cancer and normal cell lines using label-free protein quantification, (2) integrative analysis, focusing on the overlap of proteins among the multiple biological fluids, and (3) tissue specificity analysis through mining of publicly available databases. Preliminary verification of anterior gradient homolog 2, syncollin, olfactomedin-4, polymeric immunoglobulin receptor, and collagen alpha-1(VI) chain in plasma samples from pancreatic cancer patients and healthy controls using ELISA showed a significant increase (p < 0.01) of these proteins in plasma from pancreatic cancer patients. The combination of these five proteins showed an improved area under the receiver operating characteristic curve compared with CA19.9 alone. Further validation of these proteins is warranted, as is the investigation of the remaining group of candidates.
Affiliation(s)
- Shalini Makawita
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Department of Clinical Biochemistry, University Health Network, Toronto, ON, Canada
- Chris Smith
- Department of Clinical Biochemistry, University Health Network, Toronto, ON, Canada
- Ihor Batruch
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, ON, Canada
- Yingye Zheng
- The Fred Hutchinson Cancer Research Center, Seattle, Washington
- Felix Rückert
- Visceral, Thoracic and Vascular Surgery, University Hospital Carl Gustav Carus, Technical University of Dresden, Germany
- Robert Grützmann
- Visceral, Thoracic and Vascular Surgery, University Hospital Carl Gustav Carus, Technical University of Dresden, Germany
- Christian Pilarsky
- Visceral, Thoracic and Vascular Surgery, University Hospital Carl Gustav Carus, Technical University of Dresden, Germany
- Steven Gallinger
- Zane Cohen Familial Gastrointestinal Cancer Registry and Department of Surgery, Mount Sinai Hospital, University of Toronto, Toronto, Ontario, Canada
- Eleftherios P. Diamandis
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Department of Clinical Biochemistry, University Health Network, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, ON, Canada
|
15
|
Mohammadi M, Anoop V, Gleddie S, Harris LJ. Proteomic profiling of two maize inbreds during early gibberella ear rot infection. Proteomics 2011; 11:3675-84. [PMID: 21751381 DOI: 10.1002/pmic.201100177] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 04/06/2011] [Revised: 05/27/2011] [Accepted: 06/15/2011] [Indexed: 11/08/2022]
Abstract
Fusarium graminearum is the causal agent of gibberella ear rot in maize ears, resulting in yield losses due to mouldy and mycotoxin-contaminated grain. This study represents a global proteomic approach to document the early infection by F. graminearum of two maize inbreds, B73 and CO441, which differ in disease susceptibility. Mock- and F. graminearum-treated developing kernels were sampled 48 h post-inoculation over three field seasons. Infected B73 kernels consistently contained higher concentrations of the mycotoxin deoxynivalenol than the kernels of the more tolerant inbred CO441. A total of 2067 maize proteins were identified in the iTRAQ analysis of extracted kernel proteins at a 99% confidence level. A subset of 878 proteins was identified in at least two biological replicates and exhibited statistically significantly altered expression between treatments and/or the two inbred lines of which 96 proteins exhibited changes in abundance >1.5-fold in at least one of the treatments. Many proteins associated with the defense response were more abundant after infection, including PR-10 (PR, pathogenesis-related), chitinases, xylanase inhibitors, proteinase inhibitors, and a class III peroxidase. Kernels of the tolerant inbred CO441 contained higher levels of these defense-related proteins than B73 kernels even after mock treatment, suggesting that these proteins may provide a basal defense against Fusarium infection in CO441.
Affiliation(s)
- Mohsen Mohammadi
- Eastern Cereal and Oilseed Research Centre, Agriculture and Agri-Food Canada, Ottawa, ON, Canada
|
16
|
Higdon R, Reiter L, Hather G, Haynes W, Kolker N, Stewart E, Bauman AT, Picotti P, Schmidt A, van Belle G, Aebersold R, Kolker E. IPM: An integrated protein model for false discovery rate estimation and identification in high-throughput proteomics. J Proteomics 2011; 75:116-21. [PMID: 21718813 DOI: 10.1016/j.jprot.2011.06.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/08/2011] [Revised: 05/28/2011] [Accepted: 06/02/2011] [Indexed: 12/19/2022]
Abstract
In high-throughput mass spectrometry proteomics, peptides and proteins are not simply identified as present or not present in a sample, rather the identifications are associated with differing levels of confidence. The false discovery rate (FDR) has emerged as an accepted means for measuring the confidence associated with identifications. We have developed the Systematic Protein Investigative Research Environment (SPIRE) for the purpose of integrating the best available proteomics methods. Two successful approaches to estimating the FDR for MS protein identifications are the MAYU and our current SPIRE methods. We present here a method to combine these two approaches to estimating the FDR for MS protein identifications into an integrated protein model (IPM). We illustrate the high quality performance of this IPM approach through testing on two large publicly available proteomics datasets. MAYU and SPIRE show remarkable consistency in identifying proteins in these datasets. Still, IPM results in a more robust FDR estimation approach and additional identifications, particularly among low abundance proteins. IPM is now implemented as a part of the SPIRE system.
Affiliation(s)
- Roger Higdon
- Bioinformatics & High-throughput Analysis Laboratory, Seattle, WA, USA.
|
17
|
Shi Y, Xu P, Qin J. Ubiquitinated proteome: ready for global? Mol Cell Proteomics 2011; 10:R110.006882. [PMID: 21339389 PMCID: PMC3098603 DOI: 10.1074/mcp.r110.006882] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 12/06/2010] [Revised: 02/15/2011] [Indexed: 01/09/2023] Open
Abstract
Ubiquitin (Ub) is a small and highly conserved protein that can covalently modify protein substrates. Ubiquitination is one of the major post-translational modifications that regulate a broad spectrum of cellular functions. The advancement of mass spectrometers as well as the development of new affinity purification tools has greatly expedited proteome-wide analysis of several post-translational modifications (e.g. phosphorylation, glycosylation, and acetylation). In contrast, large-scale profiling of lysine ubiquitination remains a challenge. Most recently, new Ub affinity reagents such as Ub remnant antibody and tandem Ub binding domains have been developed, allowing for relatively large-scale detection of several hundred lysine ubiquitination events in human cells. Here we review different strategies for the identification of ubiquitination sites and discuss several issues associated with data analysis. We suggest that careful interpretation and orthogonal confirmation of MS spectra are necessary to minimize false positive assignments by automatic searching algorithms.
Affiliation(s)
- Yi Shi
- Center for Molecular Discovery, Verna and Marrs McLean Department of Biochemistry and Molecular Biology
- Ping Xu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing, P. R. China
- Jun Qin
- Center for Molecular Discovery, Verna and Marrs McLean Department of Biochemistry and Molecular Biology
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
|
18
|
Fernández-Taboada E, Rodríguez-Esteban G, Saló E, Abril JF. A proteomics approach to decipher the molecular nature of planarian stem cells. BMC Genomics 2011; 12:133. [PMID: 21356107 PMCID: PMC3058083 DOI: 10.1186/1471-2164-12-133] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 11/08/2010] [Accepted: 02/28/2011] [Indexed: 01/07/2023] Open
Abstract
Background: In recent years, planaria have emerged as an important model system for research into stem cells and regeneration. Attention is focused on their unique stem cells, the neoblasts, which can differentiate into any cell type present in the adult organism. Sequencing of the Schmidtea mediterranea genome and some expressed sequence tag projects have generated extensive data on the genetic profile of these cells. However, little information is available on their protein dynamics. Results: We developed a proteomic strategy to identify neoblast-specific proteins. Here we describe the method and discuss the results in comparison to the genomic high-throughput analyses carried out in planaria and to proteomic studies using other stem cell systems. We also show functional data for some of the candidate genes selected in our proteomic approach. Conclusions: We have developed an accurate and reliable mass-spectra-based proteomics approach to complement previous genomic studies and to further achieve a more accurate understanding and description of the molecular and cellular processes related to the neoblasts.
Affiliation(s)
- Enrique Fernández-Taboada
- Departament de Genètica and Institute of Biomedicine, Universitat de Barcelona, Avenida Diagonal 645, Barcelona, Catalonia, Spain
|
19
|
Neilson KA, Ali NA, Muralidharan S, Mirzaei M, Mariani M, Assadourian G, Lee A, van Sluyter SC, Haynes PA. Less label, more free: approaches in label-free quantitative mass spectrometry. Proteomics 2011; 11:535-53. [PMID: 21243637 DOI: 10.1002/pmic.201000553] [Citation(s) in RCA: 507] [Impact Index Per Article: 39.0] [Received: 09/01/2010] [Revised: 10/21/2010] [Accepted: 11/02/2010] [Indexed: 01/09/2023]
Abstract
In this review we examine techniques, software, and statistical analyses used in label-free quantitative proteomics studies for area under the curve and spectral counting approaches. Recent advances in the field are discussed in an order that reflects a logical workflow design. Examples of studies that follow this design are presented to highlight the requirement for statistical assessment and further experiments to validate results from label-free quantitation. Limitations of label-free approaches are considered, label-free approaches are compared with labelling techniques, and forward-looking applications for label-free quantitative data are presented. We conclude that label-free quantitative proteomics is a reliable, versatile, and cost-effective alternative to labelled quantitation.
Affiliation(s)
- Karlie A Neilson
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW, Australia
|
20
|
Suthammarak W, Morgan PG, Sedensky MM. Mutations in mitochondrial complex III uniquely affect complex I in Caenorhabditis elegans. J Biol Chem 2010; 285:40724-31. [PMID: 20971856 DOI: 10.1074/jbc.m110.159608] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 01/17/2023] Open
Abstract
Mitochondrial supercomplexes containing complexes I, III, and IV of the electron transport chain are now regarded as an established entity. Supercomplex I·III·IV has been theorized to improve respiratory chain function by allowing quinone channeling between complexes I and III. Here, we show that the role of the supercomplexes extends beyond channeling. Mutant analysis in Caenorhabditis elegans reveals that complex III affects supercomplex I·III·IV formation by acting as an assembly or stabilizing factor. Also, a complex III mtDNA mutation, ctb-1, inhibits complex I function by weakening the interaction of complex IV in supercomplex I·III·IV. Other complex III mutations inhibit complex I function either by decreasing the amount of complex I (isp-1), or decreasing the amount of complex I in its most active form, the I·III·IV supercomplex (isp-1;ctb-1). ctb-1 suppresses a nuclear encoded complex III defect, isp-1, without improving complex III function. Allosteric interactions involve all three complexes within the supercomplex and are necessary for maximal enzymatic activities.
Affiliation(s)
- Wichit Suthammarak
- Department of Genetics, Case Western Reserve University, Cleveland, Ohio 44106, USA
|
21
|
Hather G, Higdon R, Bauman A, von Haller PD, Kolker E. Estimating false discovery rates for peptide and protein identification using randomized databases. Proteomics 2010; 10:2369-76. [PMID: 20391536 DOI: 10.1002/pmic.200900619] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
MS-based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression-based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two-peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed that, for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
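The idea of a score-monotone local error estimate can be illustrated with a small sketch. It is not the authors' implementation: it assumes a combined target/randomized search in which each match carries a score and a decoy flag, and it fits a non-increasing step function to the decoy indicator with the pool-adjacent-violators algorithm. The names `pava_nonincreasing` and `local_decoy_rate` are hypothetical, and converting the fitted decoy fraction into an LFDR for target matches would need a further step that is not shown.

```python
def pava_nonincreasing(values):
    """Least-squares non-increasing fit (pool-adjacent-violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean is below the current one,
        # because that would violate the non-increasing constraint.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def local_decoy_rate(matches):
    """matches: iterable of (score, is_decoy) from a combined search.

    Returns (score, fitted decoy fraction) pairs in ascending score order;
    the fitted fraction never increases as the score increases.
    """
    ranked = sorted(matches, key=lambda m: m[0])
    indicators = [1.0 if is_decoy else 0.0 for _, is_decoy in ranked]
    fitted = pava_nonincreasing(indicators)
    return [(score, rate) for (score, _), rate in zip(ranked, fitted)]
```

Because the fit is a step function, neighbouring scores share one estimate, which is what makes the result monotonic in the score without assuming any particular score distribution.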
Affiliation(s)
- Gregory Hather
- Bioinformatics & High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, WA 98101, USA
|
22
|
Higdon R, Haynes W, Kolker E. Meta-analysis for protein identification: a case study on yeast data. OMICS: A Journal of Integrative Biology 2010; 14:309-14. [PMID: 20569183 DOI: 10.1089/omi.2010.0034] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/16/2023]
Abstract
Large amounts of mass spectrometry (MS) proteomics data are now publicly available; however, little attention has been given to how to best combine these data and assess the error rates for protein identification. The objective of this article is to show how variation in the type and amount of data included with each study impacts coverage of the yeast proteome and estimation of the false discovery rate (FDR). Our analysis of a subset of the publicly available yeast data showed that failure to reevaluate the FDR when combining protein IDs from different experiments resulted in an underestimation of the FDR by approximately threefold. A worst-case approximation of the FDR was only slightly larger than estimating the FDR by randomized database matches. The use of a weighted model to emphasize the most informative experimental data provided an increase in the number of IDs at a 1% FDR when compared to other meta-analysis approaches. Also, using an FDR higher than 1% results in a very high rate of false discoveries for IDs above the 1% threshold. Ideally, raw MS data will be made publicly available for complete and consistent reanalysis. In the circumstance that raw data is not available, determining a combined FDR on the basis of the worst-case estimation provides a reasonable approximation of the FDR. When combining experimental results, adding additional experiments results in diminishing and in some cases negative returns on protein identifications. It may be beneficial to include only those experiments generating the most unique identifications due to solid experimental design and sensitive instrumentation.
Affiliation(s)
- Roger Higdon
- Bioinformatics & High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington 98101, USA
|
23
|
Wall ML, Wheeler HL, Smith J, Figeys D, Altosaar I. Mass spectrometric analysis reveals remnants of host-pathogen molecular interactions at the starch granule surface in wheat endosperm. Phytopathology 2010; 100:848-854. [PMID: 20701481 DOI: 10.1094/phyto-100-9-0848] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 05/29/2023]
Abstract
The starch granules of wheat seed are solar energy-driven deposits of fixed carbon and, as such, present themselves as targets of pathogen attack. The seed's array of antimicrobial proteins, peptides, and small molecules comprises a molecular defense against penetrating pathogens. In turn, pathogens exhibit an arsenal of enzymes to facilitate the degradation of the host's endosperm. In this context, the starch granule surface is a relatively unexplored domain in which unique molecular barriers may be deployed to defend against and inhibit the late stages of infection. Therefore, it was compelling to explore the starch granule surface in mature wheat seed, which revealed evidence of host-pathogen molecular interactions that may have occurred during grain development. In this study, starch granules from the soft wheat Triticum aestivum cv. AC Andrew and hard wheat T. turgidum durum were isolated and water washed 20 times, and their surface proteins were digested in situ with trypsin. The peptides liberated into the supernatant and the peptides remaining at the starch granule surface were separately examined. In this way, we demonstrated that the identified proteins have a strong affinity for the starch granule surface. Proteins with known antimicrobial activity were identified, as well as several proteins from the plant pathogens Agrobacterium tumefaciens, Pectobacterium carotovorum, Fusarium graminearum, Magnaporthe grisea, Xanthomonas axonopodis, and X. oryzae. Although most of these peptides corresponded to uncharacterized hypothetical proteins of fungal pathogens, several peptide fragments were identical to cytosolic and membrane proteins of specific microbial pathogens. During development and maturation, wheat seed appeared to have resisted infection and lysed the pathogens where, upon desiccation, the molecular evidence remained fixed at the starch granule surface.
Affiliation(s)
- Michael L Wall
- Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Canada
|
24
|
Joo JWJ, Na S, Baek JH, Lee C, Paek E. Target-Decoy with Mass Binning: a simple and effective validation method for shotgun proteomics using high resolution mass spectrometry. J Proteome Res 2010; 9:1150-6. [PMID: 19908919 DOI: 10.1021/pr9006377] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
Shotgun proteomics using mass spectrometry (MS) has become the choice for large-scale peptide and protein identification. The recent development of high-resolution mass spectrometers such as FT-ICR or Orbitrap makes it possible to identify peptides within only a few parts per million (ppm), and it is expected to dramatically improve performance of peptide identification, as compared to low-resolution instruments. To fully exploit such significantly higher mass accuracy, however, appropriate data analysis methods are required. Here, we present a new target-decoy strategy, called Target-Decoy with Mass Binning, utilizing high mass accuracy for peptide identification validation, which remains a challenging problem in MS-based proteomics. When tested on various high-resolution MS data, our method was very effective and yet simple and showed comparable or better performance when compared with other validation methods.
Affiliation(s)
- Jong Wha J Joo
- Korea Institute of Science and Technology, Seoul, Republic of Korea
|
25
|
Abstract
Accurate and precise methods for estimating incorrect peptide and protein identifications are crucial for effective large-scale proteome analyses by tandem mass spectrometry. The target-decoy search strategy has emerged as a simple, effective tool for generating such estimations. This strategy is based on the premise that obvious, necessarily incorrect "decoy" sequences added to the search space will correspond with incorrect search results that might otherwise be deemed to be correct. With this knowledge, it is possible not only to estimate how many incorrect results are in a final data set but also to use decoy hits to guide the design of filtering criteria that sensitively partition a data set into correct and incorrect identifications.
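As a rough illustration of how decoy hits can guide a filtering criterion, the sketch below ranks matches by score and finds the most permissive cutoff at which the estimated error rate stays under a chosen limit. It rests on assumptions not stated in the abstract: a concatenated target-decoy search, the simple estimator decoys/targets, and distinct scores (ties are not handled specially). The function name `decoy_fdr_threshold` is hypothetical.

```python
def decoy_fdr_threshold(matches, max_fdr):
    """matches: iterable of (score, is_decoy); higher scores are better.

    Returns (score cutoff, accepted targets, accepted decoys) for the
    lowest cutoff whose estimated FDR (decoys / targets) is <= max_fdr,
    or None if no cutoff qualifies.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Each decoy hit above the cutoff stands in for one expected
        # incorrect target hit above the same cutoff.
        if targets and decoys / targets <= max_fdr:
            best = (score, targets, decoys)
    return best
```

For example, with matches scored 10, 9, 8 (targets), 7 (decoy), 6 (target), 5 and 4 (decoys) and a limit of 0.25, the cutoff falls at score 6: four targets and one decoy are accepted.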
|
26
|
Menschaert G, Vandekerckhove TTM, Landuyt B, Hayakawa E, Schoofs L, Luyten W, Van Criekinge W. Spectral clustering in peptidomics studies helps to unravel modification profile of biologically active peptides and enhances peptide identification rate. Proteomics 2009; 9:4381-8. [PMID: 19658089 DOI: 10.1002/pmic.200900248] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 01/05/2023]
Abstract
When studying the set of biologically active peptides (the so-called peptidome) of a cell type, organ, or entire organism, the identification of peptides is mostly attempted by MS. However, identification rates are often unsatisfactory. Many failed or missed identifications may be attributable to the wealth of modifications on peptides, some of which may originate from in vivo post-translational processes to activate the molecule, whereas others could be introduced during the tissue preparation procedures. Preliminary knowledge of the modification profile of specific peptidome samples would greatly improve identification rates. To this end we developed an approach that performs clustering of mass spectra in a way that allows us to group spectra having similar peak patterns over significant segments. Comparing members of one spectral group enables us to assess the modifications (expressed as mass shifts in Dalton) present in a peptidome sample. The clustering algorithm in this study is called Bonanza, and it was applied to MALDI-TOF/TOF MS spectra from the mouse. Peptide identification rates went up from 17 to 36% for 278 spectra obtained from the pancreatic islets and from 21 to 43% for 163 pituitary spectra. Spectral clustering with subsequent advanced database search may result in the discovery of new biologically active peptides and modifications thereof, as this report shows.
Affiliation(s)
- Gerben Menschaert
- Department of Molecular Biotechnology, Faculty of Bioscience Engineering, Laboratory for Bioinformatics and Computational Genomics, Ghent University, Ghent, Belgium.
|
27
|
Haussmann U, Qi SW, Wolters D, Rögner M, Liu SJ, Poetsch A. Physiological adaptation of Corynebacterium glutamicum to benzoate as alternative carbon source - a membrane proteome-centric view. Proteomics 2009; 9:3635-51. [PMID: 19639586 DOI: 10.1002/pmic.200900025] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022]
Abstract
The ability of microorganisms to assimilate aromatic substances as alternative carbon sources is the basis of biodegradation of natural as well as industrial aromatic compounds. In this study, Corynebacterium glutamicum was grown on benzoate as sole carbon and energy source. To extend the scarce knowledge about physiological adaptation processes occurring in this cell compartment, the membrane proteome was investigated under quantitative and qualitative aspects by applying shotgun proteomics to reach a comprehensive survey. Membrane proteins were relatively quantified using an internal standard metabolically labeled with ¹⁵N. Altogether, 40 proteins were found to change their abundance during growth on benzoate in comparison to glucose. A global adaptation was observed in the membrane of benzoate-grown cells, characterized by increased abundance of proteins of the respiratory chain, by a starvation response, and by changes in sulfur metabolism involving the regulator McbR. In addition to the relative quantification, stable isotope-labeled synthetic peptides were used for the absolute quantification of the two benzoate transporters of C. glutamicum, BenK and BenE. It was found that both transporters were expressed during growth on benzoate, suggesting that both contribute substantially to benzoate uptake.
Affiliation(s)
- Ute Haussmann
- Plant Biochemistry, Ruhr University Bochum, 44801 Bochum, Germany
|
28
|
Kunec D, Nanduri B, Burgess SC. Experimental annotation of channel catfish virus by probabilistic proteogenomic mapping. Proteomics 2009; 9:2634-47. [PMID: 19391180 DOI: 10.1002/pmic.200800397] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 01/25/2023]
Abstract
Experimental identification of expressed proteins by proteomics constitutes the most reliable approach to identify genomic location and structure of protein-coding genes and substantially complements computational genome annotation. Channel catfish herpesvirus (CCV) is a simple comparative model for understanding herpesvirus biology and the evolution of the Herpesviridae. The canonical CCV genome has 76 predicted ORF and only 12 of these have been confirmed experimentally. We describe a modification of a statistical method, which assigns significance measures, q-values, to peptide identifications based on 2-D LC ESI MS/MS, real-decoy database searches and SEQUEST XCorr and DeltaC(n) scores. We used this approach to identify CCV proteins expressed during its replication in cell culture, to determine protein composition of mature virions and, consequently, to refine the canonical CCV genome annotation. To complement trypsin, we used partial proteinase K digestion, which yielded greater proteome coverage. At FDR <5%, for peptide identifications, we identified 25/76 previously predicted ORF using trypsin and 31/76 using proteinase K. Furthermore, we identified 17 novel protein-coding regions (7 potential ATG-initiated ORF). Most of these novel ORF encode small proteins (<100 amino acids). Directed, strand-specific reverse transcription real-time PCR confirmed RNA expression from 6/7 novel ATG-initiated ORF investigated.
Affiliation(s)
- Dusan Kunec
- College of Veterinary Medicine, Mississippi State, MS 39762, USA.
|
29
|
Ovelleiro D, Carrascal M, Casas V, Abian J. LymPHOS: Design of a phosphosite database of primary human T cells. Proteomics 2009; 9:3741-51. [DOI: 10.1002/pmic.200800701] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
|
30
|
Abstract
Accurate modeling of peptide fragmentation is necessary for the development of robust scoring functions for peptide-spectrum matches, which are the cornerstone of MS/MS-based identification algorithms. Unfortunately, peptide fragmentation is a complex process that can involve several competing chemical pathways, which makes it difficult to develop generative probabilistic models that describe it accurately. However, the vast amounts of MS/MS data being generated now make it possible to use data-driven machine learning methods to develop discriminative ranking-based models that predict the intensity ranks of a peptide's fragment ions. We use simple sequence-based features that get combined by a boosting algorithm into models that make peak rank predictions with high accuracy. In an accompanying manuscript, we demonstrate how these prediction models are used to significantly improve the performance of peptide identification algorithms. The models can also be useful in the design of optimal multiple reaction monitoring (MRM) transitions, in cases where there is insufficient experimental data to guide the peak selection process. The prediction algorithm can also be run independently through PepNovo+, which is available for download from http://bix.ucsd.edu/Software/PepNovo.html.
Affiliation(s)
- Ari M Frank
- Department of Computer Science and Engineering, University of California, San Diego (UCSD), 9500 Gilman Drive, La Jolla, California 92093-0404, USA.
|
31
|
Abstract
The analysis of the large volume of tandem mass spectrometry (MS/MS) proteomics data that is generated these days relies on automated algorithms that identify peptides from their mass spectra. An essential component of these algorithms is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). In this paper, we present a new approach to the scoring of PSMs. We argue that since this problem is at its core a ranking task (especially in the case of de novo sequencing), it can be solved effectively using machine learning ranking algorithms. We developed a new discriminative boosting-based approach to scoring. Our scoring models draw upon a large set of diverse feature functions that measure different qualities of PSMs. Our method improves the performance of our de novo sequencing algorithm beyond the current state-of-the-art, and also greatly enhances the performance of database search programs. Furthermore, by increasing the efficiency of tag filtration and improving the sensitivity of PSM scoring, we make it practical to perform large-scale MS/MS analysis, such as proteogenomic search of a six-frame translation of the human genome (in which we achieve a reduction of the running time by a factor of 15 and a 60% increase in the number of identified peptides, compared to the InsPecT database search tool). Our scoring function is incorporated into PepNovo+, which is available for download or can be run online at http://bix.ucsd.edu.
Affiliation(s)
- Ari M Frank
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0404 La Jolla, California 92093-0404, USA.
|
32
|
Bianco L, Mead JA, Bessant C. Comparison of Novel Decoy Database Designs for Optimizing Protein Identification Searches Using ABRF sPRG2006 Standard MS/MS Data Sets. J Proteome Res 2009; 8:1782-91. [DOI: 10.1021/pr800792z] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Luca Bianco
- Bioinformatics Group, Building 63, Cranfield University, Cranfield, Bedfordshire, United Kingdom MK43 0AL
- Jennifer A. Mead
- Bioinformatics Group, Building 63, Cranfield University, Cranfield, Bedfordshire, United Kingdom MK43 0AL
- Conrad Bessant
- Bioinformatics Group, Building 63, Cranfield University, Cranfield, Bedfordshire, United Kingdom MK43 0AL
|
33
|
Wang G, Wu WW, Zhang Z, Masilamani S, Shen RF. Decoy methods for assessing false positives and false discovery rates in shotgun proteomics. Anal Chem 2009; 81:146-59. [PMID: 19061407 DOI: 10.1021/ac801664q] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
The potential of getting a significant number of false positives (FPs) in peptide-spectrum matches (PSMs) obtained by proteomic database search has been well-recognized. Among the attempts to assess FPs, the concomitant use of target and decoy databases is widely practiced. By adjusting filtering criteria, FPs and false discovery rate (FDR) can be controlled at a desired level. Although the target-decoy approach is gaining in popularity, subtle differences in decoy construction (e.g., reversing vs stochastic methods), rate calculation (e.g., total vs unique PSMs), or searching (separate vs composite) do exist among various implementations. In the present study, we evaluated the effects of these differences on FP and FDR estimations using a rat kidney protein sample and the SEQUEST search engine as an example. On the effects of decoy construction, we found that, when a single scoring filter (XCorr) was used, stochastic methods generated a higher estimation of FPs and FDR than sequence reversing methods, likely due to an increase in unique peptides. This higher estimation could largely be attenuated by creating decoy databases similar in effective size but not by a simple normalization with a unique-peptide coefficient. When multiple filters were applied, the differences seen between reversing and stochastic methods significantly diminished, suggesting multiple filterings reduce the dependency on how a decoy is constructed. For a fixed set of filtering criteria, FDR and FPs estimated by using unique PSMs were almost twice those using total PSMs. The higher estimation seemed to be dependent on data acquisition setup. As to the differences between performing separate or composite searches, in general, FDR estimated from the separate search was about three times that from the composite search. The degree of difference gradually decreased as the filtering criteria became more stringent. 
Paradoxically, the estimated true positives in separate search were higher when multiple filters were used. By analyzing a standard protein mixture, we demonstrated that the higher estimation of FDR and FPs in the separate search likely reflected an overestimation, which could be corrected with a simple merging procedure. Our study illustrates the relative merits of different implementations of the target-decoy strategy, which should be worth contemplating when large-scale proteomic biomarker discovery is to be attempted.
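As a rough illustration of the target-decoy bookkeeping discussed in this abstract, the Python sketch below builds a reversed decoy sequence and estimates the FDR at a score threshold as the ratio of passing decoy PSMs to passing target PSMs. That ratio is one common convention and is not necessarily the exact calculation used in the study; the function names and the toy XCorr-like scores are hypothetical.

```python
def make_reversed_decoy(sequence):
    """Return a decoy protein sequence by reversing the target sequence."""
    return sequence[::-1]


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as (decoy PSMs passing) / (target PSMs passing).

    Assumes target and decoy databases of comparable effective size,
    so that decoy hits approximate the number of false target hits.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing can be a false discovery
    return min(1.0, n_decoy / n_target)


# Hypothetical scores, for illustration only.
toy_target_scores = [3.1, 2.8, 2.5, 1.9, 1.2]
toy_decoy_scores = [2.6, 1.5, 1.0]
toy_fdr = estimate_fdr(toy_target_scores, toy_decoy_scores, threshold=2.0)
```

Raising the threshold until the estimate falls below a chosen level is the usual way such a function would be used to set filtering criteria.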
Affiliation(s)
- Guanghui Wang
- Proteomics Core Facility, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
|
34
|
Swanson SK, Florens L, Washburn MP. Generation and analysis of multidimensional protein identification technology datasets. Methods Mol Biol 2009; 492:1-20. [PMID: 19241024 DOI: 10.1007/978-1-59745-493-3_1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Systems that couple two dimensional liquid chromatography (LC/LC) with tandem mass spectrometry are widely used in modern proteomics. One such system, multidimensional protein identification technology (MudPIT), couples strong cation exchange chromatography and reversed phase chromatography to tandem mass spectrometry in a single microcapillary column. Using database searching algorithms like SEQUEST and additional computational tools, researchers are able to analyze in great detail complex peptide mixtures generated from biofluids, tissues, cells, organelles, or protein complexes. This chapter describes the use of MudPIT on modern mass spectrometry instrumentation and describes a data analysis pipeline designed to provide low false positive rates and quantitative datasets.
|
35
|
Shao C, Sun W, Li F, Yang R, Zhang L, Gao Y. Oscore: a combined score to reduce false negative rates for peptide identification in tandem mass spectrometry analysis. JOURNAL OF MASS SPECTROMETRY : JMS 2009; 44:25-31. [PMID: 18698557 DOI: 10.1002/jms.1466] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
Tandem mass spectrometry (MS/MS) has been widely used in proteomics studies. Multiple algorithms have been developed for assessing matches between MS/MS spectra and peptide sequences in databases. However, it is still a challenge to reduce false negative rates without compromising the high confidence of peptide identification. In this study, we developed a combined score, Oscore, by logistic regression using SEQUEST and AMASS variables to identify fully tryptic peptides. Because these variables showed complicated associations with one another, combining them rather than applying them to a threshold model improved the classification of correct and incorrect peptide identifications. Oscore achieved both a lower false negative rate and a lower false positive rate than PeptideProphet on datasets from 18 known protein mixtures and several proteome-scale samples of different complexity, database size and separation methods. By a three-way comparison among Oscore, PeptideProphet and another logistic regression model which made use of PeptideProphet's variables, the main contributor for the improvement made by Oscore is discussed.
Affiliation(s)
- Chen Shao
- Department of Physiology and Pathophysiology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing, China
|
36
|
Zhang J, Ma J, Dou L, Wu S, Qian X, Xie H, Zhu Y, He F. Bayesian nonparametric model for the validation of peptide identification in shotgun proteomics. Mol Cell Proteomics 2008; 8:547-57. [PMID: 19005226 DOI: 10.1074/mcp.m700558-mcp200] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
Tandem mass spectrometry combined with database searching allows high throughput identification of peptides in shotgun proteomics. However, validating database search results, a problem for which many solutions have been proposed, still leaves room for improvement in the sensitivity, specificity, and generalizability of the validation algorithms. Here a Bayesian nonparametric (BNP) model for the validation of database search results was developed that incorporates several popular techniques in statistical learning, including the compression of feature space with a linear discriminant function, flexible nonparametric probability density function estimation for the variable probability structure in complex problems, and the Bayesian method to calculate the posterior probability. Importantly, the BNP model is naturally compatible with the popular target-decoy database search strategy. We tested the BNP model on standard proteins and real, complex sample data sets from multiple MS platforms and compared it with PeptideProphet, the cutoff-based method, and a simple nonparametric method (proposed by us previously). The BNP model was superior in sensitivity and generalizability for all data sets searched. Some high quality matches that had been filtered out by other methods were detected and assigned high probability by the BNP model. Thus, the BNP model can validate database search results effectively and extract more information from MS/MS data.
Affiliation(s)
- Jiyang Zhang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 102206, China
|
37
|
Staphylococcus aureus elicits marked alterations in the airway proteome during early pneumonia. Infect Immun 2008; 76:5862-72. [PMID: 18852243 DOI: 10.1128/iai.00865-08] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open
Abstract
Pneumonia caused by Staphylococcus aureus is a growing concern in the health care community. We hypothesized that characterization of the early innate immune response to bacteria in the lungs would provide insight into the mechanisms used by the host to protect itself from infection. An adult mouse model of Staphylococcus aureus pneumonia was utilized to define the early events in the innate immune response and to assess the changes in the airway proteome during the first 6 h of pneumonia. S. aureus actively replicated in the lungs of mice inoculated intranasally under anesthesia to cause significant morbidity and mortality. By 6 h postinoculation, the release of proinflammatory cytokines caused effective recruitment of neutrophils to the airway. Neutrophil influx, loss of alveolar architecture, and consolidated pneumonia were observed histologically 6 h postinoculation. Bronchoalveolar lavage fluids from mice inoculated with phosphate-buffered saline (PBS) or S. aureus were depleted of overabundant proteins and subjected to strong cation exchange fractionation followed by liquid chromatography and tandem mass spectrometry to identify the proteins present in the airway. No significant changes in response to PBS inoculation or 30 min following S. aureus inoculation were observed. However, a dramatic increase in extracellular proteins was observed 6 h postinoculation with S. aureus, with the increase dominated by inflammatory and coagulation proteins. The data presented here provide a comprehensive evaluation of the rapid and vigorous innate immune response mounted in the host airway during the earliest stages of S. aureus pneumonia.
|
38
|
Abstract
The persistence of Porphyromonas gingivalis in the inflammatory environment of the periodontal pocket requires an ability to overcome oxidative stress. DNA damage is a major consequence of oxidative stress. Unlike the case for other organisms, our previous report suggests a role for a non-base excision repair mechanism for the removal of 8-oxo-7,8-dihydroguanine (8-oxo-G) in P. gingivalis. Because the uvrB gene is known to be important in nucleotide excision repair, the role of this gene in the repair of oxidative stress-induced DNA damage was investigated. A 3.1-kb fragment containing the uvrB gene was PCR amplified from the chromosomal DNA of P. gingivalis W83. This gene was insertionally inactivated using the ermF-ermAM antibiotic cassette and used to create a uvrB-deficient mutant by allelic exchange. When plated on brucella blood agar, the mutant strain, designated P. gingivalis FLL144, was similar in black pigmentation and beta-hemolysis to the parent strain. In addition, P. gingivalis FLL144 demonstrated no significant difference in growth rate, proteolytic activity, or sensitivity to hydrogen peroxide from that of the parent strain. However, in contrast to the wild type, P. gingivalis FLL144 was significantly sensitive to UV irradiation. The enzymatic removal of 8-oxo-G from duplex DNA was unaffected by the inactivation of the uvrB gene. DNA affinity fractionation identified unique proteins that preferentially bound to the oligonucleotide fragment carrying the 8-oxo-G lesion. Collectively, these results suggest that the repair of oxidative stress-induced DNA damage involving 8-oxo-G may occur by a still undescribed mechanism in P. gingivalis.
|
39
|
Taylor RD, Saparno A, Blackwell B, Anoop V, Gleddie S, Tinker NA, Harris LJ. Proteomic analyses of Fusarium graminearum grown under mycotoxin-inducing conditions. Proteomics 2008; 8:2256-65. [PMID: 18452225 DOI: 10.1002/pmic.200700610] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Non-gel-based quantitative proteomics technology was used to profile protein expression differences when Fusarium graminearum was induced to produce trichothecenes in vitro. As F. graminearum synthesizes and secretes trichothecenes early in the cereal host invasion process, we hypothesized that proteins contributing to infection would also be induced under conditions favouring mycotoxin synthesis. Protein samples were extracted from three biological replicates of a time course study and subjected to iTRAQ (isobaric tags for relative and absolute quantification) analysis. Statistical analysis of a filtered dataset of 435 proteins revealed 130 F. graminearum proteins that exhibited significant changes in expression, of which 72 were upregulated relative to their level at the initial phase of the time course. There was good agreement between upregulated proteins identified by 2-D PAGE/MS/MS and iTRAQ. RT-PCR and northern hybridization confirmed that genes encoding proteins which were upregulated based on iTRAQ were also transcriptionally active under mycotoxin producing conditions. Numerous candidate pathogenicity proteins were identified using this technique. These will provide leads in the search for mechanisms and markers of host invasion and novel antifungal targets.
Affiliation(s)
- Rebecca D Taylor
- Eastern Cereal and Oilseed Research Centre, Agriculture and Agri-Food Canada, Ottawa, Ontario, Canada
|
40
|
Ramaroson MF, Ruby J, Goshe MB, Liu HC. Changes in the Gallus gallus Proteome Induced by Marek’s Disease Virus. J Proteome Res 2008; 7:4346-58. [DOI: 10.1021/pr800268h] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Affiliation(s)
- Mialy F. Ramaroson
- Department of Molecular and Structural Biochemistry, and Department of Animal Science, North Carolina State University, Raleigh, North Carolina 27695
- James Ruby
- Department of Molecular and Structural Biochemistry, and Department of Animal Science, North Carolina State University, Raleigh, North Carolina 27695
- Michael B. Goshe
- Department of Molecular and Structural Biochemistry, and Department of Animal Science, North Carolina State University, Raleigh, North Carolina 27695
- Hsiao-Ching Liu
- Department of Molecular and Structural Biochemistry, and Department of Animal Science, North Carolina State University, Raleigh, North Carolina 27695
|
41
|
Kim S, Gupta N, Pevzner PA. Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J Proteome Res 2008; 7:3354-63. [PMID: 18597511 PMCID: PMC2689316 DOI: 10.1021/pr8001244] [Citation(s) in RCA: 332] [Impact Index Per Article: 20.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
A key problem in computational proteomics is distinguishing between correct and false peptide identifications. We argue that evaluating the error rates of peptide identifications is not unlike computing generating functions in combinatorics. We show that the generating functions and their derivatives (spectral energy and spectral probability) represent new features of tandem mass spectra that, similarly to Delta-scores, significantly improve peptide identifications. Furthermore, the spectral probability provides a rigorous solution to the problem of computing statistical significance of spectral identifications. The spectral energy/probability approach improves the sensitivity-specificity tradeoff of existing MS/MS search tools, addresses the notoriously difficult problem of "one-hit-wonders" in mass spectrometry, and often eliminates the need for decoy database searches. We therefore argue that the generating function approach has the potential to increase the number of peptide identifications in MS/MS searches.
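The idea of a spectral probability can be sketched with a small dynamic program: over all peptides of a given total mass, accumulate the probability mass of those whose score against the spectrum reaches a threshold. The Python sketch below is a simplified toy under stated assumptions and is not the authors' implementation: masses are integers, a peptide's score is the sum of the peak scores at its prefix masses, residues are drawn independently, and the two-letter alphabet and all names are hypothetical.

```python
from collections import defaultdict


def spectral_probability(peak_scores, parent_mass, threshold, residues):
    """Total probability of peptides with mass `parent_mass` scoring >= threshold.

    peak_scores: dict mapping integer prefix mass -> score contribution
    residues:    dict mapping residue name -> (integer mass, probability)
    """
    # table[m][t] = probability of all residue strings with total mass m and score t
    table = [defaultdict(float) for _ in range(parent_mass + 1)]
    table[0][0] = 1.0  # the empty prefix has mass 0, score 0, probability 1
    for m in range(1, parent_mass + 1):
        gain = peak_scores.get(m, 0)  # score earned by having a prefix of mass m
        for res_mass, res_prob in residues.values():
            if res_mass > m:
                continue
            for t, p in table[m - res_mass].items():
                table[m][t + gain] += p * res_prob
    return sum(p for t, p in table[parent_mass].items() if t >= threshold)


# Hypothetical toy alphabet: two residues with masses 1 and 2, equally likely.
toy_residues = {"a": (1, 0.5), "b": (2, 0.5)}
toy_peaks = {1: 1, 2: 1}
p_score_ge_2 = spectral_probability(toy_peaks, 3, 2, toy_residues)
p_score_ge_1 = spectral_probability(toy_peaks, 3, 1, toy_residues)
```

In this toy case only the peptide "aaa" has prefixes at both masses 1 and 2, so it alone reaches a score of 2.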
Affiliation(s)
- Sangtae Kim
- Department of Computer Science and Engineering, University of California San Diego, La Jolla CA 92093, USA
|
42
|
Sardana G, Jung K, Stephan C, Diamandis EP. Proteomic analysis of conditioned media from the PC3, LNCaP, and 22Rv1 prostate cancer cell lines: discovery and validation of candidate prostate cancer biomarkers. J Proteome Res 2008; 7:3329-38. [PMID: 18578523 DOI: 10.1021/pr8003216] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Early detection of prostate cancer is problematic due to the lack of a marker that has high diagnostic sensitivity and specificity. The prostate specific antigen (PSA) test, in combination with digital rectal examination, is the gold standard for prostate cancer diagnosis. However, this modality suffers from low specificity. Therefore, specific markers for clinically relevant prostate cancer are needed. Our objective was to proteomically characterize the conditioned media from three human prostate cancer cell lines of differing origin [PC3 (bone metastasis), LNCaP (lymph node metastasis), and 22Rv1 (localized to prostate)] to identify secreted proteins that could serve as novel prostate cancer biomarkers. Each cell line was cultured in triplicate, followed by a bottom-up analysis of the peptides by two-dimensional chromatography and tandem mass spectrometry. Approximately, 12% (329) of the proteins identified were classified as extracellular and 18% (504) as membrane-bound among which were known prostate cancer biomarkers such as PSA and KLK2. To select the most promising candidates for further investigation, tissue specificity, biological function, disease association based on literature searches, and comparison of protein overlap with the proteome of seminal plasma and serum were examined. On the basis of this, four novel candidates, follistatin, chemokine (C-X-C motif) ligand 16, pentraxin 3 and spondin 2, were validated in the serum of patients with and without prostate cancer. The proteins presented in this study represent a comprehensive sampling of the secreted and shed proteins expressed by prostate cancer cells, which may be useful as diagnostic, prognostic or predictive serological markers for prostate cancer.
Affiliation(s)
- Girish Sardana
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
|
43
|
Song-Feng W, Xiao-Fang X, Ji-Yang Z, Wan-Tao Y, Jie M, Xiao-Hong Q, Yun-Ping Z, Fu-Chu H. Reversed-shift Database: Alternative Method for the Evaluation of Peptide Mass Fingerprint Results. CHINESE JOURNAL OF ANALYTICAL CHEMISTRY 2008. [DOI: 10.1016/s1872-2040(08)60029-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
44
|
Higdon R, Hogan JM, Kolker N, van Belle G, Kolker E. Experiment-specific estimation of peptide identification probabilities using a randomized database. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2008; 11:351-65. [PMID: 18092908 DOI: 10.1089/omi.2007.0040] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
Determining the error rate for peptide and protein identification accurately and reliably is necessary to enable evaluation and cross-comparisons of high throughput proteomics experiments. Currently, peptide identification is based either on preset scoring thresholds or on probabilistic models trained on datasets that are often dissimilar to experimental results. The false discovery rates (FDR) and peptide identification probabilities for these preset thresholds or models often vary greatly across different experimental treatments, organisms, or instruments used in specific experiments. To overcome these difficulties, randomized databases have been used to estimate the FDR. However, the cumulative FDR may include low probability identifications when there are a large number of peptide identifications and exclude high probability identifications when there are few. To overcome this logical inconsistency, this study expands the use of randomized databases to generate experiment-specific estimates of peptide identification probabilities. These experiment-specific probabilities are generated by logistic and Loess regression models of the peptide scores obtained from original and reshuffled database matches. These experiment-specific probabilities are shown to closely approximate "true" probabilities based on known standard protein mixtures across different experiments. Probabilities generated by the earlier PeptideProphet and more recent LIPS models are shown to differ significantly from this study's experiment-specific probabilities, especially for unknown samples. The experiment-specific probabilities reliably estimate the accuracy of peptide identifications and overcome potential logical inconsistencies of the cumulative FDR. This estimation method is demonstrated using a Sequest database search, LIPS model, and a reshuffled database. However, this approach is generally applicable to any search algorithm, peptide scoring, and statistical model when using a randomized database.
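One way to picture score-specific probabilities derived from original and reshuffled database matches is the Python sketch below. It fits a one-variable logistic regression that separates original-database matches (label 1) from reshuffled-database matches (label 0), then converts the fitted value q into a probability of correctness as (2q - 1)/q. The conversion rests on an assumption that is ours, not a statement from the abstract: an incorrect match is taken to be equally likely to land in either database. The plain gradient-descent fit, the function names, and the toy scores are likewise hypothetical, and the Loess component is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(scores, labels, lr=0.1, n_iter=5000):
    """Fit P(label = 1 | score) = sigmoid(w * score + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_w = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(w * s + b) - y
            grad_w += err * s
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def prob_correct(score, w, b):
    """Probability that an original-database match with this score is correct."""
    q = sigmoid(w * score + b)  # P(match came from the original database | score)
    if q <= 0.5:
        return 0.0  # decoys at least as frequent as targets at this score
    return (2.0 * q - 1.0) / q


# Hypothetical scores, for illustration only.
original_scores = [1.0, 1.2, 1.5, 2.8, 3.0, 3.5, 4.0]
reshuffled_scores = [0.8, 1.1, 1.3, 1.6]
all_scores = original_scores + reshuffled_scores
all_labels = [1] * len(original_scores) + [0] * len(reshuffled_scores)
w_fit, b_fit = fit_logistic_1d(all_scores, all_labels)
```

Calling `prob_correct(score, w_fit, b_fit)` then gives a per-match estimate rather than a cumulative rate over all matches above a cutoff.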
Affiliation(s)
- Roger Higdon
- Seattle Children's Hospital and Regional Medical Center, Seattle, WA 98101, USA
|
45
|
Vasilescu J, Smith JC, Zweitzig DR, Denis NJ, Haines DS, Figeys D. Systematic determination of ion score cutoffs based on calculated false positive rates: application for identifying ubiquitinated proteins by tandem mass spectrometry. JOURNAL OF MASS SPECTROMETRY : JMS 2008; 43:296-304. [PMID: 17957819 DOI: 10.1002/jms.1297] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
We report a simple approach for determining ion score cutoffs that permit the confident identification of ubiquitinated proteins by tandem mass spectrometry (MS/MS). Initial experiments involving the analysis of gel bands containing multi-Ubiquitin chains with quadrupole time-of-flight and quadrupole ion trap mass spectrometers revealed that standard ion score cutoffs used for database searching were not sufficiently stringent. We also found that false positive and false negative rates (FPR and FNR) varied significantly depending on the cutoff scores used and that appropriate cutoffs could only be determined following a systematic evaluation of false positive rates. When standard cutoff scores were used for the analysis of complex mixtures of ubiquitinated proteins, unacceptably high FPR were observed. Finally, we found that FPR for ubiquitinated proteins are affected by the size of the protein database that is searched. These observations may be applicable for the study of other post-translational modifications.
Affiliation(s)
- Julian Vasilescu
- Ottawa Institute of Systems Biology, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada
|
46
|
Zhang J, Li J, Liu X, Xie H, Zhu Y, He F. A nonparametric model for quality control of database search results in shotgun proteomics. BMC Bioinformatics 2008; 9:29. [PMID: 18205957 PMCID: PMC2267700 DOI: 10.1186/1471-2105-9-29] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2007] [Accepted: 01/21/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Analysis of complex samples with tandem mass spectrometry (MS/MS) has become routine in proteomic research. However, validation of database search results creates a bottleneck in MS/MS data processing. Recently, methods based on a randomized database have become popular for quality control of database search results. However, it remains unclear how to combine different database search scores to improve the sensitivity of randomized database methods. RESULTS In this paper, a multivariate nonlinear discriminant function (DF) based on the multivariate nonparametric density estimation technique was used to filter out false-positive database search results with a predictable false positive rate (FPR). Application of this method to control datasets of different instruments (LCQ, LTQ, and LTQ/FT) yielded an estimated FPR close to the actual FPR. As expected, the method was more sensitive when more features were used. Furthermore, the new method was shown to be more sensitive than two commonly used methods on 3 complex sample datasets and 3 control datasets. CONCLUSION Using the nonparametric model, a more flexible DF can be obtained, resulting in improved sensitivity and good FPR estimation. This nonparametric statistical technique is a powerful tool for tackling the complexity and diversity of datasets in shotgun proteomics.
Affiliation(s)
- Jiyang Zhang
- College of Mechanical & Electronic Engineering and Automatization, National University of Defense Technology, Changsha, 410073, China.
|
47
|
Zhang J, Li J, Xie H, Zhu Y, He F. A new strategy to filter out false positive identifications of peptides in SEQUEST database search results. Proteomics 2008; 7:4036-44. [PMID: 17952874 DOI: 10.1002/pmic.200600929] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Based on the randomized database method and a linear discriminant function (LDF) model, a new strategy to filter out false positive matches in SEQUEST database search results is proposed. Given an experimental MS/MS dataset and a protein sequence database, a randomized database is constructed and merged with the original database. Then, all MS/MS spectra are searched against the combined database. For each expected false positive rate (FPR), LDFs are constructed for different charge states and used to filter out the false positive matches from the normal database. In order to investigate the error of FPR estimation, the new strategy was applied to a reference dataset. As a result, the estimated FPR was very close to the actual FPR. When applied to a human K562 cell line dataset, which is a complex dataset from a real sample, more matches could be confirmed than with the traditional cutoff-based methods at the same estimated FPR. Also, though most of the results confirmed by the LDF model were consistent with those of PeptideProphet, the LDF model could still provide complementary information. These results indicate that the new method can reliably control the FPR of peptide identifications and is more sensitive than traditional cutoff-based methods.
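A linear discriminant function over two search scores can be sketched in Python as follows. The code computes a Fisher discriminant direction from matches to the normal database and matches to the randomized database, then projects each match onto it. The choice of Fisher's criterion, the use of XCorr and DeltaCn as the two features, the function names, and the toy values are all assumptions made for illustration; the abstract does not specify how the authors' LDFs were constructed, and the per-charge-state fitting is not shown.

```python
def fisher_ldf(normal_hits, random_hits):
    """Return weights (w1, w2) of a two-feature Fisher linear discriminant.

    Each hit is a (feature1, feature2) pair, e.g. hypothetical (XCorr, DeltaCn).
    """
    def mean(rows):
        n = len(rows)
        return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

    def scatter(rows, mu):
        sxx = sum((r[0] - mu[0]) ** 2 for r in rows)
        sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows)
        syy = sum((r[1] - mu[1]) ** 2 for r in rows)
        return sxx, sxy, syy

    mu_n, mu_r = mean(normal_hits), mean(random_hits)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = (a + b for a, b in zip(scatter(normal_hits, mu_n),
                                           scatter(random_hits, mu_r)))
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mu_n[0] - mu_r[0], mu_n[1] - mu_r[1]
    # w = S_w^{-1} (mu_normal - mu_random), using the closed-form 2x2 inverse.
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)


def ldf_score(hit, weights):
    """Project one hit onto the discriminant direction."""
    return weights[0] * hit[0] + weights[1] * hit[1]


# Hypothetical (XCorr, DeltaCn) pairs, for illustration only.
toy_normal = [(3.5, 0.30), (3.0, 0.25), (2.8, 0.35), (1.5, 0.05)]
toy_random = [(1.2, 0.04), (1.6, 0.08), (1.0, 0.02), (1.8, 0.06)]
toy_weights = fisher_ldf(toy_normal, toy_random)
```

A cutoff on `ldf_score` could then be chosen so that the fraction of randomized-database matches passing it matches the expected FPR.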
Affiliation(s)
- Jiyang Zhang
- College of Mechanical and Electronic Engineering and Automatization, National University of Defense Technology, Changsha, China
|
48
|
Host airway proteins interact with Staphylococcus aureus during early pneumonia. Infect Immun 2008; 76:888-98. [PMID: 18195024 DOI: 10.1128/iai.01301-07] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023] Open
Abstract
Staphylococcus aureus is a major cause of hospital-acquired pneumonia and is emerging as an important etiological agent of community-acquired pneumonia. Little is known about the specific host-pathogen interactions that occur when S. aureus first enters the airway. A shotgun proteomics approach was utilized to identify the airway proteins associated with S. aureus during the first 6 h of infection. Host proteins eluted from bacteria recovered from the airways of mice 30 min or 6 h following intranasal inoculation under anesthesia were subjected to liquid chromatography and tandem mass spectrometry. A total of 513 host proteins were associated with S. aureus 30 min and/or 6 h postinoculation. A majority of the identified proteins were host cytosolic proteins, suggesting that S. aureus was rapidly internalized by phagocytes in the airway and that significant host cell lysis occurred during early infection. In addition, extracellular matrix and secreted proteins, including fibronectin, antimicrobial peptides, and complement components, were associated with S. aureus at both time points. The interaction of 12 host proteins shown to bind to S. aureus in vitro was demonstrated in vivo for the first time. The association of hemoglobin, which is thought to be the primary staphylococcal iron source during infection, with S. aureus in the airway was validated by immunoblotting. Thus, we used our recently developed S. aureus pneumonia model and shotgun proteomics to validate previous in vitro findings and to identify nearly 500 other proteins that interact with S. aureus in vivo. The data presented here provide novel insights into the host-pathogen interactions that occur when S. aureus enters the airway.
|
49
|
Wang X, Huang L. Identifying Dynamic Interactors of Protein Complexes by Quantitative Mass Spectrometry. Mol Cell Proteomics 2008; 7:46-57. [DOI: 10.1074/mcp.m700261-mcp200] [Citation(s) in RCA: 167] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open
|
50
|
Abstract
The "Paris Guidelines" have begun the process of standardizing reporting for proteomics. New bioinformatics tools have improved the process for estimating error rates of peptide identifications. This perspective seeks to consider these advances in the context of proteomics' short history. As increasing numbers of proteomics papers come from biologists rather than technologists, developing consensus standards for estimating error will be increasingly necessary. Standardizing this assessment should be welcomed as a reflection of the growing impact of proteomic technologies.
Affiliation(s)
- David L Tabb
- Department of Biomedical Informatics and Biochemistry, Vanderbilt University, Nashville, Tennessee 37232-8575, USA.
|