1
Wang X, Shen S, Rasam SS, Qu J. MS1 ion current-based quantitative proteomics: A promising solution for reliable analysis of large biological cohorts. Mass Spectrom Rev 2019; 38:461-482. [PMID: 30920002 PMCID: PMC6849792 DOI: 10.1002/mas.21595] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 10/19/2018] [Accepted: 02/28/2019] [Indexed: 05/04/2023]
Abstract
The rapidly advancing field of pharmaceutical and clinical research calls for systematic, molecular-level characterization of complex biological systems. To this end, quantitative proteomics represents a powerful tool, but an optimal solution for reliable large-cohort proteomics analysis, as frequently involved in pharmaceutical/clinical investigations, is urgently needed. Large-cohort analysis remains challenging owing to the deteriorating quantitative quality and snowballing missing data and false-positive discovery of altered proteins when sample size increases. MS1 ion current-based methods, which have become an important class of label-free quantification techniques during the past decade, show considerable potential to achieve reproducible protein measurements in large cohorts with high quantitative accuracy/precision. Nonetheless, in order to fully unleash this potential, several critical prerequisites should be met. Here we provide an overview of the rationale of MS1-based strategies and then important considerations for experimental and data processing techniques, with the emphasis on (i) efficient and reproducible sample preparation and LC separation; (ii) sensitive, selective and high-resolution MS detection; (iii) accurate chromatographic alignment; (iv) sensitive and selective generation of quantitative features; and (v) optimal post-feature-generation data quality control. Prominent technical developments in these aspects are discussed. Finally, we review applications of MS1-based strategies in disease mechanism studies, biomarker discovery, and pharmaceutical investigations.
Affiliation(s)
- Xue Wang
  - Department of Cell Stress Biology, Roswell Park Cancer Institute, Buffalo, New York
- Shichen Shen
  - Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, New York, New York
- Sailee Suryakant Rasam
  - Department of Biochemistry, University at Buffalo, State University of New York, New York, New York
- Jun Qu
  - Department of Cell Stress Biology, Roswell Park Cancer Institute, Buffalo, New York
  - Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, New York, New York
  - Department of Biochemistry, University at Buffalo, State University of New York, New York, New York
2
Zhou WJ, Yang H, Zeng WF, Zhang K, Chi H, He SM. pValid: Validation Beyond the Target-Decoy Approach for Peptide Identification in Shotgun Proteomics. J Proteome Res 2019; 18:2747-2758. [PMID: 31244209 DOI: 10.1021/acs.jproteome.8b00993] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/24/2023]
Abstract
As the de facto validation method in mass spectrometry-based proteomics, the target-decoy approach determines a threshold to estimate the false discovery rate and then filters those identifications beyond the threshold. However, the incorrect identifications within the threshold are still unknown and further validation methods are needed. In this study, we characterized a framework of validation and investigated a number of common and novel validation methods. We first defined the accuracy of a validation method by its false-positive rate (FPR) and false-negative rate (FNR) and, further, proved that a validation method with lower FPR and FNR led to identifications with higher sensitivity and precision. Then we proposed a validation method named pValid that incorporated an open database search and a theoretical spectrum prediction strategy via a machine-learning technology. pValid was compared with four common validation methods as well as a synthetic peptide validation method. Tests on three benchmark data sets indicated that pValid had an FPR of 0.03% and an FNR of 1.79% on average, both superior to the other four common validation methods. Tests on a synthetic peptide data set also indicated that the FPR and FNR of pValid were better than those of the synthetic peptide validation method. Tests on a large-scale human proteome data set indicated that pValid successfully flagged the highest number of incorrect identifications among all five methods. Further considering its cost-effectiveness, pValid has the potential to be a feasible validation tool for peptide identification.
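The quantities named in this abstract can be illustrated with a minimal Python sketch. It shows the simple target-decoy FDR estimate (decoys passing a threshold divided by targets passing it) and the FPR and FNR of a validation method measured against known ground truth. The function names, the toy inputs, and the convention that a "positive" means the validator accepts an identification as correct are assumptions made for illustration; they are not taken from pValid, whose exact definitions may differ.

```python
from typing import Sequence, Tuple


def target_decoy_fdr(scores: Sequence[float],
                     is_decoy: Sequence[bool],
                     threshold: float) -> float:
    """Estimate FDR as (#decoys passing) / (#targets passing) at a threshold."""
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    return decoys / targets if targets else 0.0


def validation_error_rates(truly_correct: Sequence[bool],
                           accepted: Sequence[bool]) -> Tuple[float, float]:
    """Return (FPR, FNR) of a validator, assuming 'positive' = accepted as correct.

    FPR = incorrect identifications accepted / all incorrect identifications
    FNR = correct identifications rejected / all correct identifications
    """
    if len(truly_correct) != len(accepted):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(truly_correct, accepted))
    fp = sum(1 for t, a in pairs if not t and a)
    tn = sum(1 for t, a in pairs if not t and not a)
    fn = sum(1 for t, a in pairs if t and not a)
    tp = sum(1 for t, a in pairs if t and a)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


if __name__ == "__main__":
    # Toy data only; not from any real search result.
    print(target_decoy_fdr([10, 9, 8, 7, 6, 5],
                           [False, False, False, True, False, True], 6))
    print(validation_error_rates([True, True, True, True, False, False],
                                 [True, True, True, False, True, False]))
```

A validator with both rates close to zero removes the incorrect identifications that survive the target-decoy threshold while discarding few correct ones.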
Affiliation(s)
- Wen-Jing Zhou
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Hao Yang
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Wen-Feng Zeng
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Kun Zhang
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Hao Chi
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
- Si-Min He
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
  - University of Chinese Academy of Sciences, Beijing, China 100049
3
LeDuc RD, Fellers RT, Early BP, Greer JB, Shams DP, Thomas PM, Kelleher NL. Accurate Estimation of Context-Dependent False Discovery Rates in Top-Down Proteomics. Mol Cell Proteomics 2019; 18:796-805. [PMID: 30647073 PMCID: PMC6442365 DOI: 10.1074/mcp.ra118.000993] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/27/2018] [Revised: 01/04/2019] [Indexed: 11/06/2022] [Open Access]
Abstract
Within the last several years, top-down proteomics has emerged as a high throughput technique for protein and proteoform identification. This technique has the potential to identify and characterize thousands of proteoforms within a single study, but the absence of accurate false discovery rate (FDR) estimation could hinder the adoption and consistency of top-down proteomics in the future. In automated identification and characterization of proteoforms, FDR calculation strongly depends on the context of the search. The context includes MS data quality, the database being interrogated, the search engine, and the parameters of the search. Particular to top-down proteomics, there are four molecular levels of study: proteoform spectral match (PrSM), protein, isoform, and proteoform. Here, a context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We examined several search contexts and found that an FDR calculated at the PrSM level under-reported the true FDR at the protein level by an average of 24-fold. We present a new open-source tool, the TDCD_FDR_Calculator, which provides a scalable, context-dependent FDR calculation that can be applied post-search to enhance the quality of results in top-down proteomics from any search engine.
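One intuition for why a PrSM-level FDR can under-report the protein-level FDR is that correct PrSMs tend to pile up on the same proteins, while decoy hits scatter across many. The following Python sketch illustrates this with a plain decoy/target ratio at two levels of aggregation. It is a toy model under stated assumptions: the function name `fdr_by_level`, the accession strings, and the counts are hypothetical, and this is not the method implemented in the TDCD_FDR_Calculator.

```python
from typing import Iterable, Tuple


def fdr_by_level(prsms: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (PrSM-level FDR, protein-level FDR) as decoy/target ratios.

    Each PrSM is a pair (protein_accession, is_decoy). The protein-level
    estimate counts each accession once, however many PrSMs support it.
    """
    prsms = list(prsms)
    target_prsms = sum(1 for _, decoy in prsms if not decoy)
    decoy_prsms = sum(1 for _, decoy in prsms if decoy)
    target_proteins = {acc for acc, decoy in prsms if not decoy}
    decoy_proteins = {acc for acc, decoy in prsms if decoy}
    prsm_fdr = decoy_prsms / target_prsms if target_prsms else 0.0
    protein_fdr = (len(decoy_proteins) / len(target_proteins)
                   if target_proteins else 0.0)
    return prsm_fdr, protein_fdr


if __name__ == "__main__":
    # Hypothetical toy input: 20 target PrSMs spread over 4 proteins,
    # plus 2 decoy PrSMs that each hit a different decoy protein.
    toy = [(f"T{i % 4}", False) for i in range(20)]
    toy += [("D0", True), ("D1", True)]
    print(fdr_by_level(toy))
```

In this toy input the same two decoy hits correspond to a much larger error fraction once PrSMs are collapsed to proteins, which is the direction of the effect the abstract reports.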
Affiliation(s)
- Richard D LeDuc
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois
- Ryan T Fellers
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois
- Bryan P Early
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois
  - Department of Molecular Biosciences, Northwestern University, Evanston, Illinois
- Joseph B Greer
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois
- Daniel P Shams
  - Interdisciplinary Biological Sciences, Northwestern University, Evanston, Illinois
- Paul M Thomas
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois
  - Department of Molecular Biosciences, Northwestern University, Evanston, Illinois
- Neil L Kelleher
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois
  - Department of Molecular Biosciences, Northwestern University, Evanston, Illinois
  - Department of Chemistry and the Feinberg School of Medicine, Northwestern University, Evanston, Illinois
4
van der Ende EL, Meeter LH, Stingl C, van Rooij JGJ, Stoop MP, Nijholt DAT, Sanchez-Valle R, Graff C, Öijerstedt L, Grossman M, McMillan C, Pijnenburg YAL, Laforce R, Binetti G, Benussi L, Ghidoni R, Luider TM, Seelaar H, van Swieten JC. Novel CSF biomarkers in genetic frontotemporal dementia identified by proteomics. Ann Clin Transl Neurol 2019; 6:698-707. [PMID: 31019994 PMCID: PMC6469343 DOI: 10.1002/acn3.745] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 12/20/2018] [Revised: 02/05/2019] [Accepted: 02/06/2019] [Indexed: 12/11/2022] [Open Access]
Abstract
Objective: To identify novel CSF biomarkers in GRN‐associated frontotemporal dementia (FTD) by proteomics using mass spectrometry (MS). Methods: Unbiased MS was applied to CSF samples from 19 presymptomatic and 9 symptomatic GRN mutation carriers and 24 noncarriers. Protein abundances were compared between these groups. Proteins were then selected for validation if identified by ≥4 peptides and if fold change was ≤0.5 or ≥2.0. Validation and absolute quantification by parallel reaction monitoring (PRM), a high‐resolution targeted MS method, was performed on an international cohort (n = 210) of presymptomatic and symptomatic GRN, C9orf72 and MAPT mutation carriers. Results: Unbiased MS revealed 20 differentially abundant proteins between symptomatic mutation carriers and noncarriers and nine between symptomatic and presymptomatic carriers. Seven of these proteins fulfilled our criteria for validation. PRM analyses revealed that symptomatic GRN mutation carriers had significantly lower levels of neuronal pentraxin receptor (NPTXR), receptor‐type tyrosine‐protein phosphatase N2 (PTPRN2), neurosecretory protein VGF, chromogranin‐A (CHGA), and V‐set and transmembrane domain‐containing protein 2B (VSTM2B) than presymptomatic carriers and noncarriers. Symptomatic C9orf72 mutation carriers had lower levels of NPTXR, PTPRN2, CHGA, and VSTM2B than noncarriers, while symptomatic MAPT mutation carriers had lower levels of NPTXR and CHGA than noncarriers. Interpretation: We identified and validated five novel CSF biomarkers in GRN‐associated FTD. Our results show that synaptic, secretory vesicle, and inflammatory proteins are dysregulated in the symptomatic stage and may provide new insights into the pathophysiology of genetic FTD. Further validation is needed to investigate their clinical applicability as diagnostic or monitoring biomarkers.
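The selection rule stated in the Methods (identified by ≥4 peptides, and fold change ≤0.5 or ≥2.0) can be written as a small filter. The Python sketch below is only an illustration of that rule: the `ProteinResult` class, the function name, and the placeholder protein names and values are hypothetical and do not reproduce the study's data or software.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ProteinResult:
    name: str           # placeholder identifier, not a real protein
    n_peptides: int     # number of peptides identifying the protein
    fold_change: float  # abundance ratio between the compared groups


def select_for_validation(results: Iterable[ProteinResult],
                          min_peptides: int = 4,
                          low: float = 0.5,
                          high: float = 2.0) -> List[str]:
    """Keep proteins with enough peptides and a large fold change either way."""
    return [
        r.name
        for r in results
        if r.n_peptides >= min_peptides
        and (r.fold_change <= low or r.fold_change >= high)
    ]


if __name__ == "__main__":
    toy = [
        ProteinResult("protA", 5, 0.4),  # passes: enough peptides, decreased
        ProteinResult("protB", 3, 0.3),  # fails: too few peptides
        ProteinResult("protC", 6, 1.2),  # fails: fold change too small
        ProteinResult("protD", 4, 2.0),  # passes: boundary values are included
    ]
    print(select_for_validation(toy))
```

Both thresholds are inclusive here, matching the "≥" and "≤" signs in the abstract.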
Affiliation(s)
- Emma L van der Ende
  - Department of Neurology, Erasmus Medical Center, PO Box 2040, 3015 GD Rotterdam, The Netherlands
- Lieke H Meeter
  - Department of Neurology, Erasmus Medical Center, PO Box 2040, 3015 GD Rotterdam, The Netherlands
- Christoph Stingl
  - Laboratory of Neuro-oncology, Clinical and Cancer Proteomics, Department of Neurology, Erasmus Medical Center, PO Box 2040, 3000 CA Rotterdam, The Netherlands
- Jeroen G J van Rooij
  - Department of Neurology, Erasmus Medical Center, PO Box 2040, 3015 GD Rotterdam, The Netherlands
  - Department of Internal Medicine, Erasmus Medical Center, PO Box 2040, 3015 GD Rotterdam, The Netherlands
- Marcel P Stoop
  - Laboratory of Neuro-oncology, Clinical and Cancer Proteomics, Department of Neurology, Erasmus Medical Center, PO Box 2040, 3000 CA Rotterdam, The Netherlands
- Diana A T Nijholt
  - Laboratory of Neuro-oncology, Clinical and Cancer Proteomics, Department of Neurology, Erasmus Medical Center, PO Box 2040, 3000 CA Rotterdam, The Netherlands
- Raquel Sanchez-Valle
  - Alzheimer's Disease and Other Cognitive Disorders Unit, Department of Neurology, Hospital Clínic, Institut d'Investigació Biomèdica August Pi i Sunyer, Villarroel, 170, 08036 Barcelona, Spain
- Caroline Graff
  - Division of Neurogeriatrics, Department NVS, Karolinska Institutet, Center for Alzheimer Research, Visionsgatan 4, 171 64 Solna, Stockholm, Sweden
  - Unit for Hereditary Dementias, Theme Aging, Karolinska University Hospital-Solna, 171 64 Stockholm, Sweden
- Linn Öijerstedt
  - Division of Neurogeriatrics, Department NVS, Karolinska Institutet, Center for Alzheimer Research, Visionsgatan 4, 171 64 Solna, Stockholm, Sweden
  - Unit for Hereditary Dementias, Theme Aging, Karolinska University Hospital-Solna, 171 64 Stockholm, Sweden
- Murray Grossman
  - Department of Neurology, Penn Frontotemporal Degeneration Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
- Corey McMillan
  - Department of Neurology, Penn Frontotemporal Degeneration Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
- Yolande A L Pijnenburg
  - Alzheimer Center and Department of Neurology, Neuroscience Campus Amsterdam, VU University Medical Center, PO Box 7057, 1007 MB Amsterdam, The Netherlands
- Robert Laforce
  - Clinique Interdisciplinaire de Mémoire (CIME), CHU de Québec, Département des Sciences Neurologiques, Université Laval, Québec, Québec, Canada
- Giuliano Binetti
  - Molecular Markers Laboratory, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, via Pilastroni 4, Brescia 25125, Italy
  - MAC Memory Clinic, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, via Pilastroni 4, Brescia 25125, Italy
- Luisa Benussi
  - Molecular Markers Laboratory, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, via Pilastroni 4, Brescia 25125, Italy
- Roberta Ghidoni
  - Molecular Markers Laboratory, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, via Pilastroni 4, Brescia 25125, Italy
- Theo M Luider
  - Laboratory of Neuro-oncology, Clinical and Cancer Proteomics, Department of Neurology, Erasmus Medical Center, PO Box 2040, 3000 CA Rotterdam, The Netherlands
- Harro Seelaar
  - Department of Neurology, Erasmus Medical Center, PO Box 2040, 3015 GD Rotterdam, The Netherlands
- John C van Swieten
  - Department of Neurology, Erasmus Medical Center, PO Box 2040, 3015 GD Rotterdam, The Netherlands
5
The M, Edfors F, Perez-Riverol Y, Payne SH, Hoopmann MR, Palmblad M, Forsström B, Käll L. A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms. J Proteome Res 2018; 17:1879-1886. [PMID: 29631402 DOI: 10.1021/acs.jproteome.7b00899] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/28/2022]
Abstract
A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.
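The protein inference problem described here can be made concrete with a small Python sketch that contrasts two deliberately naive rules: accept every protein that any detected peptide maps to, or accept only proteins supported by at least one peptide unique to them. The function names, peptide strings, and protein labels are hypothetical, and neither rule corresponds to a specific procedure benchmarked in the paper.

```python
from typing import Dict, Iterable, Set


def infer_any_peptide(detected: Iterable[str],
                      peptide_to_proteins: Dict[str, Set[str]]) -> Set[str]:
    """Accept every protein that any detected peptide could come from."""
    inferred: Set[str] = set()
    for peptide in detected:
        inferred |= peptide_to_proteins.get(peptide, set())
    return inferred


def infer_unique_only(detected: Iterable[str],
                      peptide_to_proteins: Dict[str, Set[str]]) -> Set[str]:
    """Accept only proteins supported by a peptide that maps to them alone."""
    inferred: Set[str] = set()
    for peptide in detected:
        proteins = peptide_to_proteins.get(peptide, set())
        if len(proteins) == 1:  # shared peptides are ignored
            inferred |= proteins
    return inferred


if __name__ == "__main__":
    # Toy mapping: PEPB is shared between a present protein (P1)
    # and a homologous protein that is absent from the sample (P2).
    mapping = {"PEPA": {"P1"}, "PEPB": {"P1", "P2"}, "PEPC": {"P3"}}
    detected = ["PEPA", "PEPB"]
    print(sorted(infer_any_peptide(detected, mapping)))
    print(sorted(infer_unique_only(detected, mapping)))
```

In the toy case the shared peptide pulls the absent homolog into the result under the first rule, which is exactly the kind of error a standard without shared peptides cannot reveal.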
Affiliation(s)
- Matthew The
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
- Fredrik Edfors
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Samuel H Payne
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Michael R Hoopmann
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
- Björn Forsström
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
- Lukas Käll
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
6
Higdon R, Kolker E. Can "normal" protein expression ranges be estimated with high-throughput proteomics? J Proteome Res 2015; 14:2398-407. [PMID: 25877823 DOI: 10.1021/acs.jproteome.5b00176] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/24/2022]
Abstract
Although biological science discovery often involves comparing conditions to a normal state, in proteomics little is actually known about normal. Two Human Proteome studies featured in Nature offer new insights into protein expression and an opportunity to assess how high-throughput proteomics measures normal protein ranges. We use data from these studies to estimate technical and biological variability in protein expression and compare them to other expression data sets from normal tissue. Results show that measured protein expression across same-tissue replicates varies by ±4- to 10-fold for most proteins. Coefficients of variation (CV) for protein expression measurements range from 62% to 117% across different tissue experiments; however, adjusting for technical variation reduced this variability by as much as 50%. In addition, the CV could also be reduced by limiting comparisons to proteins with 3 or more unique peptide identifications, as the CV was on average 33% lower than for proteins with 2 or fewer peptide identifications. We also selected 13 housekeeping proteins and genes that were expressed across all tissues with low variability to determine their utility as a reference set for normalization and comparative purposes. These results present the first step toward estimating normal protein ranges by determining the variability in expression measurements through combining publicly available data. They support an approach that combines standard protocols with replicates of normal tissues to estimate normal protein ranges for large numbers of proteins and tissues. This would be a tremendous resource for normal cellular physiology and comparisons of proteomics studies.
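The two variability measures quoted in this abstract, the coefficient of variation and the fold range across replicates, can be computed as in the minimal Python sketch below. It assumes the CV is the sample standard deviation divided by the mean, expressed as a percentage, and that the fold range is the largest replicate value divided by the smallest; the abstract does not state the exact estimators used, and the function names and numbers are illustrative only.

```python
import statistics
from typing import Sequence


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two replicate values")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * statistics.stdev(values) / mean


def fold_range(values: Sequence[float]) -> float:
    """Ratio of the largest to the smallest replicate measurement."""
    if not values or min(values) <= 0:
        raise ValueError("need positive replicate values")
    return max(values) / min(values)


if __name__ == "__main__":
    # Toy replicate measurements for one protein; not real data.
    replicates = [10.0, 20.0, 30.0]
    print(cv_percent(replicates))
    print(fold_range(replicates))
```

Restricting such a calculation to proteins with 3 or more unique peptide identifications, as the abstract describes, would amount to filtering the input before calling these functions.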
Affiliation(s)
- Roger Higdon
  - Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington 98101, United States
  - CDO Analytics, Seattle Children's Hospital, Seattle, Washington 98101, United States
- Eugene Kolker
  - Bioinformatics and High-Throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington 98101, United States
  - CDO Analytics, Seattle Children's Hospital, Seattle, Washington 98101, United States
  - Departments of Biomedical Informatics and Medical Education and Pediatrics, University of Washington, Seattle, Washington 98195, United States
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
7
Montague E, Janko I, Stanberry L, Lee E, Choiniere J, Anderson N, Stewart E, Broomall W, Higdon R, Kolker N, Kolker E. Beyond protein expression, MOPED goes multi-omics. Nucleic Acids Res 2014; 43:D1145-51. [PMID: 25404128 PMCID: PMC4383969 DOI: 10.1093/nar/gku1175] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/27/2022] [Open Access]
Abstract
MOPED (Multi-Omics Profiling Expression Database; http://moped.proteinspire.org) has transitioned from solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface, MOPED presents consistently processed data for gene, protein and pathway expression. To improve data quality, consistency and use, MOPED includes metadata detailing experimental design and analysis methods. The multi-omics data are integrated through direct links between genes and proteins and further connected to pathways and experiments. MOPED now contains over 5 million records, information for approximately 75 000 genes and 50 000 proteins from four organisms (human, mouse, worm, yeast). These records correspond to 670 unique combinations of experiment, condition, localization and tissue. MOPED includes the following new features: pathway expression, Pathway Details pages, experimental metadata checklists, experiment summary statistics and more advanced searching tools. Advanced searching enables querying for genes, proteins, experiments, pathways and keywords of interest. The system is enhanced with visualizations for comparing across different data types. In the future MOPED will expand the number of organisms, increase integration with pathways and provide connections to disease.
Affiliation(s)
- Elizabeth Montague
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - CDO Analytics, Seattle Children's, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Imre Janko
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - CDO Analytics, Seattle Children's, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Larissa Stanberry
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - CDO Analytics, Seattle Children's, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Elaine Lee
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - CDO Analytics, Seattle Children's, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- John Choiniere
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Nathaniel Anderson
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Elizabeth Stewart
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- William Broomall
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - CDO Analytics, Seattle Children's, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Roger Higdon
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - CDO Analytics, Seattle Children's, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Natali Kolker
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - CDO Analytics, Seattle Children's, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Eugene Kolker
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101
  - CDO Analytics, Seattle Children's, Seattle, WA, USA 98101
  - Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
  - Departments of Biomedical Informatics and Medical Education and Pediatrics, University of Washington, Seattle, WA, USA 98109
  - Department of Chemistry and Chemical Biology, College of Science, Northeastern University, Boston, MA 02115
8
Montague E, Stanberry L, Higdon R, Janko I, Lee E, Anderson N, Choiniere J, Stewart E, Yandl G, Broomall W, Kolker N, Kolker E. MOPED 2.5--an integrated multi-omics resource: multi-omics profiling expression database now includes transcriptomics data. OMICS 2014; 18:335-43. [PMID: 24910945 PMCID: PMC4048574 DOI: 10.1089/omi.2014.0061] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 12/31/2022]
Abstract
Multi-omics data-driven scientific discovery crucially rests on high-throughput technologies and data sharing. Currently, data are scattered across single omics repositories, stored in varying raw and processed formats, and are often accompanied by limited or no metadata. The Multi-Omics Profiling Expression Database (MOPED, http://moped.proteinspire.org ) version 2.5 is a freely accessible multi-omics expression database. Continual improvement and expansion of MOPED is driven by feedback from the Life Sciences Community. In order to meet the emergent need for an integrated multi-omics data resource, MOPED 2.5 now includes gene relative expression data in addition to protein absolute and relative expression data from over 250 large-scale experiments. To facilitate accurate integration of experiments and increase reproducibility, MOPED provides extensive metadata through the Data-Enabled Life Sciences Alliance (DELSA Global, http://delsaglobal.org ) metadata checklist. MOPED 2.5 has greatly increased the number of proteomics absolute and relative expression records to over 500,000, in addition to adding more than four million transcriptomics relative expression records. MOPED has an intuitive user interface with tabs for querying different types of omics expression data and new tools for data visualization. Summary information including expression data, pathway mappings, and direct connection between proteins and genes can be viewed on Protein and Gene Details pages. These connections in MOPED provide a context for multi-omics expression data exploration. Researchers are encouraged to submit omics data which will be consistently processed into expression summaries. MOPED as a multi-omics data resource is a pivotal public database, interdisciplinary knowledge resource, and platform for multi-omics understanding.
Affiliation(s)
- Elizabeth Montague
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Larissa Stanberry
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Roger Higdon
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Imre Janko
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Elaine Lee
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
| | - Nathaniel Anderson
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- John Choiniere
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Elizabeth Stewart
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Gregory Yandl
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- William Broomall
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Natali Kolker
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Eugene Kolker
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Departments of Biomedical Informatics and Medical Education and Pediatrics, University of Washington, Seattle, Washington
|
9
|
Stingl C, Söderquist M, Karlsson O, Borén M, Luider TM. Uncovering Effects of Ex Vivo Protease Activity during Proteomics and Peptidomics Sample Extraction in Rat Brain Tissue by Oxygen-18 Labeling. J Proteome Res 2014; 13:2807-17. [DOI: 10.1021/pr401232e] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/11/2023]
Affiliation(s)
- Christoph Stingl
- Department of Neurology, Erasmus University Medical Center, 3000 CA Rotterdam, The Netherlands
- Oskar Karlsson
- Department of Pharmaceutical Biosciences, Uppsala University, 751 05 Uppsala, Sweden
- Theo M. Luider
- Department of Neurology, Erasmus University Medical Center, 3000 CA Rotterdam, The Netherlands
|
10
|
Higdon R, Stewart E, Stanberry L, Haynes W, Choiniere J, Montague E, Anderson N, Yandl G, Janko I, Broomall W, Fishilevich S, Lancet D, Kolker N, Kolker E. MOPED enables discoveries through consistently processed proteomics data. J Proteome Res 2013; 13:107-13. [PMID: 24350770 DOI: 10.1021/pr400884c] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/28/2022]
Abstract
The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is an expanding proteomics resource to enable biological and biomedical discoveries. MOPED aggregates simple, standardized and consistently processed summaries of protein expression and metadata from proteomics (mass spectrometry) experiments from human and model organisms (mouse, worm, and yeast). The latest version of MOPED adds new estimates of protein abundance and concentration as well as relative (differential) expression data. MOPED provides a new updated query interface that allows users to explore information by organism, tissue, localization, condition, experiment, or keyword. MOPED supports the Human Proteome Project's efforts to generate chromosome- and disease-specific proteomes by providing links from proteins to chromosome and disease information as well as many complementary resources. MOPED supports a new omics metadata checklist to harmonize data integration, analysis, and use. MOPED's development is driven by the user community, which spans 90 countries and guides future development that will transform MOPED into a multiomics resource. MOPED encourages users to submit data in a simple format. They can use the metadata checklist to generate a data publication for this submission. As a result, MOPED will provide even greater insights into complex biological processes and systems and enable deeper and more comprehensive biological and biomedical discoveries.
|
11
|
Higdon R, Haynes W, Stanberry L, Stewart E, Yandl G, Howard C, Broomall W, Kolker N, Kolker E. Unraveling the Complexities of Life Sciences Data. BIG DATA 2013; 1:42-50. [PMID: 27447037 DOI: 10.1089/big.2012.1505] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 06/06/2023]
Abstract
The life sciences have entered into the realm of big data and data-enabled science, where data can either empower or overwhelm. These data bring the challenges of the 5 Vs of big data: volume, veracity, velocity, variety, and value. Both independently and through our involvement with DELSA Global (Data-Enabled Life Sciences Alliance, DELSAglobal.org), the Kolker Lab (kolkerlab.org) is creating partnerships that identify data challenges and solve community needs. We specialize in solutions to complex biological data challenges, as exemplified by the community resource of MOPED (Model Organism Protein Expression Database, MOPED.proteinspire.org) and the analysis pipeline of SPIRE (Systematic Protein Investigative Research Environment, PROTEINSPIRE.org). Our collaborative work extends into the computationally intensive tasks of analysis and visualization of millions of protein sequences through innovative implementations of sequence alignment algorithms and creation of the Protein Sequence Universe tool (PSU). Pushing into the future together with our collaborators, our lab is pursuing integration of multi-omics data and exploration of biological pathways, as well as assigning function to proteins and porting solutions to the cloud. Big data have come to the life sciences; discovering the knowledge in the data will bring breakthroughs and benefits.
Affiliation(s)
- Roger Higdon
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Winston Haynes
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Larissa Stanberry
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Elizabeth Stewart
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Gregory Yandl
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Chris Howard
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- 5 Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- William Broomall
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Natali Kolker
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Eugene Kolker
- 1 Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington
- 2 High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- 3 Predictive Analytics, Seattle Children's, Seattle, Washington
- 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- 6 Departments of Biomedical Informatics & Medical Education and Pediatrics, University of Washington, Seattle, Washington
|
12
|
|
13
|
Weisbrod CR, Eng JK, Hoopmann MR, Baker T, Bruce JE. Accurate peptide fragment mass analysis: multiplexed peptide identification and quantification. J Proteome Res 2012; 11:1621-32. [PMID: 22288382 DOI: 10.1021/pr2008175] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Indexed: 11/28/2022]
Abstract
Fourier transform-all reaction monitoring (FT-ARM) is a novel approach for the identification and quantification of peptides that relies upon the selectivity of high mass accuracy data and the specificity of peptide fragmentation patterns. An FT-ARM experiment involves continuous, data-independent, high mass accuracy MS/MS acquisition spanning a defined m/z range. Custom software was developed to search peptides against the multiplexed fragmentation spectra by comparing theoretical or empirical fragment ions against every fragmentation spectrum across the entire acquisition. A dot product score is calculated against each spectrum to generate a score chromatogram used for both identification and quantification. Chromatographic elution profile characteristics are not used to cluster precursor peptide signals to their respective fragment ions. FT-ARM identifications are demonstrated to be complementary to conventional data-dependent shotgun analysis, especially in cases where the data-dependent method fails because of fragmenting multiple overlapping precursors. The sensitivity, robustness, and specificity of FT-ARM quantification are shown to be analogous to selected reaction monitoring-based peptide quantification with the added benefit of minimal assay development. Thus, FT-ARM is demonstrated to be a novel and complementary data acquisition, identification, and quantification method for the large scale analysis of peptides.
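The score-chromatogram idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the unit-intensity theoretical fragments, the fixed m/z tolerance, and the cosine-style normalization are all assumptions made for illustration.

```python
import math

def dot_product_score(theoretical_mz, observed_peaks, tol=0.01):
    """Normalized dot product between theoretical fragment m/z values
    (each given unit intensity, an assumption) and one observed spectrum
    supplied as (mz, intensity) pairs."""
    matched = []
    for mz_t in theoretical_mz:
        # Use the most intense observed peak within the tolerance, else 0.
        hits = [inten for mz, inten in observed_peaks if abs(mz - mz_t) <= tol]
        matched.append(max(hits) if hits else 0.0)
    norm_obs = math.sqrt(sum(inten * inten for _, inten in observed_peaks))
    norm_theo = math.sqrt(len(theoretical_mz))
    if norm_obs == 0 or norm_theo == 0:
        return 0.0
    return sum(matched) / (norm_obs * norm_theo)

def score_chromatogram(theoretical_mz, spectra, tol=0.01):
    """Score one peptide against every spectrum, in acquisition order.
    spectra: list of (retention_time, [(mz, intensity), ...])."""
    return [(rt, dot_product_score(theoretical_mz, peaks, tol))
            for rt, peaks in spectra]
```

Scoring every spectrum and plotting score against retention time gives a trace whose apex could support identification and whose area could support quantification, which is the role the abstract assigns to the score chromatogram.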
Affiliation(s)
- Chad R Weisbrod
- Department of Genome Sciences, University of Washington , 815 Mercer Street, Seattle, Washington 98109, United States
|
14
|
Kolker E, Higdon R, Haynes W, Welch D, Broomall W, Lancet D, Stanberry L, Kolker N. MOPED: Model Organism Protein Expression Database. Nucleic Acids Res 2012; 40:D1093-9. [PMID: 22139914 PMCID: PMC3245040 DOI: 10.1093/nar/gkr1177] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Received: 10/18/2011] [Revised: 11/10/2011] [Accepted: 11/11/2011] [Indexed: 01/14/2023]
Abstract
Large numbers of mass spectrometry proteomics studies are being conducted to understand all types of biological processes. The size and complexity of proteomics data hinder efforts to easily share, integrate, query and compare the studies. The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is a new and expanding proteomics resource that enables rapid browsing of protein expression information from publicly available studies on humans and model organisms. MOPED is designed to simplify the comparison and sharing of proteomics data for the greater research community. MOPED uniquely provides protein level expression data, meta-analysis capabilities and quantitative data from standardized analysis. Data can be queried for specific proteins, browsed based on organism, tissue, localization and condition and sorted by false discovery rate and expression. MOPED empowers users to visualize their own expression data and compare it with existing studies. Further, MOPED links to various protein and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED contains over 43,000 proteins with at least one spectral match and more than 11 million high certainty spectra.
Affiliation(s)
- Eugene Kolker
- Bioinformatics and High-throughput Analysis Laboratory, High-throughput Analysis Core, Center for Developmental Therapeutics, Seattle Children's Research Institute, Predictive Analytics, Seattle Children's Hospital, Seattle, WA 98105, USA.
|
15
|
Mayne L, Kan ZY, Chetty PS, Ricciuti A, Walters BT, Englander SW. Many overlapping peptides for protein hydrogen exchange experiments by the fragment separation-mass spectrometry method. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2011; 22:1898-905. [PMID: 21952777 PMCID: PMC3396559 DOI: 10.1007/s13361-011-0235-4] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Received: 05/20/2011] [Revised: 08/11/2011] [Accepted: 08/12/2011] [Indexed: 05/19/2023]
Abstract
Measurement of the naturally occurring hydrogen exchange (HX) behavior of proteins can in principle provide highly resolved thermodynamic and kinetic information on protein structure, dynamics, and interactions. The HX fragment separation-mass spectrometry method (HX-MS) is able to measure hydrogen exchange in biologically important protein systems that are not accessible to NMR methods. In order to achieve high structural resolution in HX-MS experiments, it will be necessary to obtain many sequentially overlapping peptide fragments and be able to identify and analyze them efficiently and accurately by mass spectrometry. This paper describes operations which, when applied to four different proteins ranging in size from 140 to 908 residues, routinely provide hundreds of useful unique peptides, covering the entire protein length many times over. Coverage in terms of the average number of peptide fragments that span each amino acid exceeds 10. The ability to achieve these results required the integrated application of experimental methods that are described here and a computer analysis program, called ExMS, described in a following paper.
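The coverage figure quoted in this abstract (average number of peptide fragments spanning each amino acid) can be computed as in the following minimal Python sketch. The function names and the 1-based inclusive (start, end) peptide representation are assumptions for illustration and are not taken from ExMS.

```python
def residue_coverage(protein_length, peptides):
    """Count how many peptides span each residue.
    peptides: list of (start, end) positions, 1-based and inclusive."""
    counts = [0] * protein_length
    for start, end in peptides:
        for pos in range(start, end + 1):
            counts[pos - 1] += 1
    return counts

def mean_coverage(protein_length, peptides):
    """Average number of peptides covering a residue across the protein."""
    counts = residue_coverage(protein_length, peptides)
    return sum(counts) / protein_length
```

For example, two peptides spanning residues 1-3 and 2-5 of a five-residue sequence cover seven residue positions in total, for a mean coverage of 1.4.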
Affiliation(s)
- Leland Mayne
- Johnson Research Foundation, Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, 1006 Stellar-Chance Labs, 422 Curie Boulevard, Philadelphia, PA 19104, USA.
|
16
|
Higdon R, Reiter L, Hather G, Haynes W, Kolker N, Stewart E, Bauman AT, Picotti P, Schmidt A, van Belle G, Aebersold R, Kolker E. IPM: An integrated protein model for false discovery rate estimation and identification in high-throughput proteomics. J Proteomics 2011; 75:116-21. [PMID: 21718813 DOI: 10.1016/j.jprot.2011.06.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/08/2011] [Revised: 05/28/2011] [Accepted: 06/02/2011] [Indexed: 12/19/2022]
Abstract
In high-throughput mass spectrometry proteomics, peptides and proteins are not simply identified as present or not present in a sample; rather, the identifications are associated with differing levels of confidence. The false discovery rate (FDR) has emerged as an accepted means for measuring the confidence associated with identifications. We have developed the Systematic Protein Investigative Research Environment (SPIRE) for the purpose of integrating the best available proteomics methods. Two successful approaches to estimating the FDR for MS protein identifications are the MAYU and our current SPIRE methods. We present here a method to combine these two approaches to estimating the FDR for MS protein identifications into an integrated protein model (IPM). We illustrate the high quality performance of this IPM approach through testing on two large publicly available proteomics datasets. MAYU and SPIRE show remarkable consistency in identifying proteins in these datasets. Still, IPM results in a more robust FDR estimation approach and additional identifications, particularly among low abundance proteins. IPM is now implemented as a part of the SPIRE system.
Affiliation(s)
- Roger Higdon
- Bioinformatics & High-throughput Analysis Laboratory, Seattle, WA, USA.
|
17
|
Kolker E, Higdon R, Morgan P, Sedensky M, Welch D, Bauman A, Stewart E, Haynes W, Broomall W, Kolker N. SPIRE: Systematic protein investigative research environment. J Proteomics 2011; 75:122-6. [PMID: 21609792 DOI: 10.1016/j.jprot.2011.05.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 03/08/2011] [Revised: 05/03/2011] [Accepted: 05/05/2011] [Indexed: 12/21/2022]
Abstract
The SPIRE (Systematic Protein Investigative Research Environment) provides web-based experiment-specific mass spectrometry (MS) proteomics analysis (https://www.proteinspire.org). Its emphasis is on usability and integration of the best analytic tools. SPIRE provides an easy-to-use web interface and generates results in both interactive and simple data formats. In contrast to run-based approaches, SPIRE conducts the analysis based on the experimental design. It employs novel methods to generate false discovery rates and local false discovery rates (FDR, LFDR) and integrates the best and complementary open-source search and data analysis methods. The SPIRE approach of integrating X!Tandem, OMSSA and SpectraST can produce an increase in protein IDs (52-88%) over current combinations of scoring and single search engines while also providing accurate multi-faceted error estimation. One of SPIRE's primary assets is combining the results with data on protein function, pathways and protein expression from model organisms. We demonstrate some of SPIRE's capabilities by analyzing mitochondrial proteins from the wild type and 3 mutants of C. elegans. SPIRE also connects results to publicly available proteomics data through its Model Organism Protein Expression Database (MOPED). SPIRE can also provide analysis and annotation for user-supplied protein ID and expression data.
Affiliation(s)
- Eugene Kolker
- Bioinformatics & High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, WA, USA.
|
18
|
Demartini DR, Jain R, Agrawal G, Thelen JJ. Proteomic comparison of plastids from developing embryos and leaves of Brassica napus. J Proteome Res 2011; 10:2226-37. [PMID: 21417358 DOI: 10.1021/pr101047y] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]
Abstract
Plastids are highly specialized organelles, responsible for photosynthesis and biosynthesis of various phytochemicals. To better understand plastid diversity and metabolism, a quantitative proteomic study of two plastid forms from Brassica napus (oilseed rape) was performed. Plastids were isolated from leaves (chloroplasts) of two-week-old plants and developing embryos (embryoplasts) three weeks after flowering, using an approach avoiding protein storage vacuole contamination. Proteins from five different plastid preparations were prefractionated by SDS-PAGE and sectioned into multiple bands, and in-gel proteins were subjected to trypsin digestion. Tryptic peptides from each band were eluted and analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) and spectra were searched against a comprehensive plant database. Proteins were quantified based on MS/MS spectral counting of unique, nonhomologous peptides. Functional classification and quantitative comparison of over 2000 redundant proteins (compiled to 675 nonredundant proteins) determined that light reaction proteins are more prominent in chloroplasts, while many Calvin cycle enzymes are more prominent in embryoplasts. Embryoplasts also contain a diversity of other metabolic enzymes undetected in chloroplasts. Many enzymes involved in de novo fatty acid and amino acid biosynthesis were detected in embryoplasts but not chloroplasts. Additionally, protein synthesis-related proteins were prominent in embryoplasts. Collectively, these results indicate that these two plastid types are distinct.
Affiliation(s)
- Diogo Ribeiro Demartini
- Department of Biochemistry and Interdisciplinary Plant Group, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, United States
|
19
|
Bauman A, Higdon R, Rapson S, Louie B, Hogan J, Stacy R, Napuli A, Guo W, van Voorhis W, Roach J, Lu V, Landorf E, Stewart E, Kolker N, Collart F, Myler P, van Belle G, Kolker E. Design and initial characterization of the SC-200 proteomics standard mixture. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2011; 15:73-82. [PMID: 21250827 DOI: 10.1089/omi.2010.0118] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
High-throughput (HTP) proteomics studies generate large amounts of data. Interpretation of these data requires effective approaches to distinguish noise from biological signal, particularly as instrument and computational capacity increase and studies become more complex. Resolving this issue requires validated and reproducible methods and models, which in turn requires complex experimental and computational standards. The absence of appropriate standards and data sets for validating experimental and computational workflows hinders the development of HTP proteomics methods. Most protein standards are simple mixtures of proteins or peptides, or undercharacterized reference standards in which the identity and concentration of the constituent proteins are unknown. The Seattle Children's 200 (SC-200) proposed proteomics standard mixture is the next step toward developing realistic, fully characterized HTP proteomics standards. The SC-200 exhibits a unique modular design to extend its functionality, and consists of 200 proteins of known identities and molar concentrations from 6 microbial genomes, distributed into 10 molar concentration tiers spanning a 1,000-fold range. We describe the SC-200's design, potential uses, and initial characterization. We identified 84% of SC-200 proteins with an LTQ-Orbitrap and 65% with an LTQ-Velos (false discovery rate = 1% for both). There were obvious trends in success rate, sequence coverage, and spectral counts with protein concentration; however, protein identification, sequence coverage, and spectral counts vary greatly within concentration levels.
Affiliation(s)
- Andrew Bauman
- Seattle Children's Research Institute, Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, High-throughput Analysis Core, Seattle, Washington 98109, USA
|
20
|
Paddock MN, Bauman AT, Higdon R, Kolker E, Takeda S, Scharenberg AM. Competition between PARP-1 and Ku70 control the decision between high-fidelity and mutagenic DNA repair. DNA Repair (Amst) 2011; 10:338-43. [PMID: 21256093 DOI: 10.1016/j.dnarep.2010.12.005] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 09/17/2010] [Revised: 11/29/2010] [Accepted: 12/13/2010] [Indexed: 12/26/2022]
Abstract
Affinity maturation of antibodies requires a unique process of targeted mutation that allows changes to accumulate in the antibody genes while the rest of the genome is protected from off-target mutations that can be oncogenic. This targeting requires that the same deamination event be repaired either by a mutagenic or a high-fidelity pathway depending on the genomic location. We have previously shown that the BRCT domain of the DNA-damage sensor PARP-1 is required for mutagenic repair occurring in the context of IgH and IgL diversification in the chicken B cell line DT40. Here we show that immunoprecipitation of the BRCT domain of PARP-1 pulls down Ku70 and the DNA-PK complex although the BRCT domain of PARP-1 does not bind DNA, suggesting that this interaction is not DNA dependent. Through sequencing the IgL variable region in PARP-1(-/-) cells that also lack Ku70 or Lig4, we show that Ku70 or Lig4 deficiency restores GCV to PARP-1(-/-) cells and conclude that the mechanism by which PARP-1 is promoting mutagenic repair is by inhibiting high-fidelity repair which would otherwise be mediated by Ku70 and Lig4.
Affiliation(s)
- M N Paddock
- Seattle Children's Hospital Research Institute, 1900 9th Ave., Seattle, WA 98101, USA
|
21
|
Louie B, Higdon R, Kolker E. The necessity of adjusting tests of protein category enrichment in discovery proteomics. Bioinformatics 2010; 26:3007-11. [PMID: 21068002 DOI: 10.1093/bioinformatics/btq541] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Enrichment tests are used in high-throughput experimentation to measure the association between gene or protein expression and membership in groups or pathways. The Fisher's exact test is commonly used. We specifically examined the associations produced by the Fisher test between protein identification by mass spectrometry discovery proteomics, and their Gene Ontology (GO) term assignments in a large yeast dataset. We found that direct application of the Fisher test is misleading in proteomics due to the bias in mass spectrometry to preferentially identify proteins based on their biochemical properties. False inference about associations can be made if this bias is not corrected. Our method adjusts Fisher tests for these biases and produces associations more directly attributable to protein expression rather than experimental bias.
RESULTS: Using logistic regression, we modeled the association between protein identification and GO term assignments while adjusting for identification bias in mass spectrometry. The model accounts for five biochemical properties of peptides: (i) hydrophobicity, (ii) molecular weight, (iii) transfer energy, (iv) beta turn frequency and (v) isoelectric point. The model was fit on 181 060 peptides from 2678 proteins identified in 24 yeast proteomics datasets with a 1% false discovery rate. In analyzing the association between protein identification and their GO term assignments, we found that 25% (134 out of 544) of Fisher tests that showed significant association (q-value ≤0.05) were non-significant after adjustment using our model. Simulations generating yeast protein sets enriched for identification propensity show that unadjusted enrichment tests were biased while our approach worked well.
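For orientation, the unadjusted enrichment test that this abstract takes as its starting point can be sketched in Python as a one-sided Fisher's exact test computed from the hypergeometric distribution. This shows only the baseline test, not the authors' logistic-regression adjustment. The function and argument names are hypothetical.

```python
from math import comb

def fisher_enrichment_p(k, n_identified, n_in_term, n_total):
    """One-sided Fisher's exact p-value for over-representation.

    k            : identified proteins annotated with the GO term
    n_identified : all identified proteins
    n_in_term    : all proteins annotated with the GO term
    n_total      : all proteins in the background (e.g. the proteome)
    Returns P(X >= k) for X ~ Hypergeometric(n_total, n_in_term, n_identified).
    """
    denom = comb(n_total, n_identified)
    upper = min(n_identified, n_in_term)
    tail = sum(comb(n_in_term, x) * comb(n_total - n_in_term, n_identified - x)
               for x in range(k, upper + 1))
    return tail / denom
```

The test treats every protein as equally likely to be identified. The abstract argues that this assumption fails in mass spectrometry, which is why the authors adjust for peptide-level biochemical properties.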
Affiliation(s)
- Brenton Louie
- Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, WA 98101, USA
|
22
|
Higdon R, Haynes W, Kolker E. Meta-analysis for protein identification: a case study on yeast data. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2010; 14:309-14. [PMID: 20569183 DOI: 10.1089/omi.2010.0034] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/16/2023]
Abstract
Large amounts of mass spectrometry (MS) proteomics data are now publicly available; however, little attention has been given to how to best combine these data and assess the error rates for protein identification. The objective of this article is to show how variation in the type and amount of data included with each study impacts coverage of the yeast proteome and estimation of the false discovery rate (FDR). Our analysis of a subset of the publicly available yeast data showed that failure to reevaluate the FDR when combining protein IDs from different experiments resulted in an underestimation of the FDR by approximately threefold. A worst-case approximation of the FDR was only slightly larger than estimating the FDR by randomized database matches. The use of a weighted model to emphasize the most informative experimental data provided an increase in the number of IDs at a 1% FDR when compared to other meta-analysis approaches. Also, using an FDR higher than 1% results in a very high rate of false discoveries for IDs above the 1% threshold. Ideally, raw MS data will be made publicly available for complete and consistent reanalysis. In the circumstance that raw data is not available, determining a combined FDR on the basis of the worst-case estimation provides a reasonable approximation of the FDR. When combining experimental results, adding additional experiments results in diminishing and in some cases negative returns on protein identifications. It may be beneficial to include only those experiments generating the most unique identifications due to solid experimental design and sensitive instrumentation.
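The abstract does not give the formula for its worst-case approximation. The Python sketch below shows one plausible reading, stated here as an assumption: false identifications from different experiments never coincide, so their expected counts add, while correct identifications overlap and the combined list grows more slowly. The function name and input layout are hypothetical.

```python
def worst_case_combined_fdr(experiments):
    """Upper-bound FDR for the union of protein ID lists.

    experiments: list of (set_of_protein_ids, per_experiment_fdr).
    Assumes (worst case) that no false ID is shared between experiments.
    """
    union = set()
    expected_false = 0.0
    for ids, fdr in experiments:
        union |= ids
        expected_false += fdr * len(ids)
    if not union:
        return 0.0
    return min(1.0, expected_false / len(union))
```

Under this reading, two experiments with 100 IDs each at 1% FDR and 90 shared proteins yield 110 combined IDs with up to 2 expected false ones, about 1.8% rather than 1%. This illustrates how reusing the per-experiment FDR for a combined list can understate the error rate.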
Affiliation(s)
- Roger Higdon
- Bioinformatics & High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, Washington 98101, USA
|