51
|
Shearer JJ, Wold EA, Umbaugh CS, Lichti CF, Nilsson CL, Figueiredo ML. Inorganic Arsenic-Related Changes in the Stromal Tumor Microenvironment in a Prostate Cancer Cell-Conditioned Media Model. Environ Health Perspect 2016; 124:1009-15. [PMID: 26588813 PMCID: PMC4937864 DOI: 10.1289/ehp.1510090] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/17/2015] [Accepted: 11/12/2015] [Indexed: 05/18/2023]
Abstract
BACKGROUND The tumor microenvironment plays an important role in the progression of cancer by mediating stromal-epithelial paracrine signaling, which can aberrantly modulate cellular proliferation and tumorigenesis. Exposure to environmental toxicants, such as inorganic arsenic (iAs), has also been implicated in the progression of prostate cancer. OBJECTIVE The role of iAs exposure in stromal signaling in the tumor microenvironment has been largely unexplored. Our objective was to elucidate molecular mechanisms of iAs-induced changes to stromal signaling by an enriched prostate tumor microenvironment cell population, adipose-derived mesenchymal stem/stromal cells (ASCs). RESULTS ASC-conditioned media (CM) collected after 1 week of iAs exposure increased prostate cancer cell viability, whereas CM from ASCs that received no iAs exposure decreased cell viability. Cytokine array analysis suggested changes to cytokine signaling associated with iAs exposure. Subsequent proteomic analysis suggested a concentration-dependent alteration to the HMOX1/THBS1/TGFβ signaling pathway by iAs. These results were validated by quantitative reverse transcriptase-polymerase chain reaction (RT-PCR) and Western blotting, confirming a concentration-dependent increase in HMOX1 and a decrease in THBS1 expression in ASC following iAs exposure. Subsequently, we used a TGFβ pathway reporter construct to confirm a decrease in stromal TGFβ signaling in ASC following iAs exposure. CONCLUSIONS Our results suggest a concentration-dependent alteration of stromal signaling: specifically, attenuation of stromal-mediated TGFβ signaling following exposure to iAs. Our results indicate iAs may enhance prostate cancer cell viability through a previously unreported stromal-based mechanism. These findings indicate that the stroma may mediate the effects of iAs in tumor progression, which may have future therapeutic implications. CITATION Shearer JJ, Wold EA, Umbaugh CS, Lichti CF, Nilsson CL, Figueiredo ML. 2016. Inorganic arsenic-related changes in the stromal tumor microenvironment in a prostate cancer cell-conditioned media model. Environ Health Perspect 124:1009-1015; http://dx.doi.org/10.1289/ehp.1510090.
Affiliation(s)
- Joseph J. Shearer
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Eric A. Wold
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Charles S. Umbaugh
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Cheryl F. Lichti
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Carol L. Nilsson
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Marxa L. Figueiredo
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
|
52
|
Lubet RA, Townsend R, Clapper ML, Juliana MM, Steele VE, McCormick DL, Grubbs CJ. 5MeCDDO Blocks Metabolic Activation but not Progression of Breast, Intestine, and Tongue Cancers. Is Antioxidant Response Element a Prevention Target? Cancer Prev Res (Phila) 2016; 9:616-23. [PMID: 27150634 PMCID: PMC4930704 DOI: 10.1158/1940-6207.capr-15-0294] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/29/2015] [Accepted: 04/20/2016] [Indexed: 12/17/2022]
Abstract
The preventive efficacy of the triterpenoid 5MeCDDO was tested in two models of mammary cancer, the Min model of intestinal cancer, and a chemically induced model of head and neck cancer. In one model of mammary cancer, female Sprague-Dawley rats were administered MNU at 50 days of age, and 5MeCDDO (27 ppm) was administered in the diet beginning 5 days later for the duration of the study; 5MeCDDO was ineffective. In contrast, in a model examining initiation of mammary cancers by the procarcinogen dimethyl-benzanthracene, 5,6-benzoflavone (500 ppm, an Ah receptor agonist) or 5MeCDDO (27 or 2.7 ppm) decreased tumor multiplicity by 90%, 80%, and 50%, respectively. This anti-initiating effect, which is presumably mediated by altered metabolic activation, parallels our observation that 5MeCDDO induced proteins of various antioxidant response element (ARE)-related phase II drug-metabolizing enzymes [e.g., GST Pi, AKR 7A3 (aflatoxicol), epoxide hydrolase, and quinone reductase] in the liver. 5MeCDDO tested in the 4-nitroquinoline-1-oxide (4-NQO) head and neck cancer model failed to decrease tumor incidence or invasiveness. In the Min mouse model of intestinal cancer, a high dose of 5MeCDDO (80 ppm) was weakly effective in reducing adenoma multiplicity [∼30% (P < 0.05)]; however, a lower dose was totally ineffective. These findings question whether measuring increased levels of certain ARE-related genes (e.g., quinone reductase, GST Pi), indicating decreased carcinogen activation, is sufficient to imply general chemopreventive efficacy of a given agent or mixture. Cancer Prev Res; 9(7); 616-23. ©2016 AACR.
Affiliation(s)
- Ronald A Lubet
- Division of Cancer Prevention, Chemoprevention Agent Development Research Group, National Cancer Institute, Bethesda, Maryland
- Reid Townsend
- Department of Cell Biology & Physiology and Department of Medicine, Washington University School of Medicine, St. Louis, Missouri
- Margie L Clapper
- Cancer Prevention and Control Program, Fox Chase Cancer Center, Philadelphia, Pennsylvania
- M Margaret Juliana
- Chemoprevention Center, University of Alabama at Birmingham, Birmingham, Alabama
- Vernon E Steele
- Division of Cancer Prevention, Chemoprevention Agent Development Research Group, National Cancer Institute, Bethesda, Maryland
- Clinton J Grubbs
- Chemoprevention Center, University of Alabama at Birmingham, Birmingham, Alabama
|
53
|
Blein-Nicolas M, Zivy M. Thousand and one ways to quantify and compare protein abundances in label-free bottom-up proteomics. Biochimica et Biophysica Acta - Proteins and Proteomics 2016; 1864:883-95. [PMID: 26947242 DOI: 10.1016/j.bbapap.2016.02.019] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 11/05/2015] [Revised: 01/21/2016] [Accepted: 02/24/2016] [Indexed: 11/18/2022]
Abstract
How to process and analyze MS data to quantify and statistically compare protein abundances in bottom-up proteomics has been an open debate for nearly fifteen years. Two main approaches are generally used: the first is based on spectral data generated during the process of identification (e.g. peptide counting, spectral counting), while the second makes use of extracted ion currents to quantify chromatographic peaks and infer protein abundances based on peptide quantification. These two approaches actually refer to multiple methods which have been developed during the last decade, but were submitted to deep evaluations only recently. In this paper, we compiled these different methods as exhaustively as possible. We also summarized the way they address the different problems raised by bottom-up protein quantification such as normalization, the presence of shared peptides, unequal peptide measurability and missing data. This article is part of a Special Issue entitled: Plant Proteomics--a bridge between fundamental processes and crop production, edited by Dr. Hans-Peter Mock.
Affiliation(s)
- Mélisande Blein-Nicolas
- GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, F-91190 Gif-sur-Yvette, France
- Michel Zivy
- GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, F-91190 Gif-sur-Yvette, France
|
54
|
Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res 2016; 15:1116-25. [DOI: 10.1021/acs.jproteome.5b00981] [Citation(s) in RCA: 232] [Impact Index Per Article: 29.0] [Indexed: 01/21/2023]
Affiliation(s)
- Cosmin Lazar
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
- Laurent Gatto
- Computational Proteomics Unit, Cambridge CB2 1GA, United Kingdom
- Cambridge Center for Proteomics, Cambridge CB2 1GA, United Kingdom
- Myriam Ferro
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
- Christophe Bruley
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
- Thomas Burger
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CNRS, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
|
55
|
Abstract
Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used for profiling protein expression levels. This chapter is focused on LC-MS data preprocessing, which is a crucial step in the analysis of LC-MS based proteomics. We provide a high-level overview, highlight associated challenges, and present a step-by-step example for analysis of data from LC-MS based untargeted proteomic study. Furthermore, key procedures and relevant issues with the subsequent analysis by multiple reaction monitoring (MRM) are discussed.
Affiliation(s)
- Tsung-Heng Tsai
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, 20057, USA
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA, 22203, USA
- Minkun Wang
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, 20057, USA
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA, 22203, USA
- Habtom W Ressom
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, 20057, USA
|
56
|
Gunawardena HP, O'Brien J, Wrobel JA, Xie L, Davies SR, Li S, Ellis MJ, Qaqish BF, Chen X. QuantFusion: Novel Unified Methodology for Enhanced Coverage and Precision in Quantifying Global Proteomic Changes in Whole Tissues. Mol Cell Proteomics 2015; 15:740-51. [PMID: 26598639 DOI: 10.1074/mcp.o115.049791] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/11/2015] [Indexed: 11/06/2022]
Abstract
Single quantitative platforms such as label-based or label-free quantitation (LFQ) present compromises in accuracy, precision, protein sequence coverage, and speed of quantifiable proteomic measurements. To maximize the quantitative precision and the number of quantifiable proteins or the quantifiable coverage of tissue proteomes, we have developed a unified approach, termed QuantFusion, that combines the quantitative ratios of all peptides measured by both LFQ and label-based methodologies. Here, we demonstrate the use of QuantFusion in determining the proteins differentially expressed in a pair of patient-derived tumor xenografts (PDXs) representing two major breast cancer (BC) subtypes, basal and luminal. Label-based in-spectra quantitative peptides derived from amino acid-coded tagging (AACT, also known as SILAC) of a non-malignant mammary cell line were uniformly added to each xenograft with a constant predefined ratio, from which Ratio-of-Ratio estimates were obtained for the label-free peptides paired with AACT peptides in each PDX tumor. A mixed model statistical analysis was used to determine global differential protein expression by combining complementary quantifiable peptide ratios measured by LFQ and Ratio-of-Ratios, respectively. With the minimum number of replicates required for obtaining the statistically significant ratios, QuantFusion uses the distinct mechanisms to "rescue" the missing data inherent to both LFQ and label-based quantitation. Combined quantifiable peptide data from both quantitative schemes increased the overall number of peptide level measurements and protein level estimates. In our analysis of the PDX tumor proteomes, QuantFusion increased the number of distinct peptide ratios by 65%, representing differentially expressed proteins between the BC subtypes. This quantifiable coverage improvement, in turn, not only increased the number of measurable protein fold-changes by 8% but also increased the average precision of quantitative estimates by 181% so that some BC subtypically expressed proteins were rescued by QuantFusion. Thus, incorporating data from multiple quantitative approaches while accounting for measurement variability at both the peptide and global protein levels makes QuantFusion unique for obtaining increased coverage and quantitative precision for tissue proteomes.
Affiliation(s)
- Harsha P Gunawardena
- From the ‡Department of Biochemistry and Biophysics, §Lineberger Comprehensive Cancer Center, and
- Jonathon O'Brien
- ¶Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- John A Wrobel
- From the ‡Department of Biochemistry and Biophysics, §Lineberger Comprehensive Cancer Center, and
- Ling Xie
- From the ‡Department of Biochemistry and Biophysics, §Lineberger Comprehensive Cancer Center, and
- Sherri R Davies
- ‖Division of Oncology, Washington University, St. Louis, Missouri 63110
- Shunqiang Li
- ‖Division of Oncology, Washington University, St. Louis, Missouri 63110
- Matthew J Ellis
- **Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
- Bahjat F Qaqish
- ¶Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Xian Chen
- From the ‡Department of Biochemistry and Biophysics, §Lineberger Comprehensive Cancer Center, and
|
57
|
Goeminne LJE, Gevaert K, Clement L. Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics. Mol Cell Proteomics 2015; 15:657-68. [PMID: 26566788 DOI: 10.1074/mcp.m115.055897] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 10/29/2015] [Indexed: 01/22/2023]
Abstract
Peptide intensities from mass spectra are increasingly used for relative quantitation of proteins in complex samples. However, numerous issues inherent to the mass spectrometry workflow turn quantitative proteomic data analysis into a crucial challenge. We and others have shown that modeling at the peptide level outperforms classical summarization-based approaches, which typically also discard a lot of proteins at the data preprocessing step. Peptide-based linear regression models, however, still suffer from unbalanced datasets due to missing peptide intensities, outlying peptide intensities and overfitting. Here, we further improve upon peptide-based models by three modular extensions: ridge regression, improved variance estimation by borrowing information across proteins with empirical Bayes and M-estimation with Huber weights. We illustrate our method on the CPTAC spike-in study and on a study comparing wild-type and ArgP knock-out Francisella tularensis proteomes. We show that the fold change estimates of our robust approach are more precise and more accurate than those from state-of-the-art summarization-based methods and peptide-based regression models, which leads to an improved sensitivity and specificity. We also demonstrate that ionization competition effects come already into play at very low spike-in concentrations and confirm that analyses with peptide-based regression methods on peptide intensity values aggregated by charge state and modification status (e.g. MaxQuant's peptides.txt file) are slightly superior to analyses on raw peptide intensity values (e.g. MaxQuant's evidence.txt file).
Affiliation(s)
- Ludger J E Goeminne
- From the ‡Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium; §VIB Medical Biotechnology Center, Ghent University, Belgium; ¶Department of Biochemistry, Ghent University, Belgium
- Kris Gevaert
- §VIB Medical Biotechnology Center, Ghent University, Belgium; ¶Department of Biochemistry, Ghent University, Belgium
- Lieven Clement
- From the ‡Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium;
|
58
|
Suomi T, Corthals GL, Nevalainen OS, Elo LL. Using Peptide-Level Proteomics Data for Detecting Differentially Expressed Proteins. J Proteome Res 2015; 14:4564-70. [PMID: 26380941 DOI: 10.1021/acs.jproteome.5b00363] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Indexed: 12/29/2022]
Abstract
The expression of proteins can be quantified in high-throughput means using different types of mass spectrometers. In recent years, there have emerged label-free methods for determining protein abundance. Although the expression is initially measured at the peptide level, a common approach is to combine the peptide-level measurements into protein-level values before differential expression analysis. However, this simple combination is prone to inconsistencies between peptides and may lose valuable information. To this end, we introduce here a method for detecting differentially expressed proteins by combining peptide-level expression-change statistics. Using controlled spike-in experiments, we show that the approach of averaging peptide-level expression changes yields more accurate lists of differentially expressed proteins than does the conventional protein-level approach. This is particularly true when there are only few replicate samples or the differences between the sample groups are small. The proposed technique is implemented in the Bioconductor package PECA, and it can be downloaded from http://www.bioconductor.org.
Affiliation(s)
- Garry L Corthals
- Van't Hoff Institute for Molecular Sciences, University of Amsterdam, 1090 GD Amsterdam, The Netherlands
|
59
|
Lichti CF, Wildburger NC, Shavkunov AS, Mostovenko E, Liu H, Sulman EP, Nilsson CL. The proteomic landscape of glioma stem-like cells. EuPA Open Proteomics 2015. [DOI: 10.1016/j.euprot.2015.06.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/08/2023]
|
60
|
Choi H, Kim S, Fermin D, Tsou CC, Nesvizhskii AI. QPROT: Statistical method for testing differential expression using protein-level intensity data in label-free quantitative proteomics. J Proteomics 2015; 129:121-126. [PMID: 26254008 DOI: 10.1016/j.jprot.2015.07.036] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 05/02/2015] [Revised: 07/29/2015] [Accepted: 07/30/2015] [Indexed: 01/09/2023]
Abstract
We introduce QPROT, a statistical framework and computational tool for differential protein expression analysis using protein intensity data. QPROT is an extension of the QSPEC suite, originally developed for spectral count data, adapted for the analysis using continuously measured protein-level intensity data. QPROT offers a new intensity normalization procedure and model-based differential expression analysis, both of which account for missing data. Determination of differential expression of each protein is based on the standardized Z-statistic derived from the posterior distribution of the log fold change parameter, guided by the false discovery rate estimated by a well-known Empirical Bayes method. We evaluated the classification performance of QPROT using the quantification calibration data from the clinical proteomic technology assessment for cancer (CPTAC) study and a recently published Escherichia coli benchmark dataset, with evaluation of FDR accuracy in the latter. BIOLOGICAL SIGNIFICANCE QPROT is a statistical framework with a computational software tool for comparative quantitative proteomics analysis. It features various extensions of the QSPEC method originally built for spectral count data analysis, including probabilistic treatment of missing values in protein intensity data. With the increasing popularity of label-free quantitative proteomics data, the proposed method and accompanying software suite will be immediately useful for many proteomics laboratories. This article is part of a Special Issue entitled: Computational Proteomics.
Affiliation(s)
- Hyungwon Choi
- Saw Swee Hock School of Public Health, National University of Singapore
- Sinae Kim
- Department of Biostatistics, Rutgers University
- Alexey I Nesvizhskii
- Departments of Pathology and Computational Medicine and Bioinformatics, University of Michigan Medical School
|
61
|
Goeminne LJE, Argentini A, Martens L, Clement L. Summarization vs Peptide-Based Models in Label-Free Quantitative Proteomics: Performance, Pitfalls, and Data Analysis Guidelines. J Proteome Res 2015; 14:2457-65. [PMID: 25827922 DOI: 10.1021/pr501223t] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 01/02/2023]
Abstract
Quantitative label-free mass spectrometry is increasingly used to analyze the proteomes of complex biological samples. However, the choice of appropriate data analysis methods remains a major challenge. We therefore provide a rigorous comparison between peptide-based models and peptide-summarization-based pipelines. We show that peptide-based models outperform summarization-based pipelines in terms of sensitivity, specificity, accuracy, and precision. We also demonstrate that the predefined FDR cutoffs for the detection of differentially regulated proteins can become problematic when differentially expressed (DE) proteins are highly abundant in one or more samples. Care should therefore be taken when data are interpreted from samples with spiked-in internal controls and from samples that contain a few very highly abundant proteins. We do, however, show that specific diagnostic plots can be used for assessing differentially expressed proteins and the overall quality of the obtained fold change estimates. Finally, our study also illustrates that imputation under the "missing by low abundance" assumption is beneficial for the detection of differential expression in proteins with low abundance, but it negatively affects moderately to highly abundant proteins. Hence, imputation strategies that are commonly implemented in standard proteomics software should be used with care.
Affiliation(s)
- Ludger J E Goeminne
- ∥Department of Plant Systems Biology, VIB, Ghent University, 9052 Ghent, Belgium
|
62
|
Webb-Robertson BJM, Wiberg HK, Matzke MM, Brown JN, Wang J, McDermott JE, Smith RD, Rodland KD, Metz TO, Pounds JG, Waters KM. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J Proteome Res 2015; 14:1993-2001. [PMID: 25855118 DOI: 10.1021/pr501138h] [Citation(s) in RCA: 167] [Impact Index Per Article: 18.6] [Indexed: 12/14/2022]
Abstract
In this review, we apply selected imputation strategies to label-free liquid chromatography-mass spectrometry (LC-MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC-MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation sometimes yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. On the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.
Affiliation(s)
- Holli K Wiberg
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Melissa M Matzke
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Joseph N Brown
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Jing Wang
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Jason E McDermott
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Richard D Smith
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Karin D Rodland
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Thomas O Metz
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Joel G Pounds
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Katrina M Waters
- Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
|
63
|
Zhan X, Patterson AD, Ghosh D. Kernel approaches for differential expression analysis of mass spectrometry-based metabolomics data. BMC Bioinformatics 2015; 16:77. [PMID: 25887233 PMCID: PMC4359587 DOI: 10.1186/s12859-015-0506-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 09/29/2014] [Accepted: 02/20/2015] [Indexed: 01/12/2023]
Abstract
BACKGROUND Data generated from metabolomics experiments are different from other types of "-omics" data. For example, a common phenomenon in mass spectrometry (MS)-based metabolomics data is that the data matrix frequently contains missing values, which complicates some quantitative analyses. One way to tackle this problem is to treat them as absent. Hence there are two types of information that are available in metabolomics data: presence/absence of a metabolite and a quantitative value of the abundance level of a metabolite if it is present. Combining these two layers of information poses challenges to the application of traditional statistical approaches in differential expression analysis. RESULTS In this article, we propose a novel kernel-based score test for the metabolomics differential expression analysis. In order to simultaneously capture both the continuous pattern and discrete pattern in metabolomics data, two new kinds of kernels are designed. One is the distance-based kernel and the other is the stratified kernel. While we initially describe the procedures in the case of single-metabolite analysis, we extend the methods to handle metabolite sets as well. CONCLUSIONS Evaluation based on both simulated data and real data from a liver cancer metabolomics study indicates that our kernel method has a better performance than some existing alternatives. An implementation of the proposed kernel method in the R statistical computing environment is available at http://works.bepress.com/debashis_ghosh/60/ .
Affiliation(s)
- Xiang Zhan
- Department of Statistics, Pennsylvania State University, 325 Thomas Building, University Park, 16802, PA, USA
- Andrew D Patterson
- Department of Molecular Toxicology, Pennsylvania State University, 322 Life Sciences Bldg, University Park, 16802, PA, USA
- Debashis Ghosh
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, 13001 East 17th Place, Aurora, 80045, CO, USA
|
64
|
Franks AM, Csárdi G, Drummond DA, Airoldi EM. Estimating a structured covariance matrix from multi-lab measurements in high-throughput biology. J Am Stat Assoc 2015; 110:27-44. [PMID: 25954056 PMCID: PMC4418505 DOI: 10.1080/01621459.2014.964404] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 10/24/2022]
Abstract
We consider the problem of quantifying the degree of coordination between transcription and translation in yeast. Several studies have reported a surprising lack of coordination over the years, in organisms as different as yeast and human, using diverse technologies. However, a close look at this literature suggests that the lack of reported correlation may not reflect the biology of regulation. These reports do not control for between-study biases and structure in the measurement errors, ignore key aspects of how the data connect to the estimand, and systematically underestimate the correlation as a consequence. Here, we design a careful meta-analysis of 27 yeast data sets, supported by a multilevel model, full uncertainty quantification, a suite of sensitivity analyses and novel theory, to produce a more accurate estimate of the correlation between mRNA and protein levels, a proxy for coordination. From a statistical perspective, this problem motivates new theory on the impact of noise, model mis-specifications and non-ignorable missing data on estimates of the correlation between high dimensional responses. We find that the correlation between mRNA and protein levels is quite high under the studied conditions, in yeast, suggesting that post-transcriptional regulation plays a less prominent role than previously thought.
|
65
|
Glaab E, Schneider R. RepExplore: addressing technical replicate variance in proteomics and metabolomics data analysis. Bioinformatics 2015; 31:2235-7. [PMID: 25717197 PMCID: PMC4481852 DOI: 10.1093/bioinformatics/btv127] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/26/2014] [Accepted: 02/22/2015] [Indexed: 11/13/2022]
Abstract
Summary: High-throughput omics datasets often contain technical replicates included to account for technical sources of noise in the measurement process. Although summarizing these replicate measurements by using robust averages may help to reduce the influence of noise on downstream data analysis, the information on the variance across the replicate measurements is lost in the averaging process and therefore typically disregarded in subsequent statistical analyses. We introduce RepExplore, a web service dedicated to exploiting the information captured in the technical replicate variance to provide more reliable and informative differential expression and abundance statistics for omics datasets. The software builds on previously published statistical methods, which have been applied successfully to biomedical omics data but are difficult to use without prior experience in programming or scripting. RepExplore facilitates the analysis by providing fully automated data processing and interactive ranking tables, whisker plot, heat map and principal component analysis visualizations to interpret omics data and derived statistics. Availability and implementation: Freely available at http://www.repexplore.tk Contact: enrico.glaab@uni.lu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Enrico Glaab
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Reinhard Schneider
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
|
66
|
Rotival M, Ko JH, Srivastava PK, Kerloc'h A, Montoya A, Mauro C, Faull P, Cutillas PR, Petretto E, Behmoaras J. Integrating phosphoproteome and transcriptome reveals new determinants of macrophage multinucleation. Mol Cell Proteomics 2014; 14:484-98. [PMID: 25532521 PMCID: PMC4349971 DOI: 10.1074/mcp.m114.043836] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/25/2022] Open
Abstract
Macrophage multinucleation (MM) is essential for various biological processes such as osteoclast-mediated bone resorption and multinucleated giant cell-associated inflammatory reactions. Here we study the molecular pathways underlying multinucleation in the rat through an integrative approach combining MS-based quantitative phosphoproteomics (LC-MS/MS) and transcriptomics (high-throughput RNA-sequencing) to identify new regulators of MM. We show that a strong metabolic shift toward HIF1-mediated glycolysis occurs at the transcriptomic level during MM, together with modifications in phosphorylation of over 50 proteins including several ARF GTPase activators and polyphosphate inositol phosphatases. We use shortest-path analysis to link differential phosphorylation with the transcriptomic reprogramming of macrophages and identify LRRFIP1, SMARCA4, and DNMT1 as novel regulators of MM. We experimentally validate these predictions by showing that knock-down of these regulators reduces macrophage multinucleation. These results provide a new framework for the combined analysis of transcriptional and post-translational changes during macrophage multinucleation, prioritizing essential genes, and revealing the sequential events leading to the multinucleation of macrophages.
Affiliation(s)
- Maxime Rotival
  - Integrative Genomics and Medicine, MRC Clinical Sciences Centre, Imperial College London, UK
- Jeong-Hun Ko
  - Centre for Complement and Inflammation Research (CCIR), Imperial College London, UK
- Prashant K Srivastava
  - Integrative Genomics and Medicine, MRC Clinical Sciences Centre, Imperial College London, UK
- Audrey Kerloc'h
  - Centre for Complement and Inflammation Research (CCIR), Imperial College London, UK
- Alex Montoya
  - Biological Mass Spectrometry and Proteomics Laboratory, MRC Clinical Sciences Centre, Imperial College London, UK
- Claudio Mauro
  - William Harvey Research Institute, Queen Mary University of London, UK
- Peter Faull
  - Biological Mass Spectrometry and Proteomics Laboratory, MRC Clinical Sciences Centre, Imperial College London, UK
- Pedro R Cutillas
  - Integrative Cell Signaling and Proteomics, Barts Cancer Institute, Queen Mary University of London, UK
- Enrico Petretto
  - Integrative Genomics and Medicine, MRC Clinical Sciences Centre, Imperial College London, UK
- Jacques Behmoaras
  - Centre for Complement and Inflammation Research (CCIR), Imperial College London, UK
|
67
|
Elo LL, Karjalainen R, Ohman T, Hintsanen P, Nyman TA, Heckman CA, Aittokallio T. Statistical detection of quantitative protein biomarkers provides insights into signaling networks deregulated in acute myeloid leukemia. Proteomics 2014; 14:2443-53. [PMID: 25211154 DOI: 10.1002/pmic.201300460] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/16/2013] [Revised: 07/31/2014] [Accepted: 09/08/2014] [Indexed: 12/12/2022]
Abstract
The increasing coverage and sensitivity of LC-MS/MS-based proteomics have expanded its applications in systems medicine. In particular, label-free quantitation approaches are enabling biomarker discovery in terms of statistical comparison of proteomic profiles across large numbers of clinical samples. However, it still remains poorly understood how much protein markers can add novel insights compared to markers derived from mRNA transcriptomic profiling. Using paired label-free LC-MS/MS and gene expression microarray measurements from primary samples of patients with acute myeloid leukemia (AML), we demonstrate here that while the quantitative proteomic and transcriptomic profiles were highly correlated, in general, the marker panels showing statistically significant expression changes across the disease and healthy groups were profoundly different between protein and mRNA levels. In particular, the proteomic assay enabled unique links to known leukemic processes, which were missed when using the transcriptomic profiling alone, as well as identified additional links to metabolic regulators and chromatin remodelers, such as GPX1, fumarate hydratase, and SET oncogene, which have subsequently been evaluated in independent AML samples. Overall, these results highlighted the complementary and informative view obtained from the quantitative LC-MS/MS approach into the AML deregulated signaling networks.
Affiliation(s)
- Laura L Elo
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland; Turku Centre for Biotechnology, Turku, Finland
|
68
|
Jow H, Boys RJ, Wilkinson DJ. Bayesian identification of protein differential expression in multi-group isobaric labelled mass spectrometry data. Stat Appl Genet Mol Biol 2014; 13:531-51. [PMID: 25153608 DOI: 10.1515/sagmb-2012-0066] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
In this paper we develop a Bayesian statistical inference approach to the unified analysis of isobaric labelled MS/MS proteomic data across multiple experiments. An explicit probabilistic model of the log-intensity of the isobaric labels' reporter ions across multiple pre-defined groups and experiments is developed. This is then used to develop a full Bayesian statistical methodology for the identification of differentially expressed proteins, with respect to a control group, across multiple groups and experiments. This methodology is implemented and then evaluated on simulated data and on two model experimental datasets (for which the differentially expressed proteins are known) that use a TMT labelling protocol.
|
69
|
Vedell PT, Townsend RR, You M, Malone JP, Grubbs CJ, Bland KI, Muccio DD, Atigadda VR, Chen Y, Vignola K, Lubet RA. Global molecular changes in rat livers treated with RXR agonists: a comparison using transcriptomics and proteomics. Pharmacol Res Perspect 2014. [DOI: 10.1002/prp2.74] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022] Open
Affiliation(s)
- Peter T. Vedell
  - Department of Pharmacology, Medical College of Wisconsin Cancer Center, Milwaukee, Wisconsin 53226
- Reid R. Townsend
  - Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri 63110
- Ming You
  - Department of Pharmacology, Medical College of Wisconsin Cancer Center, Milwaukee, Wisconsin 53226
- James P. Malone
  - Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri 63110
- Clinton J. Grubbs
  - Department of Surgery, University of Alabama at Birmingham, Birmingham, Alabama 35294
- Kirby I. Bland
  - Department of Surgery, University of Alabama at Birmingham, Birmingham, Alabama 35294
- Donald D. Muccio
  - Department of Chemistry, University of Alabama at Birmingham, Birmingham, Alabama 35294
- Venkatram R. Atigadda
  - Department of Chemistry, University of Alabama at Birmingham, Birmingham, Alabama 35294
- Yang Chen
  - Department of Science Development, Metabolon, Research Triangle Park, North Carolina 27709
- Katie Vignola
  - Department of Science Development, Metabolon, Research Triangle Park, North Carolina 27709
- Ronald A. Lubet
  - Chemoprevention Agent Development Research Group, National Cancer Institute, Rockville, Maryland 20892
|
70
|
Taylor SL, Leiserowitz GS, Kim K. Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies. Stat Appl Genet Mol Biol 2014; 12:703-22. [PMID: 24246290 DOI: 10.1515/sagmb-2013-0021] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/15/2022]
Abstract
Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Commonly, data generated by mass spectrometry have many missing values, which arise when a compound is absent from a sample or is present but at a concentration below the detection limit. Several strategies are available for statistically analyzing data with missing values. The accelerated failure time (AFT) model assumes all missing values result from censoring below a detection limit. Under a mixture model, missing values can result from a combination of censoring and the absence of a compound. We compare power and estimation of a mixture model to an AFT model. Based on simulated data, we found the AFT model to have greater power to detect differences in means and point mass proportions between groups. However, the AFT model yielded biased estimates, with the bias increasing as the proportion of observations in the point mass increased, while estimates were unbiased with the mixture model except if all missing observations came from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrated this approach through application to glycomics data of serum samples from women with ovarian cancer and matched controls.
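The censoring assumption described above (every missing value lies below a detection limit) can be sketched as a left-censored likelihood. The example below is a minimal illustration and not the paper's AFT or mixture implementation: it assumes normally distributed log-abundances, a single known detection limit, and a crude grid search in place of a proper optimizer. All function names and simulation settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def fit_censored_normal(observed, n_missing, limit):
    """Grid-search MLE of (mu, sigma) for a left-censored normal sample.

    Detected values contribute the normal log-density; each missing value
    contributes log P(X < limit), i.e. it is treated as censored.
    """
    n = len(observed)
    sx = sum(observed)
    sxx = sum(x * x for x in observed)
    best_ll, best_mu, best_sigma = -math.inf, None, None
    for i in range(-100, 101):        # mu from -2.00 to 2.00 (assumed range)
        mu = i / 50
        for j in range(10, 151):      # sigma from 0.20 to 3.00 (assumed range)
            sigma = j / 50
            # Log-likelihood of detected values via sufficient statistics.
            ll = -0.5 * n * math.log(2 * math.pi * sigma ** 2)
            ll -= (sxx - 2 * mu * sx + n * mu * mu) / (2 * sigma ** 2)
            # Contribution of the censored (missing) values.
            p_cens = NormalDist(mu, sigma).cdf(limit)
            ll += n_missing * math.log(max(p_cens, 1e-300))
            if ll > best_ll:
                best_ll, best_mu, best_sigma = ll, mu, sigma
    return best_mu, best_sigma


def demo(seed=1, n=2000, limit=-0.5):
    """Simulate N(0, 1) log-abundances and censor everything below limit."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = [v for v in values if v >= limit]
    n_missing = n - len(observed)
    naive_mean = fmean(observed)  # ignores missing values, biased upward
    mu_hat, sigma_hat = fit_censored_normal(observed, n_missing, limit)
    return naive_mean, mu_hat, sigma_hat


if __name__ == "__main__":
    print(demo())
```

Averaging only the detected values overestimates the mean, whereas the censored likelihood recovers it when all missingness is due to censoring. If some values are missing because the compound is truly absent, this model is misspecified, which is the situation the mixture model in the abstract is meant to handle.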
|
71
|
Koopmans F, Cornelisse LN, Heskes T, Dijkstra TMH. Empirical Bayesian Random Censoring Threshold Model Improves Detection of Differentially Abundant Proteins. J Proteome Res 2014; 13:3871-80. [DOI: 10.1021/pr500171u] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Affiliation(s)
- Frank Koopmans
  - Department of Functional Genomics, Center for Neurogenomics and Cognitive Research, VU University, 1081 HV Amsterdam, The Netherlands
  - Machine Learning Group, Institute for Computing and Information Sciences, Radboud University, 6525 HP Nijmegen, The Netherlands
- L. Niels Cornelisse
  - Department of Functional Genomics, Center for Neurogenomics and Cognitive Research, VU University, 1081 HV Amsterdam, The Netherlands
- Tom Heskes
  - Machine Learning Group, Institute for Computing and Information Sciences, Radboud University, 6525 HP Nijmegen, The Netherlands
- Tjeerd M. H. Dijkstra
  - Machine Learning Group, Institute for Computing and Information Sciences, Radboud University, 6525 HP Nijmegen, The Netherlands
|
72
|
Lichti CF, Fan X, English RD, Zhang Y, Li D, Kong F, Sinha M, Andersen CR, Spratt H, Luxon BA, Green TA. Environmental enrichment alters protein expression as well as the proteomic response to cocaine in rat nucleus accumbens. Front Behav Neurosci 2014; 8:246. [PMID: 25100957 PMCID: PMC4104784 DOI: 10.3389/fnbeh.2014.00246] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/20/2014] [Accepted: 06/30/2014] [Indexed: 11/13/2022] Open
Abstract
Prior research demonstrated that environmental enrichment creates individual differences in behavior leading to a protective addiction phenotype in rats. Understanding the mechanisms underlying this phenotype will guide selection of targets for much-needed novel pharmacotherapeutics. The current study investigates differences in proteome expression in the nucleus accumbens of enriched and isolated rats and the proteomic response to cocaine self-administration using a liquid chromatography mass spectrometry (LCMS) technique to quantify 1917 proteins. Results of complementary Ingenuity Pathways Analyses (IPA) and gene set enrichment analyses (GSEA), both performed using protein quantitative data, demonstrate that cocaine increases vesicular transporters for dopamine and glutamate as well as increasing proteins in the RhoA pathway. Further, cocaine regulates proteins related to ERK, CREB and AKT signaling. Environmental enrichment altered expression of a large number of proteins implicated in a diverse number of neuronal functions (e.g., energy production, mRNA splicing, and ubiquitination), molecular cascades (e.g., protein kinases), psychiatric disorders (e.g., mood disorders), and neurodegenerative diseases (e.g., Huntington's and Alzheimer's diseases). Upregulation of energy metabolism components in EC rats was verified using RNA sequencing. Most of the biological functions and pathways listed above were also identified in the Cocaine X Enrichment interaction analysis, providing clear evidence that enriched and isolated rats respond quite differently to cocaine exposure. The overall impression of the current results is that enriched saline-administering rats have a unique proteomic complement compared to enriched cocaine-administering rats as well as saline and cocaine-taking isolated rats. These results identify possible mechanisms of the protective phenotype and provide fertile soil for developing novel pharmacotherapeutics. 
Proteomics data are available via ProteomeXchange with identifier PXD000990.
Affiliation(s)
- Cheryl F Lichti
  - Department of Pharmacology and Toxicology, The University of Texas Medical Branch, Galveston, TX, USA
- Xiuzhen Fan
  - Department of Pharmacology and Toxicology, The University of Texas Medical Branch, Galveston, TX, USA; Center for Addiction Research, The University of Texas Medical Branch, Galveston, TX, USA
- Robert D English
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, USA
- Yafang Zhang
  - Department of Pharmacology and Toxicology, The University of Texas Medical Branch, Galveston, TX, USA; Center for Addiction Research, The University of Texas Medical Branch, Galveston, TX, USA
- Dingge Li
  - Department of Pharmacology and Toxicology, The University of Texas Medical Branch, Galveston, TX, USA; Center for Addiction Research, The University of Texas Medical Branch, Galveston, TX, USA
- Fanping Kong
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, USA
- Mala Sinha
  - Sealy Center for Molecular Medicine, Institute for Translational Science, The University of Texas Medical Branch, Galveston, TX, USA
- Clark R Andersen
  - Sealy Center for Molecular Medicine, Institute for Translational Science, The University of Texas Medical Branch, Galveston, TX, USA
- Heidi Spratt
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, USA; Sealy Center for Molecular Medicine, Institute for Translational Science, The University of Texas Medical Branch, Galveston, TX, USA; Department of Preventative Medicine and Community Health, The University of Texas Medical Branch, Galveston, TX, USA
- Bruce A Luxon
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, USA; Sealy Center for Molecular Medicine, Institute for Translational Science, The University of Texas Medical Branch, Galveston, TX, USA
- Thomas A Green
  - Department of Pharmacology and Toxicology, The University of Texas Medical Branch, Galveston, TX, USA; Center for Addiction Research, The University of Texas Medical Branch, Galveston, TX, USA
|
73
|
Mayampurath A, Song E, Mathur A, Yu CY, Hammoud Z, Mechref Y, Tang H. Label-free glycopeptide quantification for biomarker discovery in human sera. J Proteome Res 2014; 13:4821-32. [PMID: 24946017 DOI: 10.1021/pr500242m] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 01/21/2023]
Abstract
Glycan moieties of glycoproteins modulate many biological processes in mammals, such as immune response, inflammation, and cell signaling. Numerous studies show that many human diseases are correlated with quantitative alteration of protein glycosylation. In some cases, these changes can occur for certain types of glycans over specific sites in a glycoprotein rather than on the global abundance of the glycoprotein. Conventional analytical techniques that analyze the abundance of glycans cleaved from glycoproteins cannot reveal these subtle effects. Here we present a novel statistical method to quantify the site-specific glycosylation of glycoproteins in complex samples using label-free mass spectrometric techniques. Abundance variations between sites of a glycoprotein as well as different glycoforms, that is, glycopeptides with different glycans attached to the same site, can be detected using these techniques. We applied our method to an esophageal cancer study based on blood serum samples from cancer patients in an attempt to detect potential biomarkers of site-specific N-linked glycosylation. A few glycoproteins, including vitronectin, showed significantly different site-specific glycosylations within cancer/control samples, indicating that our method is ready to be used for the discovery of glycosylated biomarkers.
Affiliation(s)
- Anoop Mayampurath
  - School of Informatics & Computing, Indiana University, 901 East 10th Street, Bloomington, Indiana 47408, United States
|
74
|
Ali A, Alexandersson E, Sandin M, Resjö S, Lenman M, Hedley P, Levander F, Andreasson E. Quantitative proteomics and transcriptomics of potato in response to Phytophthora infestans in compatible and incompatible interactions. BMC Genomics 2014; 15:497. [PMID: 24947944 PMCID: PMC4079953 DOI: 10.1186/1471-2164-15-497] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 02/24/2014] [Accepted: 06/10/2014] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND To gain a global molecular understanding of one of the most important crop diseases worldwide, we investigated compatible and incompatible interactions between Phytophthora infestans and potato (Solanum tuberosum). We used the two most field-resistant potato clones under Swedish growing conditions, which have the greatest known local diversity of P. infestans populations, and a reference compatible cultivar. RESULTS Quantitative label-free proteomics of 51 apoplastic secretome samples (PXD000435) in combination with genome-wide transcript analysis by 42 microarrays (E-MTAB-1515) were used to capture changes in protein abundance and gene expression at 6, 24 and 72 hours after inoculation with P. infestans. To aid mass spectrometry analysis we generated cultivar-specific RNA-seq data (E-MTAB-1712), which increased peptide identifications by 17%. Components induced only during incompatible interactions, which are candidates for hypersensitive response initiation, include a Kunitz-like protease inhibitor, transcription factors and an RCR3-like protein. More secreted proteins had lower abundance in the compatible interaction compared to the incompatible interactions. Based on this observation and because the well-characterized effector-target C14 protease follows this pattern, we suggest 40 putative effector targets. CONCLUSIONS In summary, over 17000 transcripts and 1000 secreted proteins changed in abundance in at least one time point, illustrating the dynamics of plant responses to a hemibiotroph. Half of the differentially abundant proteins showed a corresponding change at the transcript level. Many putative hypersensitive and effector-target proteins were single representatives of large gene families.
Affiliation(s)
- Erik Andreasson
  - Department of Plant Protection Biology, Swedish University of Agricultural Sciences, Alnarp, Sweden.
|
75
|
Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics 2014; 13:2513-26. [PMID: 24942700 PMCID: PMC4159666 DOI: 10.1074/mcp.m113.031591] [Citation(s) in RCA: 3422] [Impact Index Per Article: 342.2] [Indexed: 12/12/2022] Open
Abstract
Protein quantification without isotopic labels has been a long-standing interest in the proteomics field. However, accurate and robust proteome-wide quantification with label-free approaches remains a challenge. We developed a new intensity determination and normalization procedure called MaxLFQ that is fully compatible with any peptide or protein separation prior to LC-MS analysis. Protein abundance profiles are assembled using the maximum possible information from MS signals, given that the presence of quantifiable peptides varies from sample to sample. For a benchmark dataset with two proteomes mixed at known ratios, we accurately detected the mixing ratio over the entire protein expression range, with greater precision for abundant proteins. The significance of individual label-free quantifications was obtained via a t test approach. For a second benchmark dataset, we accurately quantify fold changes over several orders of magnitude, a task that is challenging with label-based methods. MaxLFQ is a generic label-free quantification technology that is readily applicable to many biological questions; it is compatible with standard statistical analysis workflows, and it has been validated in many and diverse biological projects. Our algorithms can handle very large experiments of 500+ samples in a manageable computing time. It is implemented in the freely available MaxQuant computational proteomics platform and works completely seamlessly at the click of a button.
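The idea of extracting protein ratios from whichever peptides happen to be quantified in both samples can be sketched for a single pair of samples. This is a simplified illustration and not the MaxQuant implementation: it uses the median of peptide log-ratios over shared peptides, omits the delayed-normalization step and the assembly of a full abundance profile across many samples, and the function name, the `min_shared` threshold, and the example intensities are hypothetical.

```python
import math
from statistics import median


def pairwise_protein_log_ratio(sample_a, sample_b, min_shared=2):
    """Median log2 ratio (b over a) across peptides quantified in both samples.

    sample_a and sample_b map peptide identifiers to intensities for one
    protein. Returns None when fewer than min_shared peptides are shared,
    so that no ratio is reported on too little evidence.
    """
    shared = [pep for pep in sample_a if pep in sample_b]
    if len(shared) < min_shared:
        return None
    return median(math.log2(sample_b[pep] / sample_a[pep]) for pep in shared)


if __name__ == "__main__":
    # Hypothetical intensities: PEP3 and PEP4 are each seen in one sample only.
    a = {"PEP1": 100.0, "PEP2": 400.0, "PEP3": 50.0}
    b = {"PEP1": 200.0, "PEP2": 800.0, "PEP4": 10.0}
    print(pairwise_protein_log_ratio(a, b))
```

Working with ratios of shared peptides, instead of summed intensities, keeps a peptide that is missing in one sample from distorting the comparison, which matches the abstract's point that the presence of quantifiable peptides varies from sample to sample.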
Affiliation(s)
- Jürgen Cox
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
- Marco Y Hein
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
- Christian A Luber
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
- Igor Paron
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
- Nagarjuna Nagaraj
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
- Matthias Mann
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
|
76
|
Ryu SY, Qian WJ, Camp DG, Smith RD, Tompkins RG, Davis RW, Xiao W. Detecting differential protein expression in large-scale population proteomics. Bioinformatics 2014; 30:2741-6. [PMID: 24928210 DOI: 10.1093/bioinformatics/btu341] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
Abstract
MOTIVATION Mass spectrometry (MS)-based high-throughput quantitative proteomics shows great potential in large-scale clinical biomarker studies, identifying and quantifying thousands of proteins in biological samples. However, there are unique challenges in analyzing the quantitative proteomics data. One issue is that the quantification of a given peptide is often missing in a subset of the experiments, especially for less abundant peptides. Another issue is that different MS experiments of the same study have significantly varying numbers of peptides quantified, which can result in more missing peptide abundances in an experiment that has a smaller total number of quantified peptides. To detect as many biomarker proteins as possible, it is necessary to develop bioinformatics methods that appropriately handle these challenges. RESULTS We propose a Significance Analysis for Large-scale Proteomics Studies (SALPS) that handles missing peptide intensity values caused by the two mechanisms mentioned above. Our model has a robust performance in both simulated data and proteomics data from a large clinical study. Because varying patients' sample qualities and deviating instrument performances are not avoidable for clinical studies performed over the course of several years, we believe that our approach will be useful to analyze large-scale clinical proteomics data. AVAILABILITY AND IMPLEMENTATION R code for SALPS is available at http://www.stanford.edu/%7eclairesr/software.html.
Affiliation(s)
- So Young Ryu
  - Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA, Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA and Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Wei-Jun Qian
  - Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA, Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA and Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- David G Camp
  - Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA, Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA and Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Richard D Smith
  - Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA, Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA and Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Ronald G Tompkins
  - Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA, Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA and Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Ronald W Davis
  - Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA, Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA and Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Wenzhong Xiao
  - Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA, Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA and Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
|
77
|
Tomizioli M, Lazar C, Brugière S, Burger T, Salvi D, Gatto L, Moyet L, Breckels LM, Hesse AM, Lilley KS, Seigneurin-Berny D, Finazzi G, Rolland N, Ferro M. Deciphering thylakoid sub-compartments using a mass spectrometry-based approach. Mol Cell Proteomics 2014; 13:2147-67. [PMID: 24872594 DOI: 10.1074/mcp.m114.040923] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Indexed: 01/12/2023] Open
Abstract
Photosynthesis has shaped atmospheric and ocean chemistries and probably changed the climate as well, as oxygen is released from water as part of the photosynthetic process. In photosynthetic eukaryotes, this process occurs in the chloroplast, an organelle containing the most abundant biological membrane, the thylakoids. The thylakoids of plants and some green algae are structurally inhomogeneous, consisting of two main domains: the grana, which are piles of membranes gathered by stacking forces, and the stroma-lamellae, which are unstacked thylakoids connecting the grana. The major photosynthetic complexes are unevenly distributed within these compartments because of steric and electrostatic constraints. Although proteomic analysis of thylakoids has been instrumental in defining their protein components, no extensive proteomic study of subthylakoid localization of proteins in the BBY (grana) and the stroma-lamellae fractions has been achieved so far. To fill this gap, we performed a complete survey of the protein composition of these thylakoid subcompartments using thylakoid membrane fractionations. We employed semiquantitative proteomics coupled with a data analysis pipeline and manual annotation to differentiate genuine BBY and stroma-lamellae proteins from possible contaminants. About 300 thylakoid (or potentially thylakoid) proteins were shown to be enriched in either the BBY or the stroma-lamellae fractions. Overall, present findings corroborate previous observations obtained for photosynthetic proteins that used nonproteomic approaches. The originality of the present proteomic study lies in the identification of photosynthetic proteins whose differential distribution in the thylakoid subcompartments might explain already observed phenomena such as LHCII docking. Besides, from the present localization results we can suggest new molecular actors for photosynthesis-linked activities. For instance, because most PsbP-like subunits are differently localized in stroma-lamellae, these proteins could be linked to the PSI-NDH complex in the context of cyclic electron flow around PSI. In addition, we could identify about a hundred new likely minor thylakoid (or chloroplast) proteins, some of them being potential regulators of the chloroplast physiology.
Affiliation(s)
- Martino Tomizioli
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CNRS, UMR5168, F-38054 Grenoble, France; CEA, iRTSV, Laboratoire Physiologie Cellulaire & Végétale, F-38054 Grenoble, France; INRA, USC 1359, F-38054 Grenoble, France
- Cosmin Lazar
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CEA, iRTSV, Laboratoire Biologie à Grande Echelle, F-38054 Grenoble, France; INSERM, U1038, F-38054 Grenoble, France
- Sabine Brugière
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CEA, iRTSV, Laboratoire Biologie à Grande Echelle, F-38054 Grenoble, France; INSERM, U1038, F-38054 Grenoble, France
- Thomas Burger
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CEA, iRTSV, Laboratoire Biologie à Grande Echelle, F-38054 Grenoble, France; INSERM, U1038, F-38054 Grenoble, France; CNRS, FR3425, F-38054 Grenoble, France
- Daniel Salvi
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CNRS, UMR5168, F-38054 Grenoble, France; CEA, iRTSV, Laboratoire Physiologie Cellulaire & Végétale, F-38054 Grenoble, France; INRA, USC 1359, F-38054 Grenoble, France
- Laurent Gatto
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1QR, United Kingdom
- Lucas Moyet
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CNRS, UMR5168, F-38054 Grenoble, France; CEA, iRTSV, Laboratoire Physiologie Cellulaire & Végétale, F-38054 Grenoble, France; INRA, USC 1359, F-38054 Grenoble, France
- Lisa M Breckels
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1QR, United Kingdom
- Anne-Marie Hesse
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CEA, iRTSV, Laboratoire Biologie à Grande Echelle, F-38054 Grenoble, France; INSERM, U1038, F-38054 Grenoble, France
- Kathryn S Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1QR, United Kingdom
- Daphné Seigneurin-Berny
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CNRS, UMR5168, F-38054 Grenoble, France; CEA, iRTSV, Laboratoire Physiologie Cellulaire & Végétale, F-38054 Grenoble, France; INRA, USC 1359, F-38054 Grenoble, France
- Giovanni Finazzi
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CNRS, UMR5168, F-38054 Grenoble, France; CEA, iRTSV, Laboratoire Physiologie Cellulaire & Végétale, F-38054 Grenoble, France; INRA, USC 1359, F-38054 Grenoble, France
- Norbert Rolland
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CNRS, UMR5168, F-38054 Grenoble, France; CEA, iRTSV, Laboratoire Physiologie Cellulaire & Végétale, F-38054 Grenoble, France; INRA, USC 1359, F-38054 Grenoble, France
- Myriam Ferro
- Univ. Grenoble Alpes, F-38000 Grenoble, France; CEA, iRTSV, Laboratoire Biologie à Grande Echelle, F-38054 Grenoble, France; INSERM, U1038, F-38054 Grenoble, France
|
78
|
Liu C, Song CQ, Yuan ZF, Fu Y, Chi H, Wang LH, Fan SB, Zhang K, Zeng WF, He SM, Dong MQ, Sun RX. pQuant Improves Quantitation by Keeping out Interfering Signals and Evaluating the Accuracy of Calculated Ratios. Anal Chem 2014; 86:5286-94. [DOI: 10.1021/ac404246w] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Indexed: 01/24/2023]
Affiliation(s)
- Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Chun-Qing Song
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Zuo-Fei Yuan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Yan Fu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Le-Heng Wang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Sheng-Bo Fan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Kun Zhang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Rui-Xiang Sun
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
|
79
|
Lu W, Yin X, Liu X, Yan G, Yang P. Response of peptide intensity to concentration in ESI-MS-based proteome. Sci China Chem 2014. [DOI: 10.1007/s11426-014-5096-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
80
|
Navarro P, Trevisan-Herraz M, Bonzon-Kulichenko E, Núñez E, Martínez-Acedo P, Pérez-Hernández D, Jorge I, Mesa R, Calvo E, Carrascal M, Hernáez ML, García F, Bárcena JA, Ashman K, Abian J, Gil C, Redondo JM, Vázquez J. General statistical framework for quantitative proteomics by stable isotope labeling. J Proteome Res 2014; 13:1234-47. [PMID: 24512137 DOI: 10.1021/pr4006958] [Citation(s) in RCA: 137] [Impact Index Per Article: 13.7] [Indexed: 01/24/2023]
Abstract
The combination of stable isotope labeling (SIL) with mass spectrometry (MS) allows comparison of the abundance of thousands of proteins in complex mixtures. However, interpretation of the large data sets generated by these techniques remains a challenge because appropriate statistical standards are lacking. Here, we present a generally applicable model that accurately explains the behavior of data obtained using current SIL approaches, including ¹⁸O, iTRAQ, and SILAC labeling, and different MS instruments. The model decomposes the total technical variance into the spectral, peptide, and protein variance components, and its general validity was demonstrated by confronting 48 experimental distributions against 18 different null hypotheses. In addition to its general applicability, the performance of the algorithm was at least comparable to that of other existing methods. The model also provides a general framework to integrate quantitative and error information fully, allowing a comparative analysis of the results obtained from different SIL experiments. The model was applied to the global analysis of protein alterations induced by low H₂O₂ concentrations in yeast, demonstrating the increased statistical power that may be achieved by rigorous data integration. Our results highlight the importance of establishing an adequate and validated statistical framework for the analysis of high-throughput data.
Affiliation(s)
- Pedro Navarro
- Centro de Biología Molecular Severo Ochoa, CSIC-UAM, 28049 Madrid, Spain
|
81
|
Ryu SY. Bioinformatics tools to identify and quantify proteins using mass spectrometry data. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 94:1-17. [PMID: 24629183 DOI: 10.1016/b978-0-12-800168-4.00001-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/27/2023]
Abstract
Proteomics seeks to understand the biological function of an organism by studying its protein expression. Mass spectrometry is used in the field of shotgun proteomics, and it generates mass spectra that are used to identify and quantify proteins in biological samples. In this chapter, we discuss the bioinformatics algorithms used to analyze mass spectrometry data. After briefly describing how mass spectrometry generates data, we illustrate the bioinformatics algorithms and software for protein identification, such as the de novo approach and the database-searching approach. We also discuss the bioinformatics algorithms and software used to quantify proteins and detect differential proteins using isotope-coded affinity tags and label-free mass spectrometry data.
Affiliation(s)
- So Young Ryu
- Stanford Genome Technology Center, Biochemistry Department, Stanford University, Stanford, California, USA.
|
82
|
Contemporary network proteomics and its requirements. BIOLOGY 2013; 3:22-38. [PMID: 24833333 PMCID: PMC4009760 DOI: 10.3390/biology3010022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/12/2013] [Revised: 12/15/2013] [Accepted: 12/16/2013] [Indexed: 01/10/2023]
Abstract
The integration of networks with genomics (network genomics) is a familiar field. Conventional network analysis takes advantage of the larger coverage and relative stability of gene expression measurements. Network proteomics on the other hand has to develop further on two critical factors: (1) expanded data coverage and consistency, and (2) suitable reference network libraries, and data mining from them. Concerning (1) we discuss several contemporary themes that can improve data quality, which in turn will boost the outcome of downstream network analysis. For (2), we focus on network analysis developments, specifically, the need for context-specific networks and essential considerations for localized network analysis.
|
83
|
Lichti CF, Liu H, Shavkunov AS, Mostovenko E, Sulman EP, Ezhilarasan R, Wang Q, Kroes RA, Moskal JC, Fenyö D, Oksuz BA, Conrad CA, Lang FF, Berven FS, Végvári A, Rezeli M, Marko-Varga G, Hober S, Nilsson CL. Integrated chromosome 19 transcriptomic and proteomic data sets derived from glioma cancer stem-cell lines. J Proteome Res 2013; 13:191-9. [PMID: 24266786 DOI: 10.1021/pr400786s] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022]
Abstract
One subproject within the global Chromosome 19 Consortium is to define chromosome 19 gene and protein expression in glioma-derived cancer stem cells (GSCs). Chromosome 19 is notoriously linked to glioma by 1p/19q codeletions, and clinical tests are established to detect that specific aberration. GSCs are tumor-initiating cells and are hypothesized to provide a repository of cells in tumors that can self-replicate and be refractory to radiation and chemotherapeutic agents developed for the treatment of tumors. In this pilot study, we performed RNA-Seq, label-free quantitative protein measurements in six GSC lines, and targeted transcriptomic analysis using a chromosome 19-specific microarray in an additional six GSC lines. The data have been deposited to the ProteomeXchange with identifier PXD000563. Here we present insights into differences in GSC gene and protein expression, including the identification of proteins listed as having no or low evidence at the protein level in the Human Protein Atlas, as correlated to chromosome 19 and GSC subtype. Furthermore, the upregulation of proteins downstream of adenovirus-associated viral integration site 1 (AAVS1) in GSC11 in response to oncolytic adenovirus treatment was demonstrated. Taken together, our results may indicate new roles for chromosome 19, beyond the 1p/19q codeletion, in the future of personalized medicine for glioma patients.
Affiliation(s)
- Cheryl F Lichti
- Department of Pharmacology and Toxicology, UTMB Cancer Center, University of Texas Medical Branch, 301 University Boulevard, Galveston, Texas 77555, United States
|
84
|
Chen YY, Chambers MC, Li M, Ham AJL, Turner JL, Zhang B, Tabb DL. IDPQuantify: combining precursor intensity with spectral counts for protein and peptide quantification. J Proteome Res 2013; 12:4111-21. [PMID: 23879310 DOI: 10.1021/pr400438q] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
Differentiating and quantifying protein differences in complex samples produces significant challenges in sensitivity and specificity. Label-free quantification can draw from two different information sources: precursor intensities and spectral counts. Intensities are accurate for calculating protein relative abundance, but values are often missing due to peptides that are identified sporadically. Spectral counting can reliably reproduce difference lists, but differentiating peptides or quantifying all but the most concentrated protein changes is usually beyond its abilities. Here we developed new software, IDPQuantify, to align multiple replicates using principal component analysis, extract accurate precursor intensities from MS data, and combine intensities with spectral counts for significant gains in differentiation and quantification. We have applied IDPQuantify to three comparative proteomic data sets featuring gold standard protein differences spiked in complicated backgrounds. The software is able to associate peptides with peaks that are otherwise left unidentified to increase the efficiency of protein quantification, especially for low-abundance proteins. By combining intensities with spectral counts from IDPicker, it gains an average of 30% more true positive differences among top differential proteins. IDPQuantify quantifies protein relative abundance accurately in these test data sets to produce good correlations between known and measured concentrations.
Affiliation(s)
- Yao-Yi Chen
- Department of Biomedical Informatics, Vanderbilt University Medical School, Nashville, Tennessee 37232-8575, United States
|
85
|
D'haeseleer P, Gladden JM, Allgaier M, Chain PSG, Tringe SG, Malfatti SA, Aldrich JT, Nicora CD, Robinson EW, Paša-Tolić L, Hugenholtz P, Simmons BA, Singer SW. Proteogenomic analysis of a thermophilic bacterial consortium adapted to deconstruct switchgrass. PLoS One 2013; 8:e68465. [PMID: 23894306 PMCID: PMC3716776 DOI: 10.1371/journal.pone.0068465] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 01/25/2013] [Accepted: 05/29/2013] [Indexed: 12/02/2022]
Abstract
Thermophilic bacteria are a potential source of enzymes for the deconstruction of lignocellulosic biomass. However, the complement of proteins used to deconstruct biomass and the specific roles of different microbial groups in thermophilic biomass deconstruction are not well-explored. Here we report on the metagenomic and proteogenomic analyses of a compost-derived bacterial consortium adapted to switchgrass at elevated temperature with high levels of glycoside hydrolase activities. Near-complete genomes were reconstructed for the most abundant populations, which included composite genomes for populations closely related to sequenced strains of Thermus thermophilus and Rhodothermus marinus, and for novel populations that are related to thermophilic Paenibacilli and an uncultivated subdivision of the little-studied Gemmatimonadetes phylum. Partial genomes were also reconstructed for a number of lower abundance thermophilic Chloroflexi populations. Identification of genes for lignocellulose processing and metabolic reconstructions suggested Rhodothermus, Paenibacillus and Gemmatimonadetes as key groups for deconstructing biomass, and Thermus as a group that may primarily metabolize low molecular weight compounds. Mass spectrometry-based proteomic analysis of the consortium was used to identify >3000 proteins in fractionated samples from the cultures, and confirmed the importance of Paenibacillus and Gemmatimonadetes to biomass deconstruction. These studies also indicate that there are unexplored proteins with important roles in bacterial lignocellulose deconstruction.
Affiliation(s)
- Patrik D'haeseleer
- Joint BioEnergy Institute, Emeryville, California, United States of America.
|
86
|
Song X, Lichti CF, Townsend RR, Mueckler M. Single point mutations result in the miss-sorting of Glut4 to a novel membrane compartment associated with stress granule proteins. PLoS One 2013; 8:e68516. [PMID: 23874650 PMCID: PMC3713040 DOI: 10.1371/journal.pone.0068516] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/18/2012] [Accepted: 05/29/2013] [Indexed: 01/16/2023]
Abstract
Insulin increases cellular glucose uptake and metabolism in the postprandial state by acutely stimulating the translocation of the Glut4 glucose transporter from intracellular membrane compartments to the cell surface in muscle and fat cells. The intracellular targeting of Glut4 is dictated by specific structural motifs within cytoplasmic domains of the transporter. We demonstrate that two leucine residues at the extreme C-terminus of Glut4 are critical components of a motif (IRM, insulin responsive motif) involved in the sorting of the transporter to insulin responsive vesicles in 3T3L1 adipocytes. Light microscopy, immunogold electron microscopy, subcellular fractionation, and sedimentation analysis indicate that mutations in the IRM cause the aberrant targeting of Glut4 to large dispersed membrane vesicles that are not insulin responsive. Proteomic characterization of rapidly and slowly sedimenting membrane vesicles (RSVs and SSVs) that were highly enriched by immunoadsorption for either wild-type Glut4 or an IRM mutant revealed that the major vesicle fraction containing the mutant transporter (IRM-RSVs) possessed a relatively small and highly distinct protein population that was enriched for proteins associated with stress granules. We suggest that the IRM is critical for an early step in the sorting of Glut4 to insulin-responsive subcellular membrane compartments and that IRM mutants are miss-targeted to relatively large, amorphous membrane vesicles that may be involved in a degradation pathway for miss-targeted or miss-folded proteins or represent a transitional membrane compartment that Glut4 traverses en route to insulin responsive storage compartments.
Affiliation(s)
- XiaoMei Song
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Cheryl F. Lichti
- Department of Pharmacology & Toxicology, University of Texas, Galveston, Texas, United States of America
- R. Reid Townsend
- Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Mike Mueckler
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
87
|
Henao R, Thompson JW, Moseley MA, Ginsburg GS, Carin L, Lucas JE. Latent protein trees. Ann Appl Stat 2013. [DOI: 10.1214/13-aoas639] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
|
88
|
Perrin RJ, Payton JE, Malone JP, Gilmore P, Davis AE, Xiong C, Fagan AM, Townsend RR, Holtzman DM. Quantitative label-free proteomics for discovery of biomarkers in cerebrospinal fluid: assessment of technical and inter-individual variation. PLoS One 2013; 8:e64314. [PMID: 23700471 PMCID: PMC3659127 DOI: 10.1371/journal.pone.0064314] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 02/18/2013] [Accepted: 04/11/2013] [Indexed: 01/11/2023]
Abstract
Background Biomarkers are required for pre-symptomatic diagnosis, treatment, and monitoring of neurodegenerative diseases such as Alzheimer's disease. Cerebrospinal fluid (CSF) is a favored source because its proteome reflects the composition of the brain. Ideal biomarkers have low technical and inter-individual variability (subject variance) among control subjects to minimize overlaps between clinical groups. This study evaluates a process of multi-affinity fractionation (MAF) and quantitative label-free liquid chromatography tandem mass spectrometry (LC-MS/MS) for CSF biomarker discovery by (1) identifying reparable sources of technical variability, (2) assessing subject variance and residual technical variability for numerous CSF proteins, and (3) testing its ability to segregate samples on the basis of desired biomarker characteristics. Methods/Results Fourteen aliquots of pooled CSF and two aliquots from six cognitively normal individuals were randomized, enriched for low-abundance proteins by MAF, digested endoproteolytically, randomized again, and analyzed by nano-LC-MS. Nano-LC-MS data were time and m/z aligned across samples for relative peptide quantification. Among 11,433 aligned charge groups, 1360 relatively abundant ones were annotated by MS2, yielding 823 unique peptides. Analyses, including Pearson correlations of annotated LC-MS ion chromatograms, performed for all pairwise sample comparisons, identified several sources of technical variability: i) incomplete MAF and keratins; ii) globally- or segmentally-decreased ion current in isolated LC-MS analyses; and iii) oxidized methionine-containing peptides. Exclusion of these sources yielded 609 peptides representing 81 proteins. Most of these proteins showed very low coefficients of variation (CV<5%) whether they were quantified from the mean of all or only the 2 most-abundant peptides. 
Unsupervised clustering, using only 24 proteins selected for high subject variance, yielded perfect segregation of pooled and individual samples. Conclusions Quantitative label-free LC-MS/MS can measure scores of CSF proteins with low technical variability and can segregate samples according to desired criteria. Thus, this technique shows potential for biomarker discovery for neurological diseases.
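The coefficient of variation (CV) comparison mentioned in this abstract, with proteins quantified from the mean of all peptides or only the two most abundant ones, can be illustrated with a minimal Python sketch. The function names and the replicate intensities below are hypothetical and are not taken from the study; the sketch only shows the arithmetic of a CV under those assumptions.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m

def protein_abundance(peptide_intensities, top_n=None):
    """Average the peptide intensities of one protein in one run.

    If top_n is given, only the top_n most intense peptides are used.
    """
    ranked = sorted(peptide_intensities, reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return mean(ranked)

# Hypothetical intensities: three technical replicates, three peptides each.
replicates = [
    [100.0, 80.0, 20.0],
    [102.0, 82.0, 19.0],
    [98.0, 79.0, 22.0],
]

cv_all = coefficient_of_variation([protein_abundance(r) for r in replicates])
cv_top2 = coefficient_of_variation(
    [protein_abundance(r, top_n=2) for r in replicates]
)
print(f"CV (all peptides): {cv_all:.2f}%  CV (top 2 peptides): {cv_top2:.2f}%")
```

With these made-up numbers both CVs fall below the 5% threshold quoted in the abstract; real data would of course need the filtering steps the authors describe.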
Affiliation(s)
- Richard J Perrin
- Division of Neuropathology, Washington University School of Medicine, St. Louis, Missouri, USA.
|
89
|
Cortes DF, Landis MK, Ottens AK. High-capacity peptide-centric platform to decode the proteomic response to brain injury. Electrophoresis 2012. [PMID: 23160985 DOI: 10.1002/elps.201200341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2023]
Abstract
Traumatic brain injury (TBI) is a progressive disease process underlain by dynamic and interactive biochemical mechanisms; thus, large-scale and unbiased assessments are needed to fully understand its highly complex pathobiology. Here, we report on a new high-capacity label-free proteomic platform to evaluate the post-TBI neuroproteome. Six orthogonal separation stages and data-independent MS were employed, affording reproducible quantitative assessment on 18 651 peptides across biological replicates. From these data 3587 peptides were statistically responsive to TBI of which 18% were post-translationally modified. Results revealed as many as 484 proteins in the post-TBI neuroproteome, which was fully nine times the number determined from our prior study of focal cortical injury. Yet, these data were generated using 25 times less brain tissue per animal relative to former methodology, permitting greater anatomical specificity and proper biological replication for increased statistical power. Exemplified by these data, we discuss benefits of peptide-centric differential analysis to more accurately infer novel biological findings testable in future hypothesis-driven research. The high-capacity label-free proteomic platform is designed for multi-factor studies aimed at expanding our knowledge on the molecular underpinnings of TBI and to develop better diagnostics and therapeutics.
Affiliation(s)
- Diego F Cortes
- Department of Anatomy & Neurobiology, Virginia Commonwealth University School of Medicine, Richmond, VA, USA
|
90
|
Cortes DF, Landis MK, Ottens AK. High-capacity peptide-centric platform to decode the proteomic response to brain injury. Electrophoresis 2012; 33:3712-9. [PMID: 23160985 DOI: 10.1002/elps.201200341] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 06/27/2012] [Revised: 09/12/2012] [Accepted: 09/20/2012] [Indexed: 02/05/2023]
|
91
|
Matzke MM, Brown JN, Gritsenko MA, Metz TO, Pounds JG, Rodland KD, Shukla AK, Smith RD, Waters KM, McDermott JE, Webb-Robertson BJ. A comparative analysis of computational approaches to relative protein quantification using peptide peak intensities in label-free LC-MS proteomics experiments. Proteomics 2012; 13:493-503. [PMID: 23019139 DOI: 10.1002/pmic.201200269] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 06/29/2012] [Revised: 08/14/2012] [Accepted: 08/22/2012] [Indexed: 12/24/2022]
Abstract
Liquid chromatography coupled with mass spectrometry (LC-MS) is widely used to identify and quantify peptides in complex biological samples. In particular, label-free shotgun proteomics is highly effective for the identification of peptides and subsequently obtaining a global protein profile of a sample. As a result, this approach is widely used for discovery studies. Typically, the objective of these discovery studies is to identify proteins that are affected by some condition of interest (e.g. disease, exposure). However, for complex biological samples, label-free LC-MS proteomics experiments measure peptides and do not directly yield protein quantities. Thus, protein quantification must be inferred from one or more measured peptides. In recent years, many computational approaches to relative protein quantification of label-free LC-MS data have been published. In this review, we examine the most commonly employed quantification approaches to relative protein abundance from peak intensity values, evaluate their individual merits, and discuss challenges in the use of the various computational approaches.
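The step of inferring a protein quantity from several measured peptides can be sketched in Python as follows. This is one illustrative rollup under stated assumptions (log2 intensities, scaling each peptide to a reference peptide, then taking the per-sample median); it is not claimed to be any specific method evaluated in the review, and all names and values are hypothetical.

```python
from statistics import median

def choose_reference(peptides):
    """Pick the peptide observed in the most samples (ties: highest mean)."""
    def key(name):
        observed = [v for v in peptides[name] if v is not None]
        avg = sum(observed) / len(observed) if observed else float("-inf")
        return (len(observed), avg)
    return max(peptides, key=key)

def scaled_rollup(peptides):
    """Infer one relative protein abundance per sample.

    peptides maps a peptide name to its log2 intensities across samples,
    with None marking a missing value.
    """
    ref = peptides[choose_reference(peptides)]
    scaled = []
    for values in peptides.values():
        # Offset of this peptide from the reference where both are observed.
        diffs = [v - r for v, r in zip(values, ref)
                 if v is not None and r is not None]
        if not diffs:
            continue  # no overlap with the reference; cannot be scaled
        offset = median(diffs)
        scaled.append([None if v is None else v - offset for v in values])
    protein = []
    for i in range(len(ref)):
        column = [row[i] for row in scaled if row[i] is not None]
        protein.append(median(column) if column else None)
    return protein

# Hypothetical log2 intensities for three peptides of one protein.
example = {
    "P1": [20.0, 21.0, 22.0],
    "P2": [18.0, 19.0, None],
    "P3": [25.0, 26.0, 27.0],
}
print(choose_reference(example), scaled_rollup(example))
```

Scaling to a reference before summarizing keeps a peptide with a missing value (P2 in the third sample) from dragging the protein estimate toward its own intensity level.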
|
92
|
Karpievitch YV, Dabney AR, Smith RD. Normalization and missing value imputation for label-free LC-MS analysis. BMC Bioinformatics 2012; 13 Suppl 16:S5. [PMID: 23176322 PMCID: PMC3489534 DOI: 10.1186/1471-2105-13-s16-s5] [Citation(s) in RCA: 196] [Impact Index Per Article: 16.3] [Indexed: 11/10/2022]
Abstract
Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.
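As a rough illustration of the two steps named in this abstract, the Python sketch below applies a simple median normalization to log2 intensities and then fills missing values with the per-peptide minimum. Both choices are assumptions made for the example; they are common simple baselines, not necessarily the approaches the authors recommend, and the run names and values are hypothetical.

```python
from statistics import median

def median_normalize(runs):
    """Shift each run so that its median matches the median of run medians.

    runs maps a run name to log2 intensities (None marks a missing value);
    position i refers to the same peptide in every run.
    """
    run_medians = {
        name: median([v for v in values if v is not None])
        for name, values in runs.items()
    }
    target = median(list(run_medians.values()))
    return {
        name: [None if v is None else v - run_medians[name] + target
               for v in values]
        for name, values in runs.items()
    }

def impute_peptide_minimum(runs):
    """Replace each missing value with the minimum observed for that peptide."""
    names = list(runs)
    imputed = {name: list(runs[name]) for name in names}
    for i in range(len(runs[names[0]])):
        observed = [runs[n][i] for n in names if runs[n][i] is not None]
        if not observed:
            continue  # peptide never observed; leave it missing
        floor = min(observed)
        for n in names:
            if imputed[n][i] is None:
                imputed[n][i] = floor
    return imputed

# Hypothetical log2 intensities for three peptides in two runs.
example_runs = {
    "run_A": [10.0, 12.0, 14.0],
    "run_B": [11.0, None, 15.0],
}
normalized = median_normalize(example_runs)
completed = impute_peptide_minimum(normalized)
print(normalized)
print(completed)
```

Normalizing before imputing means the filled-in value is taken from intensities that are already on a common scale.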
Affiliation(s)
- Yuliya V Karpievitch
- School of Mathematics and Physics, University of Tasmania, Hobart, Tasmania, Australia.
|
93
|
Clough T, Thaminy S, Ragg S, Aebersold R, Vitek O. Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs. BMC Bioinformatics 2012; 13 Suppl 16:S6. [PMID: 23176351 PMCID: PMC3489535 DOI: 10.1186/1471-2105-13-s16-s6] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Indexed: 01/27/2023]
Abstract
BACKGROUND Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is widely used for quantitative proteomic investigations. The typical output of such studies is a list of identified and quantified peptides. The biological and clinical interest is, however, usually focused on quantitative conclusions at the protein level. Furthermore, many investigations ask complex biological questions by studying multiple interrelated experimental conditions. Therefore, there is a need in the field for generic statistical models to quantify protein levels even in complex study designs. RESULTS We propose a general statistical modeling approach for protein quantification in arbitrary complex experimental designs, such as time course studies, or those involving multiple experimental factors. The approach summarizes the quantitative experimental information from all the features and all the conditions that pertain to a protein. It enables both protein significance analysis between conditions, and protein quantification in individual samples or conditions. We implement the approach in an open-source R-based software package MSstats suitable for researchers with a limited statistics and programming background. CONCLUSIONS We demonstrate, using as examples two experimental investigations with complex designs, that a simultaneous statistical modeling of all the relevant features and conditions yields a higher sensitivity of protein significance analysis and a higher accuracy of protein quantification as compared to commonly employed alternatives. The software is available at http://www.stat.purdue.edu/~ovitek/Software.html.
Affiliation(s)
- Timothy Clough
- Department of Statistics, Purdue University, West Lafayette, IN, USA.
|
94
|
Lee AR, Lamb RR, Chang JH, Erdmann-Gilmore P, Lichti CF, Rohrs HW, Malone JP, Wairkar YP, DiAntonio A, Townsend RR, Culican SM. Identification of potential mediators of retinotopic mapping: a comparative proteomic analysis of optic nerve from WT and Phr1 retinal knockout mice. J Proteome Res 2012; 11:5515-26. [PMID: 22985349 DOI: 10.1021/pr300767a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/13/2022]
Abstract
Retinal ganglion cells (RGCs) transmit visual information topographically from the eye to the brain, creating a map of visual space in retino-recipient nuclei (retinotopy). This process is affected by retinal activity and by activity-independent molecular cues. Phr1, which encodes a presumed E3 ubiquitin ligase (PHR1), is required presynaptically for proper placement of RGC axons in the lateral geniculate nucleus and the superior colliculus, suggesting that increased levels of PHR1 target proteins may be instructive for retinotopic mapping of retinofugal projections. To identify potential target proteins, we conducted a proteomic analysis of optic nerve to identify differentially abundant proteins in the presence or absence of Phr1 in RGCs. 1D gel electrophoresis identified a specific band in controls that was absent in mutants. Targeted proteomic analysis of this band demonstrated the presence of PHR1. Additionally, we conducted an unbiased proteomic analysis that identified 30 proteins as being significantly different between the two genotypes. One of these, heterogeneous nuclear ribonucleoprotein M (hnRNP-M), regulates antero-posterior patterning in invertebrates and can function as a cell surface adhesion receptor in vertebrates. Thus, we have demonstrated that network analysis of quantitative proteomic data is a useful approach for hypothesis generation and for identifying biologically relevant targets in genetically altered biological models.
Affiliation(s)
- Andrew R Lee
- Department of Ophthalmology and Visual Sciences, Washington University School of Medicine, St. Louis, Missouri 63110, USA
|
95
|
Nikolovski N, Rubtsov D, Segura MP, Miles GP, Stevens TJ, Dunkley TP, Munro S, Lilley KS, Dupree P. Putative glycosyltransferases and other plant Golgi apparatus proteins are revealed by LOPIT proteomics. PLANT PHYSIOLOGY 2012; 160:1037-51. [PMID: 22923678 PMCID: PMC3461528 DOI: 10.1104/pp.112.204263] [Citation(s) in RCA: 124] [Impact Index Per Article: 10.3] [Received: 07/27/2012] [Accepted: 08/22/2012] [Indexed: 05/18/2023]
Abstract
The Golgi apparatus is the central organelle in the secretory pathway and plays key roles in glycosylation, protein sorting, and secretion in plants. Enzymes involved in the biosynthesis of complex polysaccharides, glycoproteins, and glycolipids are located in this organelle, but the majority of them remain uncharacterized. Here, we studied the Arabidopsis (Arabidopsis thaliana) membrane proteome with a focus on the Golgi apparatus using localization of organelle proteins by isotope tagging. By applying multivariate data analysis to a combined data set of two new and two previously published localization of organelle proteins by isotope tagging experiments, we identified the subcellular localization of 1,110 proteins with high confidence. These include 197 Golgi apparatus proteins, 79 of which have not been localized previously by a high-confidence method, as well as the localization of 304 endoplasmic reticulum and 208 plasma membrane proteins. Comparison of the hydrophobic domains of the localized proteins showed that the single-span transmembrane domains have unique properties in each organelle. Many of the novel Golgi-localized proteins belong to uncharacterized protein families. Structure-based homology analysis identified 12 putative Golgi glycosyltransferase (GT) families that have no functionally characterized members and, therefore, are not yet assigned to a Carbohydrate-Active Enzymes database GT family. The substantial numbers of these putative GTs lead us to estimate that the true number of plant Golgi GTs might be one-third above those currently annotated. Other newly identified proteins are likely to be involved in the transport and interconversion of nucleotide sugar substrates as well as polysaccharide and protein modification.
|
96
|
Taverner T, Karpievitch YV, Polpitiya AD, Brown JN, Dabney AR, Anderson GA, Smith RD. DanteR: an extensible R-based tool for quantitative analysis of -omics data. Bioinformatics 2012; 28:2404-6. [PMID: 22815360 DOI: 10.1093/bioinformatics/bts449] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.9] [Indexed: 11/12/2022]
Abstract
MOTIVATION The size and complex nature of mass spectrometry-based proteomics datasets motivate development of specialized software for statistical data analysis and exploration. We present DanteR, a graphical R package that features extensive statistical and diagnostic functions for quantitative proteomics data analysis, including normalization, imputation, hypothesis testing, interactive visualization and peptide-to-protein rollup. More importantly, users can easily extend the existing functionality by including their own algorithms under the Add-On tab. AVAILABILITY DanteR and its associated user guide are available for download free of charge at http://omics.pnl.gov/software/. We have an updated binary source for the DanteR package up on our website together with a vignettes document. For Windows, a single click automatically installs DanteR along with the R programming environment. For Linux and Mac OS X, users must install R and then follow instructions on the DanteR website for package installation. CONTACT rds@pnnl.gov.
Affiliation(s)
- Tom Taverner
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
|
97
|
Vasilj A, Gentzel M, Ueberham E, Gebhardt R, Shevchenko A. Tissue proteomics by one-dimensional gel electrophoresis combined with label-free protein quantification. J Proteome Res 2012; 11:3680-9. [PMID: 22671763 DOI: 10.1021/pr300147z] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 12/24/2022]
Abstract
Label-free methods streamline quantitative proteomics of tissues by alleviating the need for metabolic labeling of proteins with stable isotopes. Here we detail and implement solutions to common problems in label-free data processing geared toward tissue proteomics by one-dimensional gel electrophoresis followed by liquid chromatography tandem mass spectrometry (geLC MS/MS). Our quantification pipeline showed high levels of performance in terms of duplicate reproducibility, linear dynamic range, and number of proteins identified and quantified. When applied to the liver of an adenomatous polyposis coli (APC) knockout mouse, we demonstrated an 8-fold increase in the number of statistically significant changing proteins compared to alternative approaches, including many more previously unidentified hydrophobic proteins. Better proteome coverage and quantification accuracy revealed molecular details of the perturbed energy metabolism.
Affiliation(s)
- Andrej Vasilj
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
|
98
|
Liu NQ, Braakman RBH, Stingl C, Luider TM, Martens JWM, Foekens JA, Umar A. Proteomics pipeline for biomarker discovery of laser capture microdissected breast cancer tissue. J Mammary Gland Biol Neoplasia 2012; 17:155-64. [PMID: 22644111 PMCID: PMC3428526 DOI: 10.1007/s10911-012-9252-6] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 03/14/2012] [Accepted: 05/01/2012] [Indexed: 01/15/2023]
Abstract
Mass spectrometry (MS)-based label-free proteomics offers an unbiased approach to screen biomarkers related to disease progression and therapy-resistance of breast cancer on the global scale. However, multi-step sample preparation can introduce large variation in generated data, while inappropriate statistical methods will lead to false positive hits. All these issues have hampered the identification of reliable protein markers. A workflow, which integrates reproducible and robust sample preparation and data handling methods, is highly desirable in clinical proteomics investigations. Here we describe a label-free tissue proteomics pipeline, which encompasses laser capture microdissection (LCM) followed by nanoscale liquid chromatography and high resolution MS. This pipeline routinely identifies on average ∼10,000 peptides corresponding to ∼1,800 proteins from sub-microgram amounts of protein extracted from ∼4,000 LCM breast cancer epithelial cells. Highly reproducible abundance data were generated from different technical and biological replicates. As a proof-of-principle, comparative proteome analysis was performed on estrogen receptor α positive or negative (ER+/-) samples, and commonly known differentially expressed proteins related to ER expression in breast cancer were identified. Therefore, we show that our tissue proteomics pipeline is robust and applicable for the identification of breast cancer specific protein markers.
Affiliation(s)
- Ning Qing Liu
- Department of Medical Oncology and Daniel Den Hoed Cancer Center, Erasmus University Medical Center, Dr. Molewaterplein 50, Be-401, P.O. Box 2040, 3000 CA Rotterdam, the Netherlands
- Netherlands Proteomics Center, Rotterdam, the Netherlands
- René B. H. Braakman
- Department of Medical Oncology and Daniel Den Hoed Cancer Center, Erasmus University Medical Center, Dr. Molewaterplein 50, Be-401, P.O. Box 2040, 3000 CA Rotterdam, the Netherlands
- Center for Translational Molecular Medicine, Rotterdam, the Netherlands
- Christoph Stingl
- Department of Neurology, Erasmus University Medical Center, Rotterdam, the Netherlands
- Theo M. Luider
- Department of Neurology, Erasmus University Medical Center, Rotterdam, the Netherlands
- John W. M. Martens
- Department of Medical Oncology and Daniel Den Hoed Cancer Center, Erasmus University Medical Center, Dr. Molewaterplein 50, Be-401, P.O. Box 2040, 3000 CA Rotterdam, the Netherlands
- Center for Translational Molecular Medicine, Rotterdam, the Netherlands
- Cancer Genomics Centre, Rotterdam, the Netherlands
- John A. Foekens
- Department of Medical Oncology and Daniel Den Hoed Cancer Center, Erasmus University Medical Center, Dr. Molewaterplein 50, Be-401, P.O. Box 2040, 3000 CA Rotterdam, the Netherlands
- Netherlands Proteomics Center, Rotterdam, the Netherlands
- Center for Translational Molecular Medicine, Rotterdam, the Netherlands
- Cancer Genomics Centre, Rotterdam, the Netherlands
- Arzu Umar
- Department of Medical Oncology and Daniel Den Hoed Cancer Center, Erasmus University Medical Center, Dr. Molewaterplein 50, Be-401, P.O. Box 2040, 3000 CA Rotterdam, the Netherlands
- Netherlands Proteomics Center, Rotterdam, the Netherlands
- Center for Translational Molecular Medicine, Rotterdam, the Netherlands
- Cancer Genomics Centre, Rotterdam, the Netherlands
|
99
|
Tekwe CD, Carroll RJ, Dabney AR. Application of survival analysis methodology to the quantitative analysis of LC-MS proteomics data. Bioinformatics 2012; 28:1998-2003. [PMID: 22628520 DOI: 10.1093/bioinformatics/bts306] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 01/23/2023]
Abstract
MOTIVATION Protein abundance in quantitative proteomics is often based on observed spectral features derived from liquid chromatography mass spectrometry (LC-MS) or LC-MS/MS experiments. Peak intensities are largely non-normal in distribution. Furthermore, LC-MS-based proteomics data frequently have large proportions of missing peak intensities due to censoring mechanisms on low-abundance spectral features. Recognizing that the observed peak intensities detected with the LC-MS method are all positive, skewed and often left-censored, we propose using survival methodology to carry out differential expression analysis of proteins. Various standard statistical techniques including non-parametric tests such as the Kolmogorov-Smirnov and Wilcoxon-Mann-Whitney rank sum tests, and the parametric survival model and accelerated failure time-model with log-normal, log-logistic and Weibull distributions were used to detect any differentially expressed proteins. The statistical operating characteristics of each method are explored using both real and simulated datasets. RESULTS Survival methods generally have greater statistical power than standard differential expression methods when the proportion of missing protein level data is 5% or more. In particular, the AFT models we consider consistently achieve greater statistical power than standard testing procedures, with the discrepancy widening with increasing missingness in the proportions. AVAILABILITY The testing procedures discussed in this article can all be performed using readily available software such as R. The R codes are provided as supplemental materials. CONTACT ctekwe@stat.tamu.edu.
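Illustrative note: the idea of treating missing peak intensities as left-censored observations can be sketched as a log-likelihood in which each observed intensity contributes a log-normal density and each missing one contributes the probability of falling below a detection limit. The sketch below is only a minimal illustration under stated assumptions and is not the authors' supplemental R code: the function name `censored_lognormal_loglik`, the single shared `detection_limit`, and the use of `None` to mark a missing peak are all hypothetical choices made here.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_lognormal_loglik(intensities, detection_limit, mu, sigma):
    """Log-likelihood of a log-normal model with left-censoring.

    intensities: positive floats for observed peaks, None for missing
        peaks (assumed here to lie below detection_limit).
    detection_limit: hypothetical common censoring threshold (> 0).
    mu, sigma: mean and standard deviation of log(intensity).
    """
    if sigma <= 0 or detection_limit <= 0:
        raise ValueError("sigma and detection_limit must be positive")
    log_limit = math.log(detection_limit)
    total = 0.0
    for x in intensities:
        if x is None:
            # Censored peak: probability that the intensity is below the limit.
            total += math.log(norm_cdf((log_limit - mu) / sigma))
        else:
            # Observed peak: log-normal density f(x) = phi(z) / (x * sigma).
            z = (math.log(x) - mu) / sigma
            total += norm_logpdf(z) - math.log(sigma) - math.log(x)
    return total
```

Maximizing this function over `mu` and `sigma` separately for each condition, or with a condition effect added to `mu`, would give a simple accelerated-failure-time-style comparison. For extremely negative standardized limits the censored term can underflow to zero, so a production version would need a more careful log-CDF.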
Affiliation(s)
- Carmen D Tekwe
- Department of Statistics, 3143 TAMU, College Station, TX 77843-3143, USA.
|
100
|
Andreev VP, Petyuk VA, Brewer HM, Karpievitch YV, Xie F, Clarke J, Camp D, Smith RD, Lieberman AP, Albin RL, Nawaz Z, El Hokayem J, Myers AJ. Label-free quantitative LC-MS proteomics of Alzheimer's disease and normally aged human brains. J Proteome Res 2012; 11:3053-67. [PMID: 22559202 DOI: 10.1021/pr3001546] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Indexed: 11/30/2022]
Abstract
Quantitative proteomics analysis of cortical samples of 10 Alzheimer's disease (AD) brains versus 10 normally aged brains was performed by following the accurate mass and time tag (AMT) approach with the high resolution LTQ Orbitrap mass spectrometer. More than 1400 proteins were identified and quantitated. A conservative approach of selecting only the consensus results of four normalization methods was suggested and used. A total of 197 proteins were shown to be significantly differentially abundant (p-values <0.05, corrected for multiplicity of testing) in AD versus control brain samples. Thirty-seven of these proteins were reported as differentially abundant or modified in AD in previous proteomics and transcriptomics publications. The rest to the best of our knowledge are new. Mapping of the discovered proteins with bioinformatic tools revealed significant enrichment with differentially abundant proteins of pathways and processes known to be important in AD, including signal transduction, regulation of protein phosphorylation, immune response, cytoskeleton organization, lipid metabolism, energy production, and cell death.
Affiliation(s)
- Victor P Andreev
- Department of Psychiatry and Behavioral Sciences, Department of Biochemistry and Molecular Biology, Department of Epidemiology and Public Health, Division of Neuroscience, and Department of Human Genetics and Genomics, University of Miami Miller School of Medicine, Miami, Florida, United States
|