1
Niksirat H, Siino V, Steinbach C, Levander F. The quantification of zebrafish ocular-associated proteins provides hints for sex-biased visual impairments and perception. Heliyon 2024; 10:e33057. [PMID: 38994070 PMCID: PMC11238053 DOI: 10.1016/j.heliyon.2024.e33057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 06/11/2024] [Accepted: 06/13/2024] [Indexed: 07/13/2024] Open
Abstract
Biochemical differences between the sexes can also be seen in non-sexual organs and may affect organ function and susceptibility to disease. Sex-biased visual perception and impairments have been reported, and differences in the abundance of eye proteins could explain some of them. The ocular proteome of zebrafish (Danio rerio) was explored to find sex-based differences in protein abundance. A label-free protein quantification workflow using high-resolution mass spectrometry was employed to find proteins with significant differences between the sexes. In total, 3740 unique master proteins were identified and quantified, and 49 proteins showed significant abundance differences between the eyes of male and female zebrafish. These proteins include lipoproteins, immune system and blood coagulation proteins, antioxidants, iron- and heme-binding proteins, ion channels, pumps and exchangers, neuronal and photoreceptor proteins, and cytoskeletal proteins. An extensive literature review provided clues to possible links between sex-biased protein levels and visual perception and impairments. In conclusion, sexual dimorphism at the protein level was discovered for the first time in the eye of zebrafish and should be accounted for in ophthalmological studies. Data are available via ProteomeXchange with identifier PXD033338.
Affiliation(s)
- Hamid Niksirat
- Faculty of Fisheries and Protection of Waters, CENAKVA, University of South Bohemia in České Budějovice, Vodňany, Czech Republic
- Valentina Siino
- Department of Immunotechnology, Lund University, Lund, Sweden
- Christoph Steinbach
- Faculty of Fisheries and Protection of Waters, CENAKVA, University of South Bohemia in České Budějovice, Vodňany, Czech Republic
- Fredrik Levander
- Department of Immunotechnology, Lund University, Lund, Sweden
- National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Lund University, Lund, Sweden
2
Gaither KA, Garcia WL, Tyrrell KJ, Wright AT, Smith JN. Activity-Based Protein Profiling to Probe Relationships between Cytochrome P450 Enzymes and Early-Age Metabolism of Two Polycyclic Aromatic Hydrocarbons (PAHs): Phenanthrene and Retene. Chem Res Toxicol 2024; 37:711-722. [PMID: 38602333 DOI: 10.1021/acs.chemrestox.3c00424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
A growing body of literature has linked early-life exposures to polycyclic aromatic hydrocarbons (PAHs) with adverse neurodevelopmental effects. Once in the body, metabolism serves as a powerful mediator of PAH toxicity by bioactivating and detoxifying PAH metabolites. Since enzyme expression and activity vary considerably throughout human development, we evaluated infant metabolism of PAHs as a potential contributing factor to PAH susceptibility. We measured and compared rates of phenanthrene and retene (two primary PAH constituents of woodsmoke) metabolism in human hepatic microsomes from individuals ≤21 months of age to a pooled sample (n = 200) consisting primarily of adults. We used activity-based protein profiling (ABPP) to characterize cytochrome P450 enzymes (CYPs) in the same hepatic microsome samples. Once incubated in microsomes, phenanthrene demonstrated rapid depletion. Best-fit models for phenanthrene metabolism demonstrated either 1 or 2 phases, depending on the sample, indicating that multiple enzymes could metabolize phenanthrene. We observed no statistically significant differences in phenanthrene metabolism as a function of age, although samples from the youngest individuals had the slowest phenanthrene metabolism rates. Retene was metabolized more slowly than phenanthrene, also in multiple phases. Rates of retene metabolism increased in an age-dependent manner until adult (pooled) metabolism rates were achieved at ∼12 months. ABPP identified 28 unique CYPs among all samples, and we observed lower amounts of active CYPs in individuals ≤21 months of age compared to the pooled sample. Phenanthrene metabolism correlated with CYPs 1A1, 1A2, 2C8, 4A22, 3A4, and 3A43, and retene metabolism correlated with CYPs 1A1, 1A2, and 2C8, as measured by ABPP and vendor-supplied substrate marker activities. These results will aid efforts to determine human health risk and susceptibility to PAH exposure during early life.
Affiliation(s)
- Kari A Gaither
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Whitney L Garcia
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Department of Biology, Baylor University, Waco, Texas 76706, United States
- Kimberly J Tyrrell
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Aaron T Wright
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Department of Chemistry and Biochemistry, Baylor University, Waco, Texas 76706, United States
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon 97331, United States
- Jordan N Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon 97331, United States
3
Larsson S, Holmgren S, Jenndahl L, Ulfenborg B, Strehl R, Synnergren J, Ghosheh N. Proteome of Personalized Tissue-Engineered Veins. ACS Omega 2024; 9:14805-14817. [PMID: 38585136 PMCID: PMC10993322 DOI: 10.1021/acsomega.3c07098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 02/22/2024] [Accepted: 02/27/2024] [Indexed: 04/09/2024]
Abstract
Vascular diseases are the largest cause of death globally and impose a major global burden on healthcare. The gold standard for treating vascular diseases is the transplantation of autologous veins, if applicable. Alternative treatments still suffer from shortcomings, including low patency, lack of growth potential, the need for repeated intervention, and a substantial risk of developing infections. The use of a vascular ECM scaffold reconditioned with the patient's own cells has shown successful results in preclinical and clinical studies. In this study, we have compared the proteomes of personalized tissue-engineered veins of humans and pigs. By applying tandem mass tag (TMT) labeling LC-MS/MS, we have investigated the proteome of decellularized (DC) veins from humans and pigs and reconditioned (RC) DC veins produced through perfusion with the patient's whole blood in STEEN solution, applying the same technology as used in the preclinical studies. The results revealed high similarity between the proteomes of human and pig DC and RC veins, including the ECM texture after decellularization and reconditioning. In addition, functional enrichment analysis showed similarities in signaling pathways and biological processes involved in the immune system response. Furthermore, the classification of proteins involved in immune response activity that were detected in human and pig RC veins revealed proteins that evoke immunogenic responses, which may lead to graft rejection, thrombosis, and inflammation. However, the results from this study imply the initiation of wound healing rather than an immunogenic response, as both systems share the same processes, and no immunogenic response was reported in the preclinical and clinical studies. Finally, our study assessed the application of STEEN solution in tissue engineering and identified proteins that may be useful for the prediction of successful transplantations.
Affiliation(s)
- Susanna Larsson
- Systems Biology Research Center, School of Bioscience, University of Skövde, SE-541 28 Skövde, Sweden
- Sandra Holmgren
- VERIGRAFT, Arvid Wallgrens Backe 20, SE-413 46 Gothenburg, Sweden
- Lachmi Jenndahl
- VERIGRAFT, Arvid Wallgrens Backe 20, SE-413 46 Gothenburg, Sweden
- Benjamin Ulfenborg
- Systems Biology Research Center, School of Bioscience, University of Skövde, SE-541 28 Skövde, Sweden
- Raimund Strehl
- VERIGRAFT, Arvid Wallgrens Backe 20, SE-413 46 Gothenburg, Sweden
- Jane Synnergren
- Systems Biology Research Center, School of Bioscience, University of Skövde, SE-541 28 Skövde, Sweden
- Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, SE-413 45 Gothenburg, Sweden
- Nidal Ghosheh
- Systems Biology Research Center, School of Bioscience, University of Skövde, SE-541 28 Skövde, Sweden
4
Abid MSR, Qiu H, Checco JW. Label-Free Quantitation of Endogenous Peptides. Methods Mol Biol 2024; 2758:125-150. [PMID: 38549012 PMCID: PMC11027169 DOI: 10.1007/978-1-0716-3646-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based peptidomics methods allow for the detection and identification of many peptides in a complex biological mixture in an untargeted manner. Quantitative peptidomics approaches allow for comparisons of peptide abundance between different samples, allowing one to draw conclusions about peptide differences as a function of experimental treatment or physiology. While stable isotope labeling is a powerful approach for quantitative proteomics and peptidomics, advances in mass spectrometry instrumentation and analysis tools have allowed label-free methods to gain popularity in recent years. In a general label-free quantitative peptidomics experiment, peak intensity information for each peptide is compared across multiple LC-MS runs. Here, we outline a general approach for label-free quantitative peptidomics experiments, including steps for sample preparation, LC-MS data acquisition, data processing, and statistical analysis. Special attention is paid to address run-to-run variability, which can lead to several major problems in label-free experiments. Overall, our method provides researchers with a framework for the development of their own quantitative peptidomics workflows applicable to quantitation of peptides from a wide variety of different biological sources.
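One common way to reduce the run-to-run variability mentioned in this abstract is to align the median log intensity of each LC-MS run before comparing peptides. The following is a minimal sketch of that idea, not the chapter's own procedure; the data layout, the run names, and the peptide names are hypothetical.

```python
import math
from statistics import median


def median_normalize(runs):
    """Shift each run's log2 intensities so that all runs share one median.

    `runs` maps a run name to {peptide: raw peak intensity}. This layout is
    an assumption made for illustration only.
    """
    # Work in log2 space; non-positive intensities are dropped.
    logged = {
        run: {pep: math.log2(val) for pep, val in peps.items() if val > 0}
        for run, peps in runs.items()
    }
    run_medians = {run: median(vals.values()) for run, vals in logged.items()}
    target = median(run_medians.values())  # common reference level
    return {
        run: {pep: v - run_medians[run] + target for pep, v in vals.items()}
        for run, vals in logged.items()
    }


# Hypothetical example: run_B is the same sample measured with twice the signal.
runs = {
    "run_A": {"pep1": 1000.0, "pep2": 4000.0, "pep3": 16000.0},
    "run_B": {"pep1": 2000.0, "pep2": 8000.0, "pep3": 32000.0},
}
normalized = median_normalize(runs)
for pep in ("pep1", "pep2", "pep3"):
    print(pep, round(normalized["run_A"][pep], 3), round(normalized["run_B"][pep], 3))
```

After normalization the two hypothetical runs report identical log2 intensities for each peptide, because the only difference between them was a global scaling factor. Real data would also need a strategy for peptides that are missing in some runs.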
Affiliation(s)
- Haowen Qiu
- Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE, USA
- The Nebraska Center for Integrated Biomolecular Communication (NCIBC), University of Nebraska-Lincoln, Lincoln, NE, USA
- James W Checco
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE, USA.
- The Nebraska Center for Integrated Biomolecular Communication (NCIBC), University of Nebraska-Lincoln, Lincoln, NE, USA.
5
Rodriguez Gallo MC, Li Q, Talasila M, Uhrig RG. Quantitative Time-Course Analysis of Osmotic and Salt Stress in Arabidopsis thaliana Using Short Gradient Multi-CV FAIMSpro BoxCar DIA. Mol Cell Proteomics 2023; 22:100638. [PMID: 37704098 PMCID: PMC10663867 DOI: 10.1016/j.mcpro.2023.100638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 08/22/2023] [Accepted: 08/27/2023] [Indexed: 09/15/2023] Open
Abstract
A major limitation when undertaking quantitative proteomic time-course experimentation is the tradeoff between depth-of-analysis and speed-of-analysis. In high complexity and high dynamic range sample types, such as plant extracts, this tradeoff between resolution and time is especially apparent. To address this, we evaluate multiple compensation voltage (CV) high field asymmetric waveform ion mobility spectrometry (FAIMSpro) settings using the latest label-free single-shot Orbitrap-based DIA acquisition workflows for their ability to deeply quantify the Arabidopsis thaliana seedling proteome. Using a BoxCarDIA acquisition workflow with a -30, -50, -70 CV FAIMSpro setting, we were able to consistently quantify >5000 Arabidopsis seedling proteins over a 21-min gradient, facilitating the analysis of ∼42 samples per day. Utilizing this acquisition approach, we then quantified proteome-level changes occurring in Arabidopsis seedling shoots and roots over 24 h of salt and osmotic stress, to identify early and late stress response proteins and reveal stress response overlaps. Here, we successfully quantify >6400 shoot and >8500 root protein groups, respectively, quantifying ∼9700 unique protein groups in total across the study. Collectively, we pioneer a short gradient, multi-CV FAIMSpro BoxCarDIA acquisition workflow that represents an exciting new analysis approach for undertaking quantitative proteomic time-course experimentation in plants.
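The throughput figures in this abstract can be checked with simple arithmetic. The sketch below uses only the two numbers reported above (a 21-min gradient and about 42 samples per day); treating all remaining time per sample as overhead (loading, washing, equilibration) is an assumption made here, not something the abstract states.

```python
MINUTES_PER_DAY = 24 * 60

gradient_min = 21      # gradient length reported in the abstract
samples_per_day = 42   # approximate throughput reported in the abstract

# Total instrument time available per sample at the reported throughput.
cycle_min = MINUTES_PER_DAY / samples_per_day
# Assumption: everything that is not gradient time counts as overhead.
overhead_min = cycle_min - gradient_min
# Upper bound on throughput if there were no overhead at all.
gradient_only_ceiling = MINUTES_PER_DAY // gradient_min

print(f"{cycle_min:.1f} min per sample, {overhead_min:.1f} min implied overhead")
# prints "34.3 min per sample, 13.3 min implied overhead"
print(f"ceiling without overhead: {gradient_only_ceiling} samples/day")
# prints "ceiling without overhead: 68 samples/day"
```

Under that assumption, roughly a third of each cycle is spent outside the gradient, which is why the reported throughput sits well below the 68 samples per day that the gradient length alone would allow.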
Affiliation(s)
- M C Rodriguez Gallo
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Q Li
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- M Talasila
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- R G Uhrig
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Department of Biochemistry, University of Alberta, Edmonton, Alberta, Canada
6
Hutchings C, Dawson CS, Krueger T, Lilley KS, Breckels LM. A Bioconductor workflow for processing, evaluating, and interpreting expression proteomics data. F1000Res 2023; 12:1402. [PMID: 38021401 PMCID: PMC10683783 DOI: 10.12688/f1000research.139116.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/15/2023] [Indexed: 12/01/2023] Open
Abstract
Background: Expression proteomics involves the global evaluation of protein abundances within a system. In turn, differential expression analysis can be used to investigate changes in protein abundance upon perturbation to such a system. Methods: Here, we provide a workflow for the processing, analysis and interpretation of quantitative mass spectrometry-based expression proteomics data. This workflow utilizes open-source R software packages from the Bioconductor project and guides users end-to-end and step-by-step through every stage of the analyses. As a use-case we generated expression proteomics data from HEK293 cells with and without a treatment. Of note, the experiment included cellular proteins labelled using tandem mass tag (TMT) technology and secreted proteins quantified using label-free quantitation (LFQ). Results: The workflow explains the software infrastructure before focusing on data import, pre-processing and quality control. This is done individually for TMT and LFQ datasets. The application of statistical differential expression analysis is demonstrated, followed by interpretation via gene ontology enrichment analysis. Conclusions: A comprehensive workflow for the processing, analysis and interpretation of expression proteomics is presented. The workflow is a valuable resource for the proteomics community, specifically for beginners who are at least familiar with R and who wish to understand and make data-driven decisions with regard to their analyses.
Affiliation(s)
- Charlotte Hutchings
- Cambridge Centre for Proteomics, University of Cambridge, Cambridge, CB2 1QR, UK
- Charlotte S. Dawson
- Cambridge Centre for Proteomics, University of Cambridge, Cambridge, CB2 1QR, UK
- Thomas Krueger
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QR, UK
- Kathryn S. Lilley
- Cambridge Centre for Proteomics, University of Cambridge, Cambridge, CB2 1QR, UK
- Lisa M. Breckels
- Cambridge Centre for Proteomics, University of Cambridge, Cambridge, CB2 1QR, UK
7
Sharman K, Patterson NH, Weiss A, Neumann EK, Guiberson ER, Ryan DJ, Gutierrez DB, Spraggins JM, Van de Plas R, Skaar EP, Caprioli RM. Rapid Multivariate Analysis Approach to Explore Differential Spatial Protein Profiles in Tissue. J Proteome Res 2023; 22:1394-1405. [PMID: 35849531 PMCID: PMC9845430 DOI: 10.1021/acs.jproteome.2c00206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
Spatially targeted proteomics analyzes the proteome of specific cell types and functional regions within tissue. While spatial context is often essential to understanding biological processes, interpreting sub-region-specific protein profiles can pose a challenge due to the high-dimensional nature of the data. Here, we develop a multivariate approach for rapid exploration of differential protein profiles acquired from distinct tissue regions and apply it to analyze a published spatially targeted proteomics data set collected from Staphylococcus aureus-infected murine kidney, 4 and 10 days postinfection. The data analysis process rapidly filters high-dimensional proteomic data to reveal relevant differentiating species among hundreds to thousands of measured molecules. We employ principal component analysis (PCA) for dimensionality reduction of protein profiles measured by microliquid extraction surface analysis mass spectrometry. Subsequently, k-means clustering of the PCA-processed data groups samples by chemical similarity. Cluster center interpretation revealed a subset of proteins that differentiate between spatial regions of infection over two time points. These proteins appear involved in tricarboxylic acid metabolomic pathways, calcium-dependent processes, and cytoskeletal organization. Gene ontology analysis further uncovered relationships to tissue damage/repair and calcium-related defense mechanisms. Applying our analysis in infectious disease highlighted differential proteomic changes across abscess regions over time, reflecting the dynamic nature of host-pathogen interactions.
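The two-step analysis described in this abstract (PCA for dimensionality reduction, then k-means on the reduced data) can be sketched with the standard library alone. This is an illustrative toy, not the authors' pipeline: it keeps only the first principal component, uses power iteration instead of a full decomposition, and the six three-protein profiles are invented.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_pc(rows, iters=200):
    """Leading principal axis via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def kmeans_1d(values, k=2, iters=50):
    """Lloyd's algorithm on scalars, initialised at evenly spaced quantiles."""
    ordered = sorted(values)
    centers = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# Hypothetical profiles: rows are sampled regions, columns are three proteins.
profiles = [
    [10.1, 5.0, 2.0], [10.3, 5.2, 2.1], [9.9, 4.9, 1.9],   # region type 1
    [6.0, 8.1, 2.0], [6.2, 7.9, 2.2], [5.8, 8.0, 1.8],     # region type 2
]
centered = center(profiles)
axis = first_pc(centered)
scores = [sum(x * a for x, a in zip(row, axis)) for row in centered]
labels, centers = kmeans_1d(scores, k=2)
print("loadings:", [round(a, 3) for a in axis])
print("cluster labels:", labels)
```

In this toy data the first two proteins carry almost all of the loading on the principal axis, while the third contributes very little, which mirrors the idea of reading off differentiating proteins from the cluster structure. The sign of the axis, and therefore which cluster receives which label, is arbitrary.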
Affiliation(s)
- Kavya Sharman
- Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee 37235, United States
- Program in Chemical & Physical Biology, Vanderbilt University Medical Center, Nashville, Tennessee 37232, United States
- Nathan Heath Patterson
- Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee 37232, United States
- Andy Weiss
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee 37212, United States
- Elizabeth K Neumann
- Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee 37232, United States
- Emma R Guiberson
- Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Daniel J Ryan
- Pfizer Inc., Chesterfield, Missouri 63017, United States
- Danielle B Gutierrez
- Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee 37235, United States
- Jeffrey M Spraggins
- Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee 37232, United States
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Cell and Developmental Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
- Raf Van de Plas
- Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee 37232, United States
- Delft Center for Systems and Control, Delft University of Technology, 2628 CD Delft, The Netherlands
- Eric P Skaar
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee 37212, United States
- Department of Medicine, Vanderbilt University, Nashville, Tennessee 37232, United States
- Richard M Caprioli
- Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee 37232, United States
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Pharmacology, Vanderbilt University, Nashville, Tennessee 37232, United States
8
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023] Open
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has presented the need for the development of integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and are adapted to the challenges of combining the heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analyses of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Daniel M. Claborne
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Zachary D. Weller
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Katrina M. Waters
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Lisa M. Bramer
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Correspondence: Lisa M. Bramer
9
Buyukozkan M, Benedetti E, Krumsiek J. rox: A Statistical Model for Regression with Missing Values. Metabolites 2023; 13:127. [PMID: 36677052 PMCID: PMC9861384 DOI: 10.3390/metabo13010127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 11/15/2022] [Accepted: 11/17/2022] [Indexed: 01/18/2023] Open
Abstract
High-dimensional omics datasets frequently contain missing data points, which typically occur due to concentrations below the limit of detection (LOD) of the profiling platform. The presence of such missing values significantly limits downstream statistical analysis and result interpretation. Two common techniques to deal with this issue include the removal of samples with missing values and imputation approaches that substitute the missing measurements with reasonable estimates. Both approaches, however, suffer from various shortcomings and pitfalls. In this paper, we present "rox", a novel statistical model for the analysis of omics data with missing values without the need for imputation. The model directly incorporates missing values as "low" concentrations into the calculation. We show the superiority of rox over common approaches on simulated data and on six metabolomics datasets. Fully leveraging the information contained in LOD-based missing values, rox provides a powerful tool for the statistical analysis of omics data.
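The idea of treating LOD-based missing values as "low" concentrations, rather than dropping or imputing them, can be illustrated with a rank-style concordance measure. The sketch below is a simplified stand-in written for this listing and is not the rox estimator itself; the function name and the data are hypothetical. A missing measurement (`None`) is assumed to lie below every observed value, and a pair of two missing values is skipped because it carries no ordering information.

```python
from itertools import combinations


def concordance_with_lod(outcome, measurement):
    """Kendall-style concordance where None means 'below every observed value'."""
    concordant = discordant = 0
    for (y1, m1), (y2, m2) in combinations(zip(outcome, measurement), 2):
        if y1 == y2 or (m1 is None and m2 is None):
            continue  # tied outcome, or two missing values: no information
        if m1 is None:
            m_sign = -1           # missing < observed
        elif m2 is None:
            m_sign = 1            # observed > missing
        elif m1 == m2:
            continue              # tied measurements
        else:
            m_sign = 1 if m1 > m2 else -1
        y_sign = 1 if y1 > y2 else -1
        if m_sign == y_sign:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Hypothetical data: the two lowest-outcome samples fall below the LOD.
outcome = [1, 2, 3, 4, 5, 6]
measurement = [None, None, 1.1, 0.8, 1.9, 2.4]

with_missing = concordance_with_lod(outcome, measurement)

# Complete-case analysis for comparison: drop samples with missing values.
observed = [(y, m) for y, m in zip(outcome, measurement) if m is not None]
complete_case = concordance_with_lod([y for y, _ in observed],
                                     [m for _, m in observed])

print(round(with_missing, 3), round(complete_case, 3))
```

In this example the missing values occur in the samples with the lowest outcome, so counting them as "low" adds eight concordant pairs that a complete-case analysis would discard, and the estimated association is stronger as a result.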
10
Zhang Y, Dreyer B, Govorukhina N, Heberle AM, Končarević S, Krisp C, Opitz CA, Pfänder P, Bischoff R, Schlüter H, Kwiatkowski M, Thedieck K, Horvatovich PL. Comparative Assessment of Quantification Methods for Tumor Tissue Phosphoproteomics. Anal Chem 2022; 94:10893-10906. [PMID: 35880733 PMCID: PMC9366746 DOI: 10.1021/acs.analchem.2c01036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
With increasing sensitivity and accuracy in mass spectrometry, the tumor phosphoproteome is getting into reach. However, the selection of quantitation techniques best-suited to the biomedical question and diagnostic requirements remains a trial and error decision as no study has directly compared their performance for tumor tissue phosphoproteomics. We compared label-free quantification (LFQ), spike-in-SILAC (stable isotope labeling by amino acids in cell culture), and tandem mass tag (TMT) isobaric tandem mass tags technology for quantitative phosphosite profiling in tumor tissue. Compared to the classic SILAC method, spike-in-SILAC is not limited to cell culture analysis, making it suitable for quantitative analysis of tumor tissue samples. TMT offered the lowest accuracy and the highest precision and robustness toward different phosphosite abundances and matrices. Spike-in-SILAC offered the best compromise between these features but suffered from a low phosphosite coverage. LFQ offered the lowest precision but the highest number of identifications. Both spike-in-SILAC and LFQ presented susceptibility to matrix effects. Match between run (MBR)-based analysis enhanced the phosphosite coverage across technical replicates in LFQ and spike-in-SILAC but further reduced the precision and robustness of quantification. The choice of quantitative methodology is critical for both study design such as sample size in sample groups and quantified phosphosites and comparison of published cancer phosphoproteomes. Using ovarian cancer tissue as an example, our study builds a resource for the design and analysis of quantitative phosphoproteomic studies in cancer research and diagnostics.
Affiliation(s)
- Yang Zhang
- Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
- Institute of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, 6020 Innsbruck, Austria
- Laboratory of Pediatrics, Section Systems Medicine of Metabolism and Signaling, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
- Benjamin Dreyer
- Section/Core Facility Mass Spectrometry and Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, Martinistraße 52, 20246 Hamburg, Germany
- Natalia Govorukhina
- Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
- Alexander M Heberle
- Institute of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, 6020 Innsbruck, Austria
- Laboratory of Pediatrics, Section Systems Medicine of Metabolism and Signaling, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
- Saša Končarević
- Proteome Sciences R&D GmbH & Co. KG, Altenhöferallee 3, 60438 Frankfurt/Main, Germany
- Christoph Krisp
- Section/Core Facility Mass Spectrometry and Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, Martinistraße 52, 20246 Hamburg, Germany
- Christiane A Opitz
- Metabolic Crosstalk in Cancer, German Consortium of Translational Cancer Research (DKTK), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Department of Neurology, National Center for Tumor Diseases, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Pauline Pfänder
- Metabolic Crosstalk in Cancer, German Consortium of Translational Cancer Research (DKTK), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Faculty of Bioscience, Heidelberg University, 69117 Heidelberg, Germany
- Rainer Bischoff
- Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
- Hartmut Schlüter
- Section/Core Facility Mass Spectrometry and Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, Martinistraße 52, 20246 Hamburg, Germany
- Marcel Kwiatkowski
- Institute of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, 6020 Innsbruck, Austria
- Department of Molecular Pharmacology, Groningen Research Institute for Pharmacy, University of Groningen, Groningen 9700 AD, The Netherlands
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen 9700 AD, The Netherlands
- Kathrin Thedieck
- Institute of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, 6020 Innsbruck, Austria
- Laboratory of Pediatrics, Section Systems Medicine of Metabolism and Signaling, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
- Department of Neuroscience, School of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany
- Peter L Horvatovich
- Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
| |
Collapse
|
11
|
Plancade S, Berland M, Blein-Nicolas M, Langella O, Bassignani A, Juste C. A combined test for feature selection on sparse metaproteomics data-an alternative to missing value imputation. PeerJ 2022; 10:e13525. [PMID: 35769140 PMCID: PMC9235818 DOI: 10.7717/peerj.13525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 05/11/2022] [Indexed: 01/18/2023] Open
Abstract
One of the difficulties encountered in the statistical analysis of metaproteomics data is the high proportion of missing values, which are usually treated by imputation. Nevertheless, imputation methods are based on restrictive assumptions regarding missingness mechanisms, namely "at random" or "not at random". To circumvent these limitations in the context of feature selection in a multi-class comparison, we propose a univariate selection method that combines a test of association between missingness and classes, and a test for difference of observed intensities between classes. This approach implicitly handles both missingness mechanisms. We performed a quantitative and qualitative comparison of our procedure with imputation-based feature selection methods on two experimental data sets, as well as simulated data with various scenarios regarding the missingness mechanisms and the nature of the difference of expression (differential intensity or differential presence). Whereas we observed similar performances in terms of prediction on the experimental data set, the feature ranking and selection from various imputation-based methods were strongly divergent. We showed that the combined test reaches a compromise by correlating reasonably with other methods, and remains efficient in all simulated scenarios unlike imputation-based feature selection methods.
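As a rough illustration of the combining idea in this abstract, the sketch below merges two per-feature p-values, one from a missingness-versus-class test and one from an observed-intensity test. It uses Fisher's method, which is an assumption chosen for illustration: the abstract does not say how the authors combine the two tests. The function name `fisher_combine` and the example p-values are hypothetical.

```python
import math

def fisher_combine(p_missing: float, p_intensity: float) -> float:
    """Combine two independent p-values with Fisher's method (illustrative assumption).

    The statistic X = -2 * (ln p1 + ln p2) follows a chi-square distribution
    with 4 degrees of freedom under the null hypothesis; its survival function
    has the closed form exp(-X/2) * (1 + X/2).
    """
    for p in (p_missing, p_intensity):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * (math.log(p_missing) + math.log(p_intensity))
    return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)

# Hypothetical feature with moderate evidence from each test on its own.
combined = fisher_combine(0.04, 0.06)
print(f"combined p-value: {combined:.4f}")
```

Fisher's method assumes the two tests are independent under the null hypothesis, which may hold only approximately for a missingness test and an intensity test computed on the same feature.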
Affiliation(s)
- Sandra Plancade
- UR875 MIAT, Université fédérale de Toulouse, INRAE, Castanet-Tolosan, France
- Magali Berland
- Université Paris-Saclay, INRAE, MGP, Jouy en Josas, France
- Mélisande Blein-Nicolas
- Université Paris-Saclay, CNRS, INRAE, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette, France; Université Paris-Saclay, CNRS, INRAE, AgroParisTech, PAPPSO, Gif-sur-Yvette, France
- Olivier Langella
- Université Paris-Saclay, CNRS, INRAE, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette, France; Université Paris-Saclay, CNRS, INRAE, AgroParisTech, PAPPSO, Gif-sur-Yvette, France
- Ariane Bassignani
- Université Paris-Saclay, INRAE, MGP, Jouy en Josas, France; Université Paris-Saclay, CNRS, INRAE, AgroParisTech, PAPPSO, Gif-sur-Yvette, France
- Catherine Juste
- Micalis Institute, Université Paris-Saclay, INRAE, AgroParisTech, Jouy-en-Josas, France

12
Garcia WL, Miller CJ, Lomas GX, Gaither KA, Tyrrell KJ, Smith JN, Brandvold KR, Wright AT. Profiling How the Gut Microbiome Modulates Host Xenobiotic Metabolism in Response to Benzo[a]pyrene and 1-Nitropyrene Exposure. Chem Res Toxicol 2022; 35:585-596. [PMID: 35347982 PMCID: PMC9878584 DOI: 10.1021/acs.chemrestox.1c00360] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/28/2023]
Abstract
The gut microbiome is a key contributor to xenobiotic metabolism. Polycyclic aromatic hydrocarbons (PAHs) are an abundant class of environmental contaminants that have varying levels of carcinogenicity depending on their individual structures. Little is known about how the gut microbiome affects the rates of PAH metabolism. This study sought to determine the role that the gut microbiome has in determining the various aspects of metabolism in the liver, before and after exposure to two structurally different PAHs, benzo[a]pyrene and 1-nitropyrene. Following exposures, the metabolic rates of PAH metabolism were measured, and activity-based protein profiling was performed. We observed differences in PAH metabolism rates between germ-free and conventional mice under both unexposed and exposed conditions. Our activity-based protein profiling (ABPP) analysis showed that, under unexposed conditions, there were only minor differences in total P450 activity in germ-free mice relative to conventional mice. However, we observed distinct activity profiles in response to corn oil vehicle and PAH treatment, primarily in the case of 1-NP treatment. This study revealed that the repertoire of active P450s in the liver is impacted by the presence of the gut microbiome, which modifies PAH metabolism in a substrate-specific fashion.
Affiliation(s)
- Whitney L. Garcia
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352 (USA); Biological Systems Engineering Department, CAHNRS, Washington State University, Pullman, WA 99163 (USA)
- Carson J. Miller
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352 (USA)
- Gerard X. Lomas
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352 (USA)
- Kari A. Gaither
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352 (USA)
- Kimberly J. Tyrrell
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352 (USA)
- Jordan N. Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352 (USA); Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331 (USA)
- Kristoffer R. Brandvold
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352 (USA); Elson S. Floyd College of Medicine, Washington State University, Spokane, WA 99202 (USA); corresponding author
- Aaron T. Wright
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352 (USA); The Gene and Linda Voiland School of Chemical Engineering and Bioengineering, Washington State University, Pullman, WA 99163 (USA); corresponding author

13
Suomi T, Elo LL. Statistical and machine learning methods to study human CD4+ T cell proteome profiles. Immunol Lett 2022; 245:8-17. [DOI: 10.1016/j.imlet.2022.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 03/11/2022] [Accepted: 03/15/2022] [Indexed: 11/05/2022]

14
Huang Z, Wang C. A Review on Differential Abundance Analysis Methods for Mass Spectrometry-Based Metabolomic Data. Metabolites 2022; 12:305. [PMID: 35448492 PMCID: PMC9032534 DOI: 10.3390/metabo12040305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 03/26/2022] [Accepted: 03/27/2022] [Indexed: 12/04/2022] Open
Abstract
This review presents an overview of the statistical methods on differential abundance (DA) analysis for mass spectrometry (MS)-based metabolomic data. MS has been widely used for metabolomic abundance profiling in biological samples. The high-throughput data produced by MS often contain a large fraction of zero values caused by the absence of certain metabolites and the technical detection limits of MS. Various statistical methods have been developed to characterize the zero-inflated metabolomic data and perform DA analysis, ranging from simple tests to more complex models including parametric, semi-parametric, and non-parametric approaches. In this article, we discuss and compare DA analysis methods regarding their assumptions and statistical modeling techniques.
Affiliation(s)
- Zhengyan Huang
- Everest Clinical Research Corporation, Little Falls, NJ 07424, USA
- Chi Wang
- Markey Cancer Center, Department of Internal Medicine, University of Kentucky, Lexington, KY 40536, USA

15
Zhang H, Ao M, Boja A, Schnaubelt M, Hu Y. OmicsOne: associate omics data with phenotypes in one-click. Clin Proteomics 2021; 18:29. [PMID: 34895137 PMCID: PMC8903648 DOI: 10.1186/s12014-021-09334-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/19/2021] [Accepted: 11/22/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: The rapid advancements of high throughput "omics" technologies have brought a massive amount of data to process during and after experiments. Multi-omic analysis facilitates a deeper interrogation of a dataset and the discovery of interesting genes, proteins, lipids, glycans, metabolites, or pathways related to the corresponding phenotypes in a study. Many individual software tools have been developed for data analysis and visualization. However, it still lacks an efficient way to investigate the phenotypes with multiple omics data. Here, we present OmicsOne as an interactive web-based framework for rapid phenotype association analysis of multi-omic data by integrating quality control, statistical analysis, and interactive data visualization on 'one-click'.
MATERIALS AND METHODS: OmicsOne was applied on the previously published proteomic and glycoproteomic data sets of high-grade serous ovarian carcinoma (HGSOC) and the published proteome data set of lung squamous cell carcinoma (LSCC) to confirm its performance. The data was analyzed through six main functional modules implemented in OmicsOne: (1) phenotype profiling, (2) data preprocessing and quality control, (3) knowledge annotation, (4) phenotype associated features discovery, (5) correlation and regression model analysis for phenotype association analysis on individual features, and (6) enrichment analysis for phenotype association analysis on interested feature sets.
RESULTS: We developed an integrated software solution, OmicsOne, for the phenotype association analysis on multi-omics data sets. The application of OmicsOne on the public data set of ovarian cancer data showed that the software could confirm the previous observations consistently and discover new evidence for HNRNPU and a glycopeptide of HYOU1 as potential biomarkers for HGSOC data sets. The performance of OmicsOne was further demonstrated in the Tumor and NAT comparison study on the proteome data set of LSCC.
CONCLUSIONS: OmicsOne can effectively simplify data analysis and reveal the significant associations between phenotypes and potential biomarkers, including genes, proteins, and glycopeptides, in minutes to assist users to understand aberrant biological processes.
Affiliation(s)
- Hui Zhang
- School of Medicine, Johns Hopkins University, Baltimore, MD, 21287, USA
- Minghui Ao
- School of Medicine, Johns Hopkins University, Baltimore, MD, 21287, USA
- Arianna Boja
- Mount Hebron High School, Ellicott City, MD, 21042, USA
- Yingwei Hu
- School of Medicine, Johns Hopkins University, Baltimore, MD, 21287, USA

16
Quast JP, Schuster D, Picotti P. protti: an R package for comprehensive data analysis of peptide- and protein-centric bottom-up proteomics data. Bioinformatics Advances 2021; 2:vbab041. [PMID: 36699412 PMCID: PMC9710675 DOI: 10.1093/bioadv/vbab041] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/18/2021] [Revised: 10/28/2021] [Accepted: 12/06/2021] [Indexed: 01/28/2023]
Abstract
Summary: We present a flexible, user-friendly R package called protti for comprehensive quality control, analysis and interpretation of quantitative bottom-up proteomics data. protti supports the analysis of protein-centric data such as those associated with protein expression analyses, as well as peptide-centric data such as those resulting from limited proteolysis-coupled mass spectrometry analysis. Due to its flexible design, it supports analysis of label-free, data-dependent, data-independent and targeted proteomics datasets. protti can be run on the output of any search engine and software package commonly used for bottom-up proteomics experiments such as Spectronaut, Skyline, MaxQuant or Proteome Discoverer, adequately exported to table format.
Availability and implementation: protti is implemented as an open-source R package. Release versions are available via CRAN (https://CRAN.R-project.org/package=protti) and work on all major operating systems. The development version is maintained on GitHub (https://github.com/jpquast/protti). Full documentation including examples is provided in the form of vignettes on our package website (jpquast.github.io/protti/).
Affiliation(s)
- Jan-Philipp Quast
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
- Dina Schuster
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland; Department of Biology, Institute of Molecular Biology and Biophysics, ETH Zurich, Zurich 8093, Switzerland; Laboratory of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, Villigen 5232, Switzerland
- Paola Picotti
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland; corresponding author

17
Stoddard EG, Nag S, Martin J, Tyrrell KJ, Gibbins T, Anderson KA, Shukla AK, Corley R, Wright AT, Smith JN. Exposure to an Environmental Mixture of Polycyclic Aromatic Hydrocarbons Induces Hepatic Cytochrome P450 Enzymes in Mice. Chem Res Toxicol 2021; 34:2145-2156. [PMID: 34472326 DOI: 10.1021/acs.chemrestox.1c00235] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Cytochrome P450 enzymes (CYPs) play an important role in bioactivating or detoxifying polycyclic aromatic hydrocarbons (PAHs), common environmental contaminants. While it is widely accepted that exposure to PAHs induces CYPs, effectively increasing rates of xenobiotic metabolism, dose- and time-response patterns of CYP induction are not well-known. In order to better understand dose- and time-response relationships of individual CYPs following induction, we exposed B6129SF1/J mice to single or repeated doses (2-180 μmol/kg/d) of benzo[a]pyrene (BaP) or Supermix-10, a mixture of the top 10 most abundant PAHs found at the Portland Harbor Superfund Site. In hepatic microsomes from exposed mice, we measured amounts of active CYPs using activity-based protein profiling and total CYP expression using global proteomics. We observed rapid Cyp1a1 induction after 6 h at the lowest PAH exposures and broad induction of many CYPs after 3 daily PAH doses at 72 h following the first dose. Using samples displaying Cyp1a1 induction, we observed significantly higher metabolic affinity for BaP metabolism (Km reduced 3-fold), 3-fold higher intrinsic clearance, but no changes to the Vmax. Mice dosed with the highest PAH exposures exhibited 1.7-5-fold higher intrinsic clearance rates for BaP compared to controls and higher Vmax values indicating greater amounts of enzymes capable of metabolizing BaP. This study demonstrates exposure to PAHs found at superfund sites induces enzymes in dose- and time-dependent patterns in mice. Accounting for specific changes in enzyme profiles, relative rates of PAH bioactivation and detoxification, and resulting risk will help translate internal dosimetry of animal models to humans and improve risk assessments of PAHs at superfund sites.
Affiliation(s)
- Ethan G Stoddard
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Subhasree Nag
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Jude Martin
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Kimberly J Tyrrell
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Teresa Gibbins
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Kim A Anderson
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon 97331, United States
- Anil K Shukla
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Richard Corley
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Aaron T Wright
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States; The Gene and Linda Voiland School of Chemical Engineering and Bioengineering, Washington State University, Pullman, Washington 99163, United States
- Jordan N Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States; Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon 97331, United States

18
Gardner ML, Freitas MA. Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics. Int J Mol Sci 2021; 22:ijms22179650. [PMID: 34502557 PMCID: PMC8431783 DOI: 10.3390/ijms22179650] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/20/2021] [Revised: 08/28/2021] [Accepted: 08/31/2021] [Indexed: 01/15/2023] Open
Abstract
Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values widely vary when performing comparisons across different sample treatments. For example, one would expect a consistent rate of “missing at random” (MAR) across batches of samples and varying rates of “missing not at random” (MNAR) depending on the inherent difference in sample treatments within the study. The missing value imputation strategy must thus be selected that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding the appropriate missing value imputation strategy: (1) when it is appropriate to impute data; (2) how to choose a method that reflects the combinatorial manner of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
Affiliation(s)
- Miranda L. Gardner
- Ohio State Biochemistry Program, Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA; Cancer Biology and Genetics, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Michael A. Freitas
- Ohio State Biochemistry Program, Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA; Cancer Biology and Genetics, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA

19
Taylor S, Ponzini M, Wilson M, Kim K. Comparison of imputation and imputation-free methods for statistical analysis of mass spectrometry data with missing data. Brief Bioinform 2021; 23:6361033. [PMID: 34472591 DOI: 10.1093/bib/bbab353] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/05/2021] [Revised: 07/27/2021] [Accepted: 08/10/2021] [Indexed: 11/14/2022] Open
Abstract
Missing values are common in high-throughput mass spectrometry data. Two strategies are available to address missing values: (i) eliminate or impute the missing values and apply statistical methods that require complete data and (ii) use statistical methods that specifically account for missing values without imputation (imputation-free methods). This study reviews the effect of sample size and percentage of missing values on statistical inference for multiple methods under these two strategies. With increasing missingness, the ability of imputation and imputation-free methods to identify differentially and non-differentially regulated compounds in a two-group comparison study declined. Random forest and k-nearest neighbor imputation combined with a Wilcoxon test performed well in statistical testing for up to 50% missingness with little bias in estimating the effect size. Quantile regression imputation accompanied with a Wilcoxon test also had good statistical testing outcomes but substantially distorted the difference in means between groups. None of the imputation-free methods performed consistently better for statistical testing than imputation methods.
Affiliation(s)
- Sandra Taylor
- Division of Biostatistics, School of Medicine at the University of California, Davis, 2921 Stockton Boulevard, Suite 1400, Sacramento, CA 95817, USA
- Matthew Ponzini
- Division of Biostatistics, School of Medicine at the University of California, Davis, 2921 Stockton Boulevard, Suite 1400, Sacramento, CA 95817, USA
- Machelle Wilson
- Division of Biostatistics, School of Medicine at the University of California, Davis, 2921 Stockton Boulevard, Suite 1400, Sacramento, CA 95817, USA
- Kyoungmi Kim
- Division of Biostatistics, School of Medicine at the University of California, Davis, 2921 Stockton Boulevard, Suite 1400, Sacramento, CA 95817, USA

20
Herbers J, Miller R, Walther A, Schindler L, Schmidt K, Gao W, Rupprecht F. How to deal with non-detectable and outlying values in biomarker research: Best practices and recommendations for univariate imputation approaches. Comprehensive Psychoneuroendocrinology 2021; 7:100052. [PMID: 35757062 PMCID: PMC9216349 DOI: 10.1016/j.cpnec.2021.100052] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/18/2021] [Accepted: 03/22/2021] [Indexed: 12/22/2022] Open
Abstract
Non-detectable (ND) and outlying concentration values (OV) are a common challenge of biomarker investigations. However, best practices on how to aptly deal with the affected cases are still missing. The high methodological heterogeneity in biomarker-oriented research, as for example, in the field of psychoneuroendocrinology, and the statistical bias in some of the applied methods may compromise the robustness, comparability, and generalizability of research findings. In this paper, we describe the occurrence of ND and OV in terms of a model that considers them as censored data, for instance due to measurement error cutoffs. We then present common univariate approaches in handling ND and OV by highlighting their respective strengths and drawbacks. In a simulation study with lognormal distributed data, we compare the performance of six selected methods, ranging from simple and commonly used to more sophisticated imputation procedures, in four scenarios with varying patterns of censored values as well as for a broad range of cutoffs. Especially deletion, but also fixed-value imputations bear a high risk of biased and pseudo-precise parameter estimates. We also introduce censored regressions as a more sophisticated option for a direct modeling of the censored data. Our analyses demonstrate the impact of ND and OV handling methods on the results of biomarker-oriented research, supporting the need for transparent reporting and the implementation of best practices. In our simulations, the use of imputed data from the censored intervals of a fitted lognormal distribution shows preferable properties regarding our established criteria. We provide the algorithm for this favored routine for a direct application in R on the Open Science Framework (https://osf.io/spgtv). Further research is needed to evaluate the performance of the algorithm in various contexts, for example when the underlying assumptions do not hold. 
We conclude with recommendations and potential further improvements for the field.
Highlights
- ND and OV are considered as censored data, e.g. due to measurement error cutoffs.
- Several common univariate approaches in handling ND and OV are presented.
- In a simulation study, their performances are compared.
- A novel algorithm shows preferable properties.
- General recommendations on how to deal with ND and OV are presented.

21
Rahmatbakhsh M, Gagarinova A, Babu M. Bioinformatic Analysis of Temporal and Spatial Proteome Alternations During Infections. Front Genet 2021; 12:667936. [PMID: 34276775 PMCID: PMC8283032 DOI: 10.3389/fgene.2021.667936] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/15/2021] [Accepted: 06/08/2021] [Indexed: 12/13/2022] Open
Abstract
Microbial pathogens have evolved numerous mechanisms to hijack host's systems, thus causing disease. This is mediated by alterations in the combined host-pathogen proteome in time and space. Mass spectrometry-based proteomics approaches have been developed and tailored to map disease progression. The result is complex multidimensional data that pose numerous analytic challenges for downstream interpretation. However, a systematic review of approaches for the downstream analysis of such data has been lacking in the field. In this review, we detail the steps of a typical temporal and spatial analysis, including data pre-processing steps (i.e., quality control, data normalization, the imputation of missing values, and dimensionality reduction), different statistical and machine learning approaches, validation, interpretation, and the extraction of biological information from mass spectrometry data. We also discuss current best practices for these steps based on a collection of independent studies to guide users in selecting the most suitable strategies for their dataset and analysis objectives. Moreover, we also compiled the list of commonly used R software packages for each step of the analysis. These could be easily integrated into one's analysis pipeline. Furthermore, we guide readers through various analysis steps by applying these workflows to mock and host-pathogen interaction data from public datasets. The workflows presented in this review will serve as an introduction for data analysis novices, while also helping established users update their data analysis pipelines. We conclude the review by discussing future directions and developments in temporal and spatial proteomics and data analysis approaches. Data analysis codes, prepared for this review are available from https://github.com/BabuLab-UofR/TempSpac, where guidelines and sample datasets are also offered for testing purposes.
Affiliation(s)
- Alla Gagarinova
- Department of Biochemistry, Microbiology, & Immunology, University of Saskatchewan, Saskatoon, SK, Canada
- Mohan Babu
- Department of Biochemistry, University of Regina, Regina, SK, Canada

22
Niksirat H, Siino V, Steinbach C, Levander F. High-Resolution Proteomic Profiling Shows Sexual Dimorphism in Zebrafish Heart-Associated Proteins. J Proteome Res 2021; 20:4075-4088. [PMID: 34185526 DOI: 10.1021/acs.jproteome.1c00387] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Understanding the molecular basis of sexual dimorphism in the cardiovascular system may contribute to the improvement of the outcome in biological, pharmacological, and toxicological studies as well as on the development of sex-based drugs and therapeutic approaches. Label-free protein quantification using high-resolution mass spectrometry was applied to detect sex-based proteome differences in the heart of zebrafish Danio rerio. Out of almost 3000 unique identified proteins in the heart, 79 showed significant abundance differences between male and female fish. The functional differences were mapped using enrichment analyses. Our results suggest that a large amount of materials needed for reproduction (e.g., sugars, lipids, proteins, etc.) may impose extra pressure on blood, vessels, and heart on their way toward the ovaries. In the present study, the female's heart shows a clear sexual dimorphism by changing abundance levels of numerous proteins, which could be a way to safely overcome material-induced elevated pressures. These proteins belong to the immune system, oxidative stress response, drug metabolization, detoxification, energy, metabolism, and so on. In conclusion, we showed that sex can induce dimorphism at the molecular level in nonsexual organs such as heart and must be considered as an important factor in cardiovascular research. Data are available via ProteomeXchange with identifier PXD023506.
Affiliation(s)
- Hamid Niksirat
- Faculty of Fisheries and Protection of Waters, CENAKVA, University of South Bohemia in České Budějovice, Vodňany, 370 05 České Budějovice, Czech Republic
- Valentina Siino
- Department of Immunotechnology, Lund University, Lund 223 87, Sweden
- Christoph Steinbach
- Faculty of Fisheries and Protection of Waters, CENAKVA, University of South Bohemia in České Budějovice, Vodňany, 370 05 České Budějovice, Czech Republic
- Fredrik Levander
- Department of Immunotechnology, Lund University, Lund 223 87, Sweden; National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Lund University, Lund 223 87, Sweden

23
A comparative study of evaluating missing value imputation methods in label-free proteomics. Sci Rep 2021; 11:1760. [PMID: 33469060 PMCID: PMC7815892 DOI: 10.1038/s41598-021-81279-4] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 09/03/2020] [Accepted: 12/31/2020] [Indexed: 12/29/2022] Open
Abstract
The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data. Imputation has been widely utilized to handle MVs, and selection of the proper method is critical for the accuracy and reliability of imputation. Here we present a comparative study that evaluates the performance of seven popular imputation methods with a large-scale benchmark dataset and an immune cell dataset. Simulated MVs were incorporated into the complete part of each dataset with different combinations of MV rates and missing not at random (MNAR) rates. Normalized root mean square error (NRMSE) was applied to evaluate the accuracy of protein abundances and intergroup protein ratios after imputation. Detection of true positives (TPs) and false altered-protein discovery rate (FADR) between groups were also compared using the benchmark dataset. Furthermore, the accuracy of handling real MVs was assessed by comparing enriched pathways and signature genes of cell activation after imputing the immune cell dataset. We observed that the accuracy of imputation is primarily affected by the MNAR rate rather than the MV rate, and downstream analysis can be largely impacted by the selection of imputation methods. A random forest-based imputation method consistently outperformed other popular methods by achieving the lowest NRMSE, high amount of TPs with the average FADR < 5%, and the best detection of relevant pathways and signature genes, highlighting it as the most suitable method for label-free proteomics.
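The NRMSE criterion mentioned in this abstract can be sketched as follows. This is a minimal illustration under one common convention (root mean square error of the imputed entries divided by the standard deviation of the true values); the abstract does not spell out the exact normalization used, and the function name `nrmse` and the sample values are hypothetical.

```python
import math
from statistics import pvariance

def nrmse(true_values, imputed_values):
    """RMSE of imputed entries, normalized by the standard deviation of the
    true values (one common convention; assumed here for illustration)."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    mse = sum(squared_errors) / len(true_values)
    # pvariance is the population variance; it is zero for constant input,
    # in which case this normalization is undefined.
    return math.sqrt(mse / pvariance(true_values))

# Hypothetical log2 intensities that were masked and then imputed.
truth = [20.0, 22.0, 24.0, 26.0]
imputed = [21.0, 22.0, 23.0, 26.0]
print(nrmse(truth, imputed))
```

A value of 0 means the imputed entries match the masked truth exactly, and lower values indicate more accurate imputation under this convention.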
|
24
|
Bramer LM, Irvahn J, Piehowski PD, Rodland KD, Webb-Robertson BJM. A Review of Imputation Strategies for Isobaric Labeling-Based Shotgun Proteomics. J Proteome Res 2020; 20:1-13. [PMID: 32929967 DOI: 10.1021/acs.jproteome.0c00123] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/20/2022]
Abstract
The throughput efficiency and increased depth of coverage provided by isobaric-labeled proteomics measurements have led to increased usage of these techniques. However, the structure of missing data is different than unlabeled studies, which prompts the need for this review to compare the efficacy of nine imputation methods on large isobaric-labeled proteomics data sets to guide researchers on the appropriateness of various imputation methods. Imputation methods were evaluated by accuracy, statistical hypothesis test inference, and run time. In general, expectation maximization and random forest imputation methods yielded the best performance, and constant-based methods consistently performed poorly across all data set sizes and percentages of missing values. For data sets with small sample sizes and higher percentages of missing data, results indicate that statistical inference with no imputation may be preferable. On the basis of the findings in this review, there are core imputation methods that perform better for isobaric-labeled proteomics data, but great care and consideration as to whether imputation is the optimal strategy should be given for data sets comprised of a small number of samples.
Affiliation(s)
- Lisa M Bramer
- Computing & Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Jan Irvahn
- Boeing, Seattle, Washington 98055, United States
- Paul D Piehowski
- Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, Washington 99354, United States
- Karin D Rodland
- Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, Washington 99354, United States
- Bobbie-Jo M Webb-Robertson
- Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, Washington 99354, United States
|
25
|
Li Q, Fisher K, Meng W, Fang B, Welsh E, Haura EB, Koomen JM, Eschrich SA, Fridley BL, Chen YA. GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis. Bioinformatics 2020; 36:257-263. [PMID: 31199438 PMCID: PMC6956786 DOI: 10.1093/bioinformatics/btz488] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/14/2018] [Revised: 05/06/2019] [Accepted: 06/10/2019] [Indexed: 12/16/2022] Open
Abstract
Motivation: Missingness in label-free mass spectrometry is inherent to the technology. A computational approach to recover missing values in metabolomics and proteomics datasets is important. Most existing methods are designed under a particular assumption, either missing at random or under the detection limit. If the missing pattern deviates from the assumption, it may lead to biased results. Hence, we investigate the missing patterns in label-free mass spectrometry data and develop an omnibus approach, GMSimpute, to allow effective imputation accommodating different missing patterns. Results: Three proteomics datasets and one metabolomics dataset indicate missing values could be a mixture of abundance-dependent and abundance-independent missingness. We assess the performance of GMSimpute using simulated data (with a wide range of 80 missing patterns) and metabolomics data from the Cancer Genome Atlas breast cancer and clear cell renal cell carcinoma studies. Using Pearson correlation and normalized root mean square errors between the true and imputed abundance, we compare its performance to K-nearest-neighbor-type approaches, Random Forest, GSimp, a model-based method implemented in DanteR and minimum values. The results indicate GMSimpute provides higher accuracy in imputation and exhibits stable performance across different missing patterns. In addition, GMSimpute is able to identify the features in downstream differential expression analysis with high accuracy when applied to the Cancer Genome Atlas datasets. Availability and implementation: GMSimpute is on CRAN: https://cran.r-project.org/web/packages/GMSimpute/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qian Li
- Health Informatics Institute, University of South Florida, Tampa, FL, USA
- Kate Fisher
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA; Department of Biostatistics, IDDI Inc., Raleigh, NC, USA
- Wenjun Meng
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
- Bin Fang
- Proteomics and Metabolomics Core Facility, Moffitt Cancer Center, Tampa, FL, USA
- Eric Welsh
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
- Eric B Haura
- Department of Thoracic Oncology, Moffitt Cancer Center, Tampa, FL, USA
- John M Koomen
- Department of Molecular Oncology, Moffitt Cancer Center, Tampa, FL, USA
- Steven A Eschrich
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
- Brooke L Fridley
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
- Y Ann Chen
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
|
26
|
Zhang Y, Ouyang Z, Qian WJ, Smith RD, Wong WH, Davis RW. Meta-analysis of peptides to detect protein significance. Statistics and Its Interface 2020; 13:465-474. [PMID: 34055134 PMCID: PMC8162183 DOI: 10.4310/sii.2020.v13.n4.a4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Shotgun assays are widely used in biotechnologies to characterize large molecules, which are hard to measure directly as a whole. For instance, in Liquid Chromatography - Mass Spectrometry (LC-MS) shotgun experiments, proteins in biological samples are digested into peptides, and then the peptides are separated and measured. However, in proteomics studies, investigators are usually interested in the performance of the whole proteins rather than of those peptide fragments. In light of meta-analysis, we propose an adaptive thresholding method to select informative peptides, and combine peptide-level models into a protein-level analysis. The meta-analysis procedure and modeling rationale can be adapted to data analysis of other types of shotgun assays.
Affiliation(s)
- Zhengqing Ouyang
- Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Wei-Jun Qian
- Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, USA
- Richard D. Smith
- Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, USA
- Wing Hung Wong
- Department of Statistics, Stanford University, Stanford, California 94305, USA
- Ronald W. Davis
- Stanford Genome Technology Center, Stanford University, Palo Alto, California 94306, USA
|
27
|
Abstract
We consider data-analysis settings where data are missing not at random. In these cases, the two basic modeling approaches are 1) pattern-mixture models, with separate distributions for missing data and observed data, and 2) selection models, with a distribution for the data preobservation and a missing-data mechanism that selects which data are observed. These two modeling approaches lead to distinct factorizations of the joint distribution of the observed-data and missing-data indicators. In this paper, we explore a third approach, apparently originally proposed by J. W. Tukey as a remark in a discussion between Rubin and Hartigan, and reported by Holland in a two-page note, which has been so far neglected. Data analyses typically rely upon assumptions about the missingness mechanisms that lead to observed versus missing data, assumptions that are typically unassessable. We explore an approach where the joint distribution of observed data and missing data are specified in a nonstandard way. In this formulation, which traces back to a representation of the joint distribution of the data and missingness mechanism, apparently first proposed by J. W. Tukey, the modeling assumptions about the distributions are either assessable or are designed to allow relatively easy incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both observed and missing. We develop Tukey’s representation for exponential-family models, propose a computationally tractable approach to inference in this class of models, and offer some general theoretical comments. We then illustrate the utility of this approach with an example in systems biology.
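For orientation, the two standard factorizations named in this abstract can be written out explicitly. The notation is ours and is not taken from the paper: y denotes the complete data and r the missingness indicator, with r = 1 for observed and r = 0 for missing. The third line is a Bayes-rule identity connecting the observed-data and missing-data distributions; we understand Tukey's representation to build on a relation of this kind, but that reading is an assumption rather than a quotation.

```latex
\begin{align}
\text{Selection model:} \quad
  & p(y, r) = p(y)\, p(r \mid y) \\
\text{Pattern-mixture model:} \quad
  & p(y, r) = p(r)\, p(y \mid r) \\
\text{Bayes-rule identity:} \quad
  & p(y \mid r = 0) = p(y \mid r = 1)\,
    \frac{p(r = 0 \mid y)}{p(r = 1 \mid y)}\,
    \frac{p(r = 1)}{p(r = 0)}
\end{align}
```

The identity follows from applying Bayes' rule to both p(y | r = 0) and p(y | r = 1) and eliminating p(y). It expresses the unobservable distribution of the missing data through the observed-data distribution and the missingness mechanism.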
|
28
|
The M, Käll L. Focus on the spectra that matter by clustering of quantification data in shotgun proteomics. Nat Commun 2020; 11:3234. [PMID: 32591519 PMCID: PMC7319958 DOI: 10.1038/s41467-020-17037-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/30/2020] [Accepted: 06/08/2020] [Indexed: 02/02/2023] Open
Abstract
In shotgun proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow, thereby preventing valuable information from being discarded in the identification stage. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. This reduces search time due to the data reduction. We can now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Quandenser+Triqler outperforms the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins for all tested datasets. Software is available for all major operating systems at https://github.com/statisticalbiotechnology/quandenser, under Apache 2.0 license.
Affiliation(s)
- Matthew The
- Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, 17121, Solna, Sweden
- Lukas Käll
- Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, 17121, Solna, Sweden.
|
29
|
Liu M, Dongre A. Proper imputation of missing values in proteomics datasets for differential expression analysis. Brief Bioinform 2020; 22:5855395. [PMID: 32520347 DOI: 10.1093/bib/bbaa112] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 02/25/2020] [Revised: 04/16/2020] [Accepted: 05/11/2020] [Indexed: 01/01/2023] Open
Abstract
Label-free shotgun proteomics is an important tool in biomedical research, where tandem mass spectrometry with data-dependent acquisition (DDA) is frequently used for protein identification and quantification. However, the DDA datasets contain a significant number of missing values (MVs) that severely hinders proper analysis. Existing literature suggests that different imputation methods should be used for the two types of MVs: missing completely at random or missing not at random. However, the simulated or biased datasets utilized by most of such studies offer few clues about the composition and thus proper imputation of MVs in real-life proteomic datasets. Moreover, the impact of imputation methods on downstream differential expression analysis-a critical goal for many biomedical projects-is largely undetermined. In this study, we investigated public DDA datasets of various tissue/sample types to determine the composition of MVs in them. We then developed simulated datasets that imitate the MV profile of real-life datasets. Using such datasets, we compared the impact of various popular imputation methods on the analysis of differentially expressed proteins. Finally, we make recommendations on which imputation method(s) to use for proteomic data beyond just DDA datasets.
|
30
|
Mallikarjun V, Richardson SM, Swift J. BayesENproteomics: Bayesian Elastic Nets for Quantification of Peptidoforms in Complex Samples. J Proteome Res 2020; 19:2167-2184. [PMID: 32319298 DOI: 10.1021/acs.jproteome.9b00468] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/10/2023]
Abstract
Multivariate regression modelling provides a statistically powerful means of quantifying the effects of a given treatment while compensating for sources of variation and noise, such as variability between human donors and the behavior of different peptides during mass spectrometry. However, methods to quantify endogenous post-translational modifications (PTMs) are typically reliant on summary statistical methods that fail to consider sources of variability such as changes in the levels of the parent protein. Here, we compare three multivariate regression methods, including a novel Bayesian elastic net algorithm (BayesENproteomics) that enables assessment of relative protein abundances while also quantifying identified PTMs for each protein. We tested the ability of these methods to accurately quantify expression of proteins in a mixed-species benchmark experiment and to quantify synthetic PTMs induced by stable isotope labelling. Finally, we extended our regression pipeline to calculate fold changes at the pathway level, providing a complement to commonly used enrichment analysis. Our results show that BayesENproteomics can quantify changes to protein levels across a broad dynamic range while also accurately quantifying PTM and pathway-level fold changes.
Affiliation(s)
- Venkatesh Mallikarjun
- Wellcome Centre for Cell-Matrix Research, University of Manchester, Oxford Road, Manchester M13 9PT, U.K.; Division of Cell Matrix Biology and Regenerative Medicine, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
- Stephen M Richardson
- Division of Cell Matrix Biology and Regenerative Medicine, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
- Joe Swift
- Wellcome Centre for Cell-Matrix Research, University of Manchester, Oxford Road, Manchester M13 9PT, U.K.; Division of Cell Matrix Biology and Regenerative Medicine, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
|
31
|
Lindberg T, de Ávila RI, Zeller KS, Levander F, Eriksson D, Chawade A, Lindstedt M. An integrated transcriptomic- and proteomic-based approach to evaluate the human skin sensitization potential of glyphosate and its commercial agrochemical formulations. J Proteomics 2020; 217:103647. [PMID: 32006680 DOI: 10.1016/j.jprot.2020.103647] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/13/2019] [Revised: 12/11/2019] [Accepted: 01/08/2020] [Indexed: 02/07/2023]
Abstract
We investigated the skin sensitization hazard of glyphosate, the surfactant polyethylated tallow amine (POEA) and two commercial glyphosate-containing formulations using different omics-technologies based on a human dendritic cell (DC)-like cell line. First, the GARD™skin assay, investigating changes in the expression of 200 transcripts upon cell exposure to xenobiotics, was used for skin sensitization prediction. POEA and the formulations were classified as skin sensitizers while glyphosate alone was classified as a non-sensitizer. Interestingly, the mixture of POEA together with glyphosate displayed a similar sensitizing prediction as POEA alone, indicating that glyphosate likely does not increase the sensitizing capacity when associated with POEA. Moreover, mass spectrometry analysis identified differentially regulated protein groups and predicted molecular pathways based on a proteomic approach in response to cell exposures with glyphosate, POEA and the glyphosate-containing formulations. Based on the protein expression data, predicted pathways were linked to immunologically relevant events and regulated proteins further to cholesterol biosynthesis and homeostasis as well as to autophagy, identifying novel aspects of DC responses after exposure to xenobiotics. In summary, we here present an integrative analysis involving advanced technologies to elucidate the molecular mechanisms behind DC activation in the skin sensitization process triggered by the investigated agrochemical materials. SIGNIFICANCE: The use of glyphosate has increased worldwide, and much effort has been made to improve risk assessments and to further elucidate the mechanisms behind any potential human health hazard of this chemical and its agrochemical formulations. In this context, omics-based techniques can provide a multiparametric approach, including several biomarkers, to expand the mechanistic knowledge of xenobiotics-induced toxicity. 
Based on this, we performed the integration of GARD™skin and proteomic data to elucidate the skin sensitization hazard of POEA, glyphosate and its two commercial mixtures, and to investigate cellular responses more in detail on protein level. The proteomic data indicate the regulation of immune response-related pathways and proteins associated with cholesterol biosynthesis and homeostasis as well as to autophagy, identifying novel aspects of DC responses after exposure to xenobiotics. Therefore, our data show the applicability of a multiparametric integrated approach for the mechanism-based hazard evaluation of xenobiotics, eventually complementing decision making in the holistic risk assessment of chemicals regarding their allergenic potential in humans.
Affiliation(s)
- Tim Lindberg
- Department of Immunotechnology, Lund University, Medicon Village, Lund, Sweden
- Renato Ivan de Ávila
- Department of Immunotechnology, Lund University, Medicon Village, Lund, Sweden; Laboratory of Education and Research in In Vitro Toxicology (Tox In), Faculty of Pharmacy, Universidade Federal de Goiás, Goiânia, GO, Brazil; SenzaGen AB, Medicon Village, Lund, Sweden
- Kathrin S Zeller
- Department of Immunotechnology, Lund University, Medicon Village, Lund, Sweden
- Fredrik Levander
- Department of Immunotechnology, Lund University, Medicon Village, Lund, Sweden
- Aakash Chawade
- Swedish University of Agricultural Sciences, Alnarp, Sweden
- Malin Lindstedt
- Department of Immunotechnology, Lund University, Medicon Village, Lund, Sweden.
|
32
|
Klein JA, Zaia J. A Perspective on the Confident Comparison of Glycoprotein Site-Specific Glycosylation in Sample Cohorts. Biochemistry 2019; 59:3089-3097. [PMID: 31833756 DOI: 10.1021/acs.biochem.9b00730] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
Protein glycosylation, resulting from glycosyl transferase reactions under complex control in the secretory pathway, consists of a distribution of related glycoforms at each glycosylation site. Because the biosynthetic substrate concentration and transport rates depend on architecture and other aspects of cellular phenotypes, site-specific glycosylation cannot be predicted accurately from genomic, transcriptomic, or proteomic information. Rather, it is necessary to quantify glycosylation at each protein site and how this changes among a sample cohort to provide information about disease mechanisms. At present, mature mass spectrometry-based methods allow for qualitative assignment of the glycan composition and glycosylation site of singly glycosylated proteolytic peptides. To make such quantitative comparisons, it is necessary to sample the glycosylation distribution with sufficient coverage and accuracy for confident assessment of the glycosylation changes that occur in the biological cohort. In this Perspective, we discuss the unmet needs for mass spectrometry acquisition methods and bioinformatics for the confident comparison of protein site-specific glycosylation among sample cohorts.
|
33
|
Shah J, Brock GN, Gaskins J. BayesMetab: treatment of missing values in metabolomic studies using a Bayesian modeling approach. BMC Bioinformatics 2019; 20:673. [PMID: 31861984 PMCID: PMC6923847 DOI: 10.1186/s12859-019-3250-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/11/2022] Open
Abstract
Background: With the rise of metabolomics, the development of methods to address analytical challenges in the analysis of metabolomics data is of great importance. Missing values (MVs) are pervasive, yet the treatment of MVs can have a substantial impact on downstream statistical analyses. The MVs problem in metabolomics is quite challenging and can arise because the metabolite is not biologically present in the sample, or is present in the sample but at a concentration below the lower limit of detection (LOD), or is present in the sample but undetected due to technical issues related to sample pre-processing steps. The former is considered missing not at random (MNAR) while the latter is an example of missing at random (MAR). Typically, such MVs are substituted by a minimum value, which may lead to severely biased results in downstream analyses. Results: We develop a Bayesian model, called BayesMetab, that systematically accounts for missing values based on a Markov chain Monte Carlo (MCMC) algorithm that incorporates data augmentation by allowing MVs to be due to either truncation below the LOD or other technical reasons unrelated to its abundance. Based on a variety of performance metrics (power for detecting differential abundance, area under the curve, bias and MSE for parameter estimates), our simulation results indicate that BayesMetab outperformed other imputation algorithms when there is a mixture of missingness due to MAR and MNAR. Further, our approach was competitive with other methods tailored specifically to MNAR in situations where missing data were completely MNAR. Applying our approach to an analysis of metabolomics data from a mouse myocardial infarction revealed several statistically significant metabolites not previously identified that were of direct biological relevance to the study.
Conclusions: Our findings demonstrate that BayesMetab has improved performance in imputing the missing values and performing statistical inference compared to other current methods when missing values are due to a mixture of MNAR and MAR. Analysis of real metabolomics data strongly suggests this mixture is likely to occur in practice, and thus, it is important to consider an imputation model that accounts for a mixture of missing data types.
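The mixture of missingness mechanisms described in this abstract can be made concrete with a short simulation. This is only a sketch under stated assumptions and not the BayesMetab model: abundances are drawn from a normal distribution, values below a chosen LOD are truncated (the abundance-dependent part), a further random fraction is dropped regardless of abundance, and the gaps are then filled with the minimum observed value as in the common practice the authors caution against. The function names and all parameter values are hypothetical.

```python
import numpy as np


def simulate_mixed_missingness(values, lod, random_rate, rng):
    """Return (observed, below_lod, random_drop) for a mixed missingness pattern.

    below_lod marks values truncated by the detection limit (abundance-dependent);
    random_drop marks values lost independently of abundance.
    """
    below_lod = values < lod
    random_drop = rng.random(values.size) < random_rate
    observed = np.where(below_lod | random_drop, np.nan, values)
    return observed, below_lod, random_drop


def fill_with_minimum(observed):
    """Replace every missing entry with the smallest observed value."""
    return np.where(np.isnan(observed), np.nanmin(observed), observed)


rng = np.random.default_rng(42)
true_abundance = rng.normal(loc=10.0, scale=1.0, size=1000)
observed, below_lod, random_drop = simulate_mixed_missingness(
    true_abundance, lod=9.0, random_rate=0.10, rng=rng
)
filled = fill_with_minimum(observed)

print("missing below LOD:     ", int(below_lod.sum()))
print("missing at random only:", int((random_drop & ~below_lod).sum()))
print("true mean:             ", round(float(true_abundance.mean()), 3))
print("mean after min-fill:   ", round(float(filled.mean()), 3))
print("true std:              ", round(float(true_abundance.std()), 3))
print("std after min-fill:    ", round(float(filled.std()), 3))
```

Minimum-value substitution treats both kinds of gap identically, so randomly dropped values that were well above the LOD are pulled down to the floor; comparing the printed summaries with the true ones gives a feel for the resulting distortion.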
Affiliation(s)
- Jasmit Shah
- Department of Population Health, The Aga Khan University, Nairobi, Kenya
- Guy N Brock
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.
- Jeremy Gaskins
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 40202, USA.
|
34
|
Kawahara R, Recuero S, Nogueira FCS, Domont GB, Leite KRM, Srougi M, Thaysen-Andersen M, Palmisano G. Tissue Proteome Signatures Associated with Five Grades of Prostate Cancer and Benign Prostatic Hyperplasia. Proteomics 2019; 19:e1900174. [PMID: 31576646 DOI: 10.1002/pmic.201900174] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/14/2019] [Revised: 08/28/2019] [Indexed: 12/22/2022]
Abstract
The histology-based Gleason score (GS) of prostate cancer (PCa) tissue biopsy is the most accurate predictor of disease aggressiveness and an important measure to guide treatment strategies and patient management. The variability associated with PCa tumor sampling and the subjective determination of the GS are challenges that limit accurate diagnostication and prognostication. Thus, novel molecular signatures are needed to distinguish between indolent and aggressive forms of PCa for better patient management and outcomes. Herein, label-free LC-MS/MS proteomics is used to profile the proteome of 50 PCa tissues spanning five grade groups (n = 10 per group) relative to tissues from individuals with benign prostatic hyperplasia (BPH). Over 2000 proteins are identified, albeit at different levels between and within the patient groups, revealing biological processes associated with specific grades. A panel of 11 prostate-derived proteins including IGKV3D-20, RNASET2, TACC2, ANXA7, LMOD1, PRCP, GYG1, NDUFV1, H1FX, APOBEC3C, and CTSZ displays the potential to stratify patients from low and high PCa grade groups. Parallel reaction monitoring of the same sample cohort validates the differential expression of LMOD1, GYG1, IGKV3D-20, and RNASET2. The four proteins associated with low and high PCa grades reported here warrant further exploration as candidate biomarkers for PCa aggressiveness.
Affiliation(s)
- Rebeca Kawahara
- Instituto de Ciências Biomédicas, Departamento de Parasitologia, Universidade de São Paulo, USP, São Paulo, CEP: 05508-000, Brazil; Department of Molecular Sciences, Macquarie University, Sydney, NSW, 2109, Australia
- Saulo Recuero
- Laboratório de Investigação Médica da Disciplina de Urologia da Faculdade de Medicina da USP, LIM55, São Paulo, CEP: 01246-903, Brazil
- Fabio C S Nogueira
- Instituto de Química, Departamento de Bioquímica, Universidade Federal do Rio de Janeiro, Rio de Janeiro, CEP: 21941-909, Brazil
- Gilberto B Domont
- Instituto de Química, Departamento de Bioquímica, Universidade Federal do Rio de Janeiro, Rio de Janeiro, CEP: 21941-909, Brazil
- Katia R M Leite
- Laboratório de Investigação Médica da Disciplina de Urologia da Faculdade de Medicina da USP, LIM55, São Paulo, CEP: 01246-903, Brazil
- Miguel Srougi
- Laboratório de Investigação Médica da Disciplina de Urologia da Faculdade de Medicina da USP, LIM55, São Paulo, CEP: 01246-903, Brazil
- Giuseppe Palmisano
- Instituto de Ciências Biomédicas, Departamento de Parasitologia, Universidade de São Paulo, USP, São Paulo, CEP: 05508-000, Brazil
|
35
|
Misiewicz Z, Iurato S, Kulesskaya N, Salminen L, Rodrigues L, Maccarrone G, Martins J, Czamara D, Laine MA, Sokolowska E, Trontti K, Rewerts C, Novak B, Volk N, Park DI, Jokitalo E, Paulin L, Auvinen P, Voikar V, Chen A, Erhardt A, Turck CW, Hovatta I. Multi-omics analysis identifies mitochondrial pathways associated with anxiety-related behavior. PLoS Genet 2019; 15:e1008358. [PMID: 31557158 PMCID: PMC6762065 DOI: 10.1371/journal.pgen.1008358] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/06/2019] [Accepted: 08/08/2019] [Indexed: 01/10/2023] Open
Abstract
Stressful life events are major environmental risk factors for anxiety disorders, although not all individuals exposed to stress develop clinical anxiety. The molecular mechanisms underlying the influence of environmental effects on anxiety are largely unknown. To identify biological pathways mediating stress-related anxiety and resilience to it, we used the chronic social defeat stress (CSDS) paradigm in male mice of two inbred strains, C57BL/6NCrl (B6) and DBA/2NCrl (D2), that differ in their susceptibility to stress. Using a multi-omics approach, we identified differential mRNA, miRNA and protein expression changes in the bed nucleus of the stria terminalis (BNST) and blood cells after chronic stress. Integrative gene set enrichment analysis revealed enrichment of mitochondrial-related genes in the BNST and blood of stressed mice. To translate these results to human anxiety, we investigated blood gene expression changes associated with exposure-induced panic attacks. Remarkably, we found reduced expression of mitochondrial-related genes in D2 stress-susceptible mice and in exposure-induced panic attacks in humans, but increased expression of these genes in B6 stress-susceptible mice. Moreover, stress-susceptible vs. stress-resilient B6 mice displayed more mitochondrial cross-sections in the post-synaptic compartment after CSDS. Our findings demonstrate mitochondrial-related alterations in gene expression as an evolutionarily conserved response in stress-related behaviors and validate the use of cross-species approaches in investigating the biological mechanisms underlying anxiety disorders.
Affiliation(s)
- Zuzanna Misiewicz
- Molecular and Integrative Biosciences Research Program, University of Helsinki, Helsinki, Finland
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Department of Psychology and Logopedics, Medicum, University of Helsinki, Helsinki, Finland
- Stella Iurato
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Natalia Kulesskaya
- Molecular and Integrative Biosciences Research Program, University of Helsinki, Helsinki, Finland
- Department of Psychology and Logopedics, Medicum, University of Helsinki, Helsinki, Finland
- Laura Salminen
- Molecular and Integrative Biosciences Research Program, University of Helsinki, Helsinki, Finland
- Luis Rodrigues
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Giuseppina Maccarrone
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Jade Martins
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Darina Czamara
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Mikaela A. Laine
- Molecular and Integrative Biosciences Research Program, University of Helsinki, Helsinki, Finland
- Department of Psychology and Logopedics, Medicum, University of Helsinki, Helsinki, Finland
- Ewa Sokolowska
- Molecular and Integrative Biosciences Research Program, University of Helsinki, Helsinki, Finland
- Kalevi Trontti
- Molecular and Integrative Biosciences Research Program, University of Helsinki, Helsinki, Finland
- Department of Psychology and Logopedics, Medicum, University of Helsinki, Helsinki, Finland
- Christiane Rewerts
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Bozidar Novak
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Naama Volk
- Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel
- Dong Ik Park
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Eija Jokitalo
- Electron Microscopy Unit, Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Lars Paulin
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Petri Auvinen
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Vootele Voikar
- Neuroscience Center, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Alon Chen
- Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel
- Department of Stress Neurobiology and Neurogenetics, Max Planck Institute of Psychiatry, Munich, Germany
- Angelika Erhardt
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- * E-mail: (AE); (CWT); (IH)
- Christoph W. Turck
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- * E-mail: (AE); (CWT); (IH)
- Iiris Hovatta
- Molecular and Integrative Biosciences Research Program, University of Helsinki, Helsinki, Finland
- Department of Psychology and Logopedics, Medicum, University of Helsinki, Helsinki, Finland
- * E-mail: (AE); (CWT); (IH)
|
36
|
Ammar C, Gruber M, Csaba G, Zimmer R. MS-EmpiRe Utilizes Peptide-level Noise Distributions for Ultra-sensitive Detection of Differentially Expressed Proteins. Mol Cell Proteomics 2019; 18:1880-1892. [PMID: 31235637 PMCID: PMC6731086 DOI: 10.1074/mcp.ra119.001509] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/25/2019] [Revised: 06/12/2019] [Indexed: 11/06/2022]
Abstract
Mass spectrometry based proteomics is the method of choice for quantifying genome-wide differential changes of protein expression in a wide range of biological and biomedical applications. Protein expression changes need to be reliably derived from many measured peptide intensities and their corresponding peptide fold changes. These peptide fold changes vary considerably for a given protein. Numerous instrumental setups aim to reduce this variability, whereas current computational methods only implicitly account for this problem. We introduce a new method, MS-EmpiRe, which explicitly accounts for the noise underlying peptide fold changes. We derive data set-specific, intensity-dependent empirical error fold change distributions, which are used for individual weighting of peptide fold changes to detect differentially expressed proteins (DEPs). In a recently published proteome-wide benchmarking data set, MS-EmpiRe doubles the number of correctly identified DEPs at an estimated FDR cutoff compared with state-of-the-art tools. We additionally confirm the superior performance of MS-EmpiRe on simulated data. MS-EmpiRe requires only peptide intensities mapped to proteins and, thus, can be applied to any common quantitative proteomics setup. We apply our method to diverse MS data sets and observe consistent increases in sensitivity with more than 1000 additional significant proteins in deep data sets, including a clinical study over multiple patients. At the same time, we observe that even the proteins classified as most insignificant by other methods but significant by MS-EmpiRe show very clear regulation on the peptide intensity level. MS-EmpiRe provides rapid processing (< 2 min for 6 LC-MS/MS runs (3 h gradients)) and is publicly available under github.com/zimmerlab/MS-EmpiRe with a manual including examples.
Affiliation(s)
- Constantin Ammar
  - ‡Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany; §Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81377 Munich, Germany
- Markus Gruber
  - ‡Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany
- Gergely Csaba
  - ‡Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany
- Ralf Zimmer
  - ‡Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany; §Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81377 Munich, Germany
37
The M, Käll L. Integrated Identification and Quantification Error Probabilities for Shotgun Proteomics. Mol Cell Proteomics 2019; 18:561-570. [PMID: 30482846 PMCID: PMC6398204 DOI: 10.1074/mcp.ra118.001018] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/06/2018] [Revised: 11/05/2018] [Indexed: 02/02/2023]
Abstract
Protein quantification by label-free shotgun proteomics experiments is plagued by a multitude of error sources. Typical pipelines for identifying differential proteins use intermediate filters to control the error rate. However, they often ignore certain error sources and, moreover, regard filtered lists as completely correct in subsequent steps. These two indiscretions can easily lead to a loss of control of the false discovery rate (FDR). We propose a probabilistic graphical model, Triqler, that propagates error information through all steps, employing distributions in favor of point estimates, most notably for missing value imputation. The model outputs posterior probabilities for fold changes between treatment groups, highlighting uncertainty rather than hiding it. We analyzed 3 engineered data sets and achieved FDR control and high sensitivity, even for truly absent proteins. In a bladder cancer clinical data set we discovered 35 proteins at 5% FDR, whereas the original study discovered 1 and MaxQuant/Perseus 4 proteins at this threshold. Compellingly, these 35 proteins showed enrichment for functional annotation terms, whereas the top ranked proteins reported by MaxQuant/Perseus showed no enrichment. The model executes in minutes and is freely available at https://pypi.org/project/triqler/.
Affiliation(s)
- Matthew The
  - From the ‡Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
- Lukas Käll
  - From the ‡Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Box 1031, 17121 Solna, Sweden
38
Lewis LSC, Muldoon PP, Pilaka PP, Ottens AK. Frontal Cortex Proteome Perturbation after Juvenile Rat Secondhand Smoke Exposure. Proteomics 2018; 18:e1800268. [PMID: 30474317 PMCID: PMC6484431 DOI: 10.1002/pmic.201800268] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/25/2018] [Revised: 11/12/2018] [Indexed: 11/09/2022]
Abstract
Secondhand smoke remains a global concern for children's health. Epidemiological studies implicate exposure to secondhand smoke as a major risk factor for behavioral disorders, yet biological causation remains unclear. Model studies have mainly focused on secondhand smoke impacts to prenatal neurodevelopment, yet juvenile exposure represents a separate risk. Using ion mobility-enhanced data-independent mass spectrometry, the effect of juvenile secondhand smoke exposure on the prefrontal cortex, a principal part of the brain involved in behavioral control, is characterized. The produced dataset includes 800 significantly responsive proteins within the juvenile orbital frontal cortex, with 716 showing an increase in abundance. The neuroproteomic response reflects a prominent perturbation within the glutamatergic synaptic system, suggesting aberrant, disorganized excitation as observed underlying psychiatric disorders. Also disclosed are impacts to GABAergic and dopaminergic systems. Overall, the dataset provides a wealth of detail, facilitating further targeted research into the causal mechanisms underlying behavioral disorders associated with juvenile exposure to secondhand smoke and other environmental pollutants. All MS data have been deposited to the ProteomeXchange consortium with identifier PXD011744.
Affiliation(s)
- Liam S C Lewis
  - Department of Anatomy and Neurobiology, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Pretal P Muldoon
  - Department of Anatomy and Neurobiology, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Pallavi P Pilaka
  - Department of Anatomy and Neurobiology, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Andrew K Ottens
  - Department of Anatomy and Neurobiology, Virginia Commonwealth University, Richmond, VA, 23298, USA
39
O'Brien JJ, Gunawardena HP, Paulo JA, Chen X, Ibrahim JG, Gygi SP, Qaqish BF. The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments. Ann Appl Stat 2018; 12:2075-2095. [PMID: 30473739 PMCID: PMC6249692 DOI: 10.1214/18-aoas1144] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/31/2022]
Abstract
An idealized version of a label-free discovery mass spectrometry proteomics experiment would provide absolute abundance measurements for a whole proteome, across varying conditions. Unfortunately, this ideal is not realized. Measurements are made on peptides requiring an inferential step to obtain protein level estimates. The inference is complicated by experimental factors that necessitate relative abundance estimation and result in widespread non-ignorable missing data. Relative abundance on the log scale takes the form of parameter contrasts. In a complete-case analysis, contrast estimates may be biased by missing data and a substantial amount of useful information will often go unused. To avoid problems with missing data, many analysts have turned to single imputation solutions. Unfortunately, these methods often create further difficulties by hiding inestimable contrasts, preventing the recovery of interblock information and failing to account for imputation uncertainty. To mitigate many of the problems caused by missing values, we propose the use of a Bayesian selection model. Our model is tested on simulated data, real data with simulated missing values, and on a ground truth dilution experiment where all of the true relative changes are known. The analysis suggests that our model, compared with various imputation strategies and complete-case analyses, can increase accuracy and provide substantial improvements to interval coverage.
Affiliation(s)
- Jonathon J O'Brien
  - Department of Cell Biology, Harvard Medical School, 240 Longwood Ave, Boston, MA, 02115, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, 3101 McGavran-Greenberg Hall, CB 7420, Chapel Hill, NC 27599, USA; Department of Biochemistry and Biophysics University of North Carolina at Chapel Hill 120 Mason Farm Rd, Campus Box 7260 Chapel Hill, NC 27599 USA
- Harsha P Gunawardena
  - Department of Cell Biology, Harvard Medical School, 240 Longwood Ave, Boston, MA, 02115, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, 3101 McGavran-Greenberg Hall, CB 7420, Chapel Hill, NC 27599, USA; Department of Biochemistry and Biophysics University of North Carolina at Chapel Hill 120 Mason Farm Rd, Campus Box 7260 Chapel Hill, NC 27599 USA
- Joao A Paulo
  - Department of Cell Biology, Harvard Medical School, 240 Longwood Ave, Boston, MA, 02115, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, 3101 McGavran-Greenberg Hall, CB 7420, Chapel Hill, NC 27599, USA; Department of Biochemistry and Biophysics University of North Carolina at Chapel Hill 120 Mason Farm Rd, Campus Box 7260 Chapel Hill, NC 27599 USA
- Xian Chen
  - Department of Cell Biology, Harvard Medical School, 240 Longwood Ave, Boston, MA, 02115, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, 3101 McGavran-Greenberg Hall, CB 7420, Chapel Hill, NC 27599, USA; Department of Biochemistry and Biophysics University of North Carolina at Chapel Hill 120 Mason Farm Rd, Campus Box 7260 Chapel Hill, NC 27599 USA
- Joseph G Ibrahim
  - Department of Cell Biology, Harvard Medical School, 240 Longwood Ave, Boston, MA, 02115, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, 3101 McGavran-Greenberg Hall, CB 7420, Chapel Hill, NC 27599, USA; Department of Biochemistry and Biophysics University of North Carolina at Chapel Hill 120 Mason Farm Rd, Campus Box 7260 Chapel Hill, NC 27599 USA
- Steven P Gygi
  - Department of Cell Biology, Harvard Medical School, 240 Longwood Ave, Boston, MA, 02115, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, 3101 McGavran-Greenberg Hall, CB 7420, Chapel Hill, NC 27599, USA; Department of Biochemistry and Biophysics University of North Carolina at Chapel Hill 120 Mason Farm Rd, Campus Box 7260 Chapel Hill, NC 27599 USA
- Bahjat F Qaqish
  - Department of Cell Biology, Harvard Medical School, 240 Longwood Ave, Boston, MA, 02115, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, 3101 McGavran-Greenberg Hall, CB 7420, Chapel Hill, NC 27599, USA; Department of Biochemistry and Biophysics University of North Carolina at Chapel Hill 120 Mason Farm Rd, Campus Box 7260 Chapel Hill, NC 27599 USA
40
Taylor SL, Ruhaak LR, Kelly K, Weiss RH, Kim K. Effects of imputation on correlation: implications for analysis of mass spectrometry data from multiple biological matrices. Brief Bioinform 2017; 18:312-320. [PMID: 26896791 DOI: 10.1093/bib/bbw010] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/08/2015] [Indexed: 11/14/2022]
Abstract
With expanded access to, and decreased costs of, mass spectrometry, investigators are collecting and analyzing multiple biological matrices from the same subject such as serum, plasma, tissue and urine to enhance biomarker discoveries, understanding of disease processes and identification of therapeutic targets. Commonly, each biological matrix is analyzed separately, but multivariate methods such as MANOVAs that combine information from multiple biological matrices are potentially more powerful. However, mass spectrometric data typically contain large amounts of missing values, and imputation is often used to create complete data sets for analysis. The effects of imputation on multiple biological matrix analyses have not been studied. We investigated the effects of seven imputation methods (half minimum substitution, mean substitution, k-nearest neighbors, local least squares regression, Bayesian principal components analysis, singular value decomposition and random forest), on the within-subject correlation of compounds between biological matrices and its consequences on MANOVA results. Through analysis of three real omics data sets and simulation studies, we found the amount of missing data and imputation method to substantially change the between-matrix correlation structure. The magnitude of the correlations was generally reduced in imputed data sets, and this effect increased with the amount of missing data. Significant results from MANOVA testing also were substantially affected. In particular, the number of false positives increased with the level of missing data for all imputation methods. No one imputation method was universally the best, but the simple substitution methods (Half Minimum and Mean) consistently performed poorly.
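The attenuation of between-matrix correlation described in this abstract can be illustrated with a small sketch. The intensity values, the positions of the missing entries, and the helper names `pearson` and `half_min_impute` are all hypothetical and chosen only for illustration; this is not the authors' code or data, and a single toy example says nothing about how often the effect occurs.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def half_min_impute(values):
    """Replace missing entries (None) with half of the smallest observed value."""
    observed = [v for v in values if v is not None]
    fill = min(observed) / 2
    return [fill if v is None else v for v in values]


# Hypothetical intensities of one compound in two matrices from six subjects.
serum = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
urine = [5.0, 6.5, 6.8, 8.2, 8.9, 10.1]

# Assumption for illustration: two urine values are missing.
urine_missing = [5.0, None, 6.8, None, 8.9, 10.1]

r_complete = pearson(serum, urine)
r_imputed = pearson(serum, half_min_impute(urine_missing))

print(f"complete data: r = {r_complete:.3f}")
print(f"half-minimum imputed: r = {r_imputed:.3f}")
```

In this toy case the imputed values fall well below the observed range, which pulls the correlation down relative to the complete data.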
Affiliation(s)
- Sandra L Taylor
  - Division of Biostatistics, Department of Public Health Sciences, University of California School of Medicine, CA, USA
- L Renee Ruhaak
  - Department of Chemistry, University of California, CA, USA
- Karen Kelly
  - Division of Hematology and Oncology, University of California Davis Comprehensive Cancer Center, Sacramento, California, USA
- Robert H Weiss
  - Division of Nephrology, Department of Internal Medicine, University of California, CA, USA
- Kyoungmi Kim
  - Division of Biostatistics, Department of Public Health Sciences, University of California, California, USA
41
Burel S, Coyan FC, Lorenzini M, Meyer MR, Lichti CF, Brown JH, Loussouarn G, Charpentier F, Nerbonne JM, Townsend RR, Maier LS, Marionneau C. C-terminal phosphorylation of NaV1.5 impairs FGF13-dependent regulation of channel inactivation. J Biol Chem 2017; 292:17431-17448. [PMID: 28882890 PMCID: PMC5655519 DOI: 10.1074/jbc.m117.787788] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 03/22/2017] [Revised: 08/23/2017] [Indexed: 11/06/2022]
Abstract
Voltage-gated Na+ (NaV) channels are key regulators of myocardial excitability, and Ca2+/calmodulin-dependent protein kinase II (CaMKII)-dependent alterations in NaV1.5 channel inactivation are emerging as a critical determinant of arrhythmias in heart failure. However, the global native phosphorylation pattern of NaV1.5 subunits associated with these arrhythmogenic disorders and the associated channel regulatory defects remain unknown. Here, we undertook phosphoproteomic analyses to identify and quantify in situ the phosphorylation sites in the NaV1.5 proteins purified from adult WT and failing CaMKIIδc-overexpressing (CaMKIIδc-Tg) mouse ventricles. Of 19 native NaV1.5 phosphorylation sites identified, two C-terminal phosphoserines at positions 1938 and 1989 showed increased phosphorylation in the CaMKIIδc-Tg compared with the WT ventricles. We then tested the hypothesis that phosphorylation at these two sites impairs fibroblast growth factor 13 (FGF13)-dependent regulation of NaV1.5 channel inactivation. Whole-cell voltage-clamp analyses in HEK293 cells demonstrated that FGF13 increases NaV1.5 channel availability and decreases late Na+ current, two effects that were abrogated with NaV1.5 mutants mimicking phosphorylation at both sites. Additional co-immunoprecipitation experiments revealed that FGF13 potentiates the binding of calmodulin to NaV1.5 and that phosphomimetic mutations at both sites decrease the interaction of FGF13 and, consequently, of calmodulin with NaV1.5. Together, we have identified two novel native phosphorylation sites in the C terminus of NaV1.5 that impair FGF13-dependent regulation of channel inactivation and may contribute to CaMKIIδc-dependent arrhythmogenic disorders in failing hearts.
Affiliation(s)
- Sophie Burel
  - From the l'Institut du Thorax, INSERM, CNRS, UNIV Nantes, Nantes 44007, France
- Fabien C Coyan
  - From the l'Institut du Thorax, INSERM, CNRS, UNIV Nantes, Nantes 44007, France
- Maxime Lorenzini
  - From the l'Institut du Thorax, INSERM, CNRS, UNIV Nantes, Nantes 44007, France
- Cheryl F Lichti
  - the Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas 77555
- Joan H Brown
  - the Department of Pharmacology, University of California at San Diego, La Jolla, California 92093-0636, and
- Gildas Loussouarn
  - From the l'Institut du Thorax, INSERM, CNRS, UNIV Nantes, Nantes 44007, France
- Flavien Charpentier
  - From the l'Institut du Thorax, INSERM, CNRS, UNIV Nantes, Nantes 44007, France
- R Reid Townsend
  - Internal Medicine, and
  - Cell Biology and Physiology, Washington University Medical School, St. Louis, Missouri 63110
- Lars S Maier
  - the Department of Internal Medicine II, University Heart Center, University Hospital Regensburg, D-93042 Regensburg, Germany
- Céline Marionneau
  - From the l'Institut du Thorax, INSERM, CNRS, UNIV Nantes, Nantes 44007, France
42
Dowsey AW. The need for statistical contributions to bioinformatics at scale, with illustration to mass spectrometry. STAT MODEL 2017. [DOI: 10.1177/1471082x17708519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
In their article, Morris and Baladandayuthapani clearly evidence the influence of statisticians in recent methodological advances throughout the bioinformatics pipeline and advocate for the expansion of this role. The latest acquisition platforms, such as next generation sequencing (genomics/transcriptomics) and hyphenated mass spectrometry (proteomics/metabolomics), output raw datasets in the order of gigabytes; it is not unusual to acquire a terabyte or more of data per study. The increasing computational burden this brings is a further impediment against the use of statistically rigorous methodology in the pre-processing stages of the bioinformatics pipeline. In this discussion I describe the mass spectrometry pipeline and use it as an example to show that beneath this challenge lies a two-fold opportunity: (a) Biological complexity and dynamic range is still well beyond what is captured by current processing methodology; hence, potential biomarkers and mechanistic insights are consistently missed; (b) Statistical science could play a larger role in optimizing the acquisition process itself. Data rates will continue to increase as routine clinical omics analysis moves to large-scale facilities with systematic, standardized protocols. Key inferential gains will be achieved by borrowing strength across the sum total of all analyzed studies, a task best underpinned by appropriate statistical modelling.
Affiliation(s)
- Andrew W Dowsey
  - School of Social & Community Medicine and School of Veterinary Sciences, Faculty of Health Sciences, University of Bristol, United Kingdom
43
Resjö S, Brus M, Ali A, Meijer HJG, Sandin M, Govers F, Levander F, Grenville-Briggs L, Andreasson E. Proteomic Analysis of Phytophthora infestans Reveals the Importance of Cell Wall Proteins in Pathogenicity. Mol Cell Proteomics 2017; 16:1958-1971. [PMID: 28935716 DOI: 10.1074/mcp.m116.065656] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 11/18/2016] [Revised: 09/19/2017] [Indexed: 11/06/2022]
Abstract
The oomycete Phytophthora infestans is the most harmful pathogen of potato. It causes the disease late blight, which generates increased yearly costs of up to one billion euro in the EU alone and is difficult to control. We have performed a large-scale quantitative proteomics study of six P. infestans life stages with the aim to identify proteins that change in abundance during development, with a focus on preinfectious life stages. Over 10 000 peptides from 2061 proteins were analyzed. We identified several abundance profiles of proteins that were up- or downregulated in different combinations of life stages. One of these profiles contained 59 proteins that were more abundant in germinated cysts and appressoria. A large majority of these proteins were not previously recognized as being appressorial proteins or involved in the infection process. Among those are proteins with putative roles in transport, amino acid metabolism, pathogenicity (including one RXLR effector) and cell wall structure modification. We analyzed the expression of the genes encoding nine of these proteins using RT-qPCR and found an increase in transcript levels during disease progression, in agreement with the hypothesis that these proteins are important in early infection. Among the nine proteins was a group involved in cell wall structure modification and adhesion, including three closely related, uncharacterized proteins encoded by PITG_01131, PITG_01132, and PITG_16135, here denoted Piacwp1-3. Transient silencing of these genes resulted in reduced severity of infection, indicating that these proteins are important for pathogenicity. Our results contribute to further insight into P. infestans biology, and indicate processes that might be relevant for the pathogen while preparing for host cell penetration and during infection. The mass spectrometry data have been deposited to ProteomeXchange via the PRIDE partner repository with the data set identifier PXD002446.
Affiliation(s)
- Svante Resjö
  - From the ‡Department of Plant Protection Biology, Swedish University of Agricultural Sciences, PO Box 102, SE-230 53 Alnarp, Sweden
- Maja Brus
  - From the ‡Department of Plant Protection Biology, Swedish University of Agricultural Sciences, PO Box 102, SE-230 53 Alnarp, Sweden
- Ashfaq Ali
  - From the ‡Department of Plant Protection Biology, Swedish University of Agricultural Sciences, PO Box 102, SE-230 53 Alnarp, Sweden
- Harold J G Meijer
  - §Laboratory of Phytopathology, Wageningen University and Research, The Netherlands
- Francine Govers
  - §Laboratory of Phytopathology, Wageningen University and Research, The Netherlands
- Fredrik Levander
  - ¶Department of Immunotechnology, Lund University, Sweden; ‖National Bioinformatics Infrastructure Sweden (NBIS), Lund University, Sweden
- Laura Grenville-Briggs
  - From the ‡Department of Plant Protection Biology, Swedish University of Agricultural Sciences, PO Box 102, SE-230 53 Alnarp, Sweden
- Erik Andreasson
  - From the ‡Department of Plant Protection Biology, Swedish University of Agricultural Sciences, PO Box 102, SE-230 53 Alnarp, Sweden
44
D’Angelo G, Chaerkady R, Yu W, Hizal DB, Hess S, Zhao W, Lekstrom K, Guo X, White WI, Roskos L, Bowen MA, Yang H. Statistical Models for the Analysis of Isobaric Tags Multiplexed Quantitative Proteomics. J Proteome Res 2017; 16:3124-3136. [DOI: 10.1021/acs.jproteome.6b01050] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Affiliation(s)
- Gina D’Angelo
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Raghothama Chaerkady
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Wen Yu
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Deniz Baycin Hizal
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Sonja Hess
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Wei Zhao
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Kristen Lekstrom
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Xiang Guo
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Wendy I. White
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Lorin Roskos
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Michael A. Bowen
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
- Harry Yang
  - Statistical Sciences, ‡Antibody Discovery and Protein Engineering, Protein Sciences, §Research Bioinformatics, ∥Clinical Biomarkers and Computational Biology, and ⊥Clinical Pharmacology, Pharmacometrics, and DMPK, MedImmune, Gaithersburg, Maryland 20878, United States
45
Statistical characterization of therapeutic protein modifications. Sci Rep 2017; 7:7896. [PMID: 28801661 PMCID: PMC5554216 DOI: 10.1038/s41598-017-08333-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/02/2017] [Accepted: 07/07/2017] [Indexed: 12/25/2022]
Abstract
Peptide mapping with liquid chromatography–tandem mass spectrometry (LC-MS/MS) is an important analytical method for characterization of post-translational and chemical modifications in therapeutic proteins. Despite its importance, there is currently no consensus on the statistical analysis of the resulting data. In this manuscript, we distinguish three statistical goals for therapeutic protein characterization: (1) estimation of site occupancy of modifications in one condition, (2) detection of differential site occupancy between conditions, and (3) estimation of combined site occupancy across multiple modification sites. We propose an approach, which addresses these goals in terms of summarizing the quantitative information from the mass spectra, statistical modeling, and model-based analysis of LC-MS/MS data. We illustrate the approach using an LC-MS/MS experiment from an antibody-drug conjugate and its monoclonal antibody intermediate. The performance was compared to a ‘naïve’ data analysis approach, by using computer simulation, evaluation of differential site occupancy in positive and negative controls, and comparisons of estimated site occupancy with orthogonal experimental measurements of N-linked glycoforms and total oxidation. The results demonstrated the importance of replicated studies of protein characterization, and of appropriate statistical modeling, for reproducible, accurate and efficient site occupancy estimation and differential analysis.
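As a rough illustration of the first two goals listed in this abstract, site occupancy is often summarized as the modified signal divided by the total signal for a site. The sketch below assumes that ratio definition and uses made-up replicate intensities; the function name `site_occupancy` and all numbers are hypothetical, and this plain ratio summary is not the statistical model the authors propose.

```python
from statistics import mean


def site_occupancy(modified, unmodified):
    """Fraction of signal carrying the modification (assumed ratio definition)."""
    total = modified + unmodified
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return modified / total


# Hypothetical (modified, unmodified) intensities for one site, three replicates each.
condition_a = [(20.0, 80.0), (25.0, 75.0), (30.0, 70.0)]
condition_b = [(40.0, 60.0), (45.0, 55.0), (50.0, 50.0)]

occ_a = [site_occupancy(m, u) for m, u in condition_a]
occ_b = [site_occupancy(m, u) for m, u in condition_b]

# Goal (1): occupancy estimate per condition; goal (2): difference between conditions.
diff = mean(occ_b) - mean(occ_a)
print(f"mean occupancy A = {mean(occ_a):.2f}, B = {mean(occ_b):.2f}, difference = {diff:.2f}")
```

A per-replicate ratio like this ignores measurement uncertainty, which is one reason the abstract stresses replication and model-based analysis.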
|
46
|
Goeminne LJE, Gevaert K, Clement L. Experimental design and data-analysis in label-free quantitative LC/MS proteomics: A tutorial with MSqRob. J Proteomics 2017; 171:23-36. [PMID: 28391044 DOI: 10.1016/j.jprot.2017.04.004] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 03/01/2017] [Revised: 03/29/2017] [Accepted: 04/01/2017] [Indexed: 12/14/2022]
Abstract
Label-free shotgun proteomics is routinely used to assess proteomes. However, extracting relevant information from the massive amounts of generated data remains difficult. This tutorial provides a strong foundation on analysis of quantitative proteomics data. We provide key statistical concepts that help researchers to design proteomics experiments and we showcase how to analyze quantitative proteomics data using our recent free and open-source R package MSqRob, which was developed to implement the peptide-level robust ridge regression method for relative protein quantification described by Goeminne et al. MSqRob can handle virtually any experimental proteomics design and outputs proteins ordered by statistical significance. Moreover, its graphical user interface and interactive diagnostic plots provide easy inspection and also detection of anomalies in the data and flaws in the data analysis, allowing deeper assessment of the validity of results and a critical review of the experimental design. Our tutorial discusses interactive preprocessing, data analysis and visualization of label-free MS-based quantitative proteomics experiments with simple and more complex designs. We provide well-documented scripts to run analyses in bash mode on GitHub, enabling the integration of MSqRob in automated pipelines on cluster environments (https://github.com/statOmics/MSqRob). SIGNIFICANCE The concepts outlined in this tutorial aid in designing better experiments and analyzing the resulting data more appropriately. The two case studies using the MSqRob graphical user interface will contribute to a wider adaptation of advanced peptide-based models, resulting in higher quality data analysis workflows and more reproducible results in the proteomics community. We also provide well-documented scripts for experienced users that aim at automating MSqRob on cluster environments.
Affiliation(s)
- Ludger J E Goeminne
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biochemistry, Ghent University, Belgium; Bioinformatics Institute Ghent, Ghent University, Belgium.
- Kris Gevaert
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biochemistry, Ghent University, Belgium; Bioinformatics Institute Ghent, Ghent University, Belgium.
- Lieven Clement
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium; Bioinformatics Institute Ghent, Ghent University, Belgium.
|
47
|
Shah JS, Rai SN, DeFilippis AP, Hill BG, Bhatnagar A, Brock GN. Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies. BMC Bioinformatics 2017; 18:114. [PMID: 28219348 PMCID: PMC5319174 DOI: 10.1186/s12859-017-1547-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 09/15/2016] [Accepted: 02/13/2017] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: High-throughput metabolomics makes it possible to measure the relative abundances of numerous metabolites in biological samples, which is useful in many areas of biomedical research. However, missing values (MVs) in metabolomics datasets are common and can arise for both technical and biological reasons. Typically, such MVs are substituted by a minimum value, which may lead to different results in downstream analyses. RESULTS: Here we present a modified version of the K-nearest neighbor (KNN) approach that accounts for truncation at the minimum value, i.e., KNN truncation (KNN-TN). We compare imputation results based on KNN-TN with results from other KNN approaches such as KNN based on correlation (KNN-CR) and KNN based on Euclidean distance (KNN-EU). Our approach assumes that the data follow a truncated normal distribution with the truncation point at the limit of detection (LOD). The effectiveness of each approach was analyzed by the root mean square error (RMSE) measure as well as the metabolite list concordance index (MLCI) for influence on downstream statistical testing. Through extensive simulation studies and application to three real data sets, we show that KNN-TN has lower RMSE values compared to the other two KNN procedures as well as simpler imputation methods based on substituting missing values with the metabolite mean, zero values, or the LOD. MLCI values between KNN-TN and KNN-EU were roughly equivalent, and superior to the other four methods in most cases. CONCLUSION: Our findings demonstrate that KNN-TN generally imputes missing values more accurately than KNN-CR and KNN-EU across the different datasets when missingness arises from values missing at random combined with an LOD. The results shown in this study are in the field of metabolomics, but the method could be applicable to any high-throughput technology that has missing values due to an LOD.
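To make the comparison in this abstract concrete, the following is a minimal sketch of plain Euclidean-distance KNN imputation (the KNN-EU baseline) together with the RMSE measure. It is not the authors' implementation and it does not include the truncated-normal step that defines KNN-TN. The function names `knn_impute` and `rmse`, the use of `None` for a missing value, and the layout of one metabolite per row and one sample per column are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill each None entry with the mean of that column over the k nearest rows.

    Assumption: each row is one metabolite and each column is one sample.
    Distance is Euclidean over the columns observed in both rows, divided by
    the number of shared columns so that different overlaps stay comparable.
    """
    imputed = [list(r) for r in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a neighbour must have the value we need
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def rmse(truth, estimate):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [1.1, 2.1, None], [5.0, 6.0, 7.0], [1.0, 2.0, 3.2]]
    print(knn_impute(data, k=2))
```

In a simulation of the kind the abstract describes, one would mask known values, impute them, and compare the imputed values against the masked truth with `rmse`.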
Affiliation(s)
- Jasmit S Shah
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 40202, USA; Department of Medicine, Division of Cardiovascular Medicine, Diabetes and Obesity Center, University of Louisville, Louisville, KY, 40202, USA.
- Shesh N Rai
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 40202, USA
- Andrew P DeFilippis
- Department of Medicine, Division of Cardiovascular Medicine, Diabetes and Obesity Center, University of Louisville, Louisville, KY, 40202, USA
- Bradford G Hill
- Department of Medicine, Division of Cardiovascular Medicine, Diabetes and Obesity Center, University of Louisville, Louisville, KY, 40202, USA
- Aruni Bhatnagar
- Department of Medicine, Division of Cardiovascular Medicine, Diabetes and Obesity Center, University of Louisville, Louisville, KY, 40202, USA
- Guy N Brock
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, 40202, USA; Present Affiliation: Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.
|
48
|
Kakourou A, Vach W, Mertens B. Adapting censored regression methods to adjust for the limit of detection in the calibration of diagnostic rules for clinical mass spectrometry proteomic data. Stat Methods Med Res 2016; 27:2742-2755. [DOI: 10.1177/0962280216685742] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
In this paper, we consider the problem of calibrating diagnostic rules based on high-resolution mass spectrometry data subject to the limit of detection. The limit of detection is related to the limitation of instruments in measuring low-concentration proteins. As a consequence, peak intensities below the limit of detection are often reported as missing during the quantification step of proteomic analysis. We propose the use of censored data methodology to handle spectral measurements in the presence of a limit of detection, recognizing that those have been left-censored for low-abundance proteins. We replace the set of incomplete spectral measurements with estimates of the expected intensity and use those as input to a prediction model. To correct for lack of information and measurement uncertainty, we combine this approach with borrowing of information through the addition of an individual-specific random effect formulation. We present different modalities of using the above formulation for prediction purposes and show how it may also allow for variable selection. We evaluate the proposed methods by comparing their predictive performance with that achieved using the complete information as well as with alternative methods for dealing with the limit of detection.
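The step of replacing a left-censored measurement with an estimate of its expected intensity can be illustrated in the simplest possible setting. The sketch below assumes the (log-)intensity follows a normal distribution with known mean and standard deviation, which the abstract does not state, and uses the standard truncated-normal result E[X | X < LOD] = mu - sigma * phi(z) / Phi(z) with z = (LOD - mu) / sigma. It omits the individual-specific random effect the authors describe, and the function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_below_lod(mu, sigma, lod):
    """Mean of a Normal(mu, sigma^2) variable given that it fell below lod.

    Assumption: mu and sigma are already known or estimated elsewhere.
    For lod far below mu the CDF underflows to zero; that case is not handled here.
    """
    z = (lod - mu) / sigma
    return mu - sigma * normal_pdf(z) / normal_cdf(z)


def fill_censored(values, mu, sigma, lod):
    """Replace None (censored below lod) with the conditional expectation."""
    fill = expected_below_lod(mu, sigma, lod)
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    print(fill_censored([9.5, None, 11.2, None], mu=10.0, sigma=2.0, lod=8.0))
```

Unlike substituting the limit of detection itself, the filled value always lies below the limit, which reflects what is actually known about a censored peak.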
Affiliation(s)
- Alexia Kakourou
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
- Werner Vach
- Center for Medical Biometry and Medical Informatics, University of Freiburg, Freiburg, Germany
- Bart Mertens
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
|
49
|
Taylor SL, Ruhaak LR, Weiss RH, Kelly K, Kim K. Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens. Bioinformatics 2016; 33:17-25. [PMID: 27592710 DOI: 10.1093/bioinformatics/btw578] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/19/2016] [Revised: 08/30/2016] [Accepted: 08/31/2016] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. RESULTS: We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens, but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. AVAILABILITY AND IMPLEMENTATION: We provide R functions to implement and illustrate our method as supplementary information. CONTACT: sltaylor@ucdavis.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
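As an illustration of the building block behind this abstract, the following is a sketch of a univariate two-part statistic for a single compound in a single biospecimen, with a permutation p-value. It is an assumption-laden simplification, not the authors' multivariate statistics or their R functions: the first part is a squared two-proportion z statistic on whether a value was observed, the second part is a squared Welch t statistic on the observed values, and `None` marks a missing value. All function names are hypothetical.

```python
import random


def two_part_statistic(group_a, group_b):
    """Squared z on presence/absence plus squared Welch t on observed values.

    Assumption: both groups are non-empty and None marks a missing value.
    """
    obs_a = [x for x in group_a if x is not None]
    obs_b = [x for x in group_b if x is not None]
    n_a, n_b = len(group_a), len(group_b)

    # Part 1: do the groups differ in the proportion of observed values?
    p_a, p_b = len(obs_a) / n_a, len(obs_b) / n_b
    pooled = (len(obs_a) + len(obs_b)) / (n_a + n_b)
    var_p = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
    z2 = (p_a - p_b) ** 2 / var_p if var_p > 0 else 0.0

    # Part 2: do the observed values differ in mean?
    t2 = 0.0
    if len(obs_a) >= 2 and len(obs_b) >= 2:
        mean_a = sum(obs_a) / len(obs_a)
        mean_b = sum(obs_b) / len(obs_b)
        var_a = sum((x - mean_a) ** 2 for x in obs_a) / (len(obs_a) - 1)
        var_b = sum((x - mean_b) ** 2 for x in obs_b) / (len(obs_b) - 1)
        se2 = var_a / len(obs_a) + var_b / len(obs_b)
        if se2 > 0:
            t2 = (mean_a - mean_b) ** 2 / se2
    return z2 + t2


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Permutation p-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = two_part_statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = two_part_statistic(pooled[:len(group_a)], pooled[len(group_a):])
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero


if __name__ == "__main__":
    cases = [None, None, None, 1.0, 1.2]
    controls = [5.0, 5.5, 6.0, 5.2, 5.8]
    print(two_part_statistic(cases, controls), permutation_p_value(cases, controls))
```

Because missing values enter through the first part rather than being imputed, the statistic uses the pattern of missingness as evidence instead of distorting the observed intensities.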
Affiliation(s)
- Sandra L Taylor
- Division of Biostatistics, Department of Public Health Sciences, University of California Davis, CA, 95616, USA
- L Renee Ruhaak
- Department of Clinical Chemistry and Laboratory Medicine, Leiden University Medical Center, Leiden, The Netherlands
- Karen Kelly
- Division of Hematology and Oncology, Department of Internal Medicine School of Medicine, University of California, Davis, CA 95616, USA
- Kyoungmi Kim
- Division of Biostatistics, Department of Public Health Sciences, University of California Davis, CA, 95616, USA
|
50
|
Shearer JJ, Wold EA, Umbaugh CS, Lichti CF, Nilsson CL, Figueiredo ML. Inorganic Arsenic-Related Changes in the Stromal Tumor Microenvironment in a Prostate Cancer Cell-Conditioned Media Model. Environ Health Perspect 2016; 124:1009-15. [PMID: 26588813 PMCID: PMC4937864 DOI: 10.1289/ehp.1510090] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/17/2015] [Accepted: 11/12/2015] [Indexed: 05/18/2023]
Abstract
BACKGROUND: The tumor microenvironment plays an important role in the progression of cancer by mediating stromal-epithelial paracrine signaling, which can aberrantly modulate cellular proliferation and tumorigenesis. Exposure to environmental toxicants, such as inorganic arsenic (iAs), has also been implicated in the progression of prostate cancer. OBJECTIVE: The role of iAs exposure in stromal signaling in the tumor microenvironment has been largely unexplored. Our objective was to elucidate molecular mechanisms of iAs-induced changes to stromal signaling by an enriched prostate tumor microenvironment cell population, adipose-derived mesenchymal stem/stromal cells (ASCs). RESULTS: ASC-conditioned media (CM) collected after 1 week of iAs exposure increased prostate cancer cell viability, whereas CM from ASCs that received no iAs exposure decreased cell viability. Cytokine array analysis suggested changes to cytokine signaling associated with iAs exposure. Subsequent proteomic analysis suggested a concentration-dependent alteration to the HMOX1/THBS1/TGFβ signaling pathway by iAs. These results were validated by quantitative reverse transcriptase-polymerase chain reaction (RT-PCR) and Western blotting, confirming a concentration-dependent increase in HMOX1 and a decrease in THBS1 expression in ASC following iAs exposure. Subsequently, we used a TGFβ pathway reporter construct to confirm a decrease in stromal TGFβ signaling in ASC following iAs exposure. CONCLUSIONS: Our results suggest a concentration-dependent alteration of stromal signaling: specifically, attenuation of stromal-mediated TGFβ signaling following exposure to iAs. Our results indicate iAs may enhance prostate cancer cell viability through a previously unreported stromal-based mechanism. These findings indicate that the stroma may mediate the effects of iAs in tumor progression, which may have future therapeutic implications. CITATION: Shearer JJ, Wold EA, Umbaugh CS, Lichti CF, Nilsson CL, Figueiredo ML. 2016. Inorganic arsenic-related changes in the stromal tumor microenvironment in a prostate cancer cell-conditioned media model. Environ Health Perspect 124:1009-1015; http://dx.doi.org/10.1289/ehp.1510090.
Affiliation(s)
- Joseph J. Shearer
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Eric A. Wold
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Charles S. Umbaugh
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Cheryl F. Lichti
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Carol L. Nilsson
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
- Marxa L. Figueiredo
- Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, Texas, USA
|