1
O'Brien JJ, Raj A, Gaun A, Waite A, Li W, Hendrickson DG, Olsson N, McAllister FE. A data analysis framework for combining multiple batches increases the power of isobaric proteomics experiments. Nat Methods 2024; 21:290-300. [PMID: 38110636 DOI: 10.1038/s41592-023-02120-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 10/31/2023] [Indexed: 12/20/2023]
Abstract
We present a framework for the analysis of multiplexed mass spectrometry proteomics data that reduces estimation error when combining multiple isobaric batches. Variations in the number and quality of observations have long complicated the analysis of isobaric proteomics data. Here we show that the power to detect statistical associations is substantially improved by utilizing models that directly account for known sources of variation in the number and quality of observations that occur across batches. In a multibatch benchmarking experiment, our open-source software (msTrawler) increases the power to detect changes, especially in the range of less than twofold changes, while simultaneously increasing quantitative proteome coverage by utilizing more low-signal observations. Further analyses of previously published multiplexed datasets of 4 and 23 batches highlight both increased power and the ability to navigate complex missing data patterns without relying on unverifiable imputations or discarding reliable measurements.
Affiliation(s)
- Anil Raj
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Adam Waite
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Wenzhou Li
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Niclas Olsson
- Calico Life Sciences LLC, South San Francisco, CA, USA
2
Grégoire S, Vanderaa C, Dit Ruys SP, Kune C, Mazzucchelli G, Vertommen D, Gatto L. Standardized Workflow for Mass-Spectrometry-Based Single-Cell Proteomics Data Processing and Analysis Using the scp Package. Methods Mol Biol 2024; 2817:177-220. [PMID: 38907155 DOI: 10.1007/978-1-0716-3934-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]
Abstract
Mass-spectrometry (MS)-based single-cell proteomics (SCP) explores cellular heterogeneity by focusing on the functional effectors of the cells: proteins. However, extracting meaningful biological information from MS data is far from trivial, especially with single cells. Currently, data analysis workflows are substantially different from one research team to another. Moreover, it is difficult to evaluate pipelines as ground truths are missing. Our team has developed the R/Bioconductor package called scp to provide a standardized framework for SCP data analysis. It relies on the widely used QFeatures and SingleCellExperiment data structures. In addition, we used a design containing cell lines mixed in known proportions to generate controlled variability for data analysis benchmarking. In this chapter, we provide a flexible data analysis protocol for SCP data using the scp package together with comprehensive explanations at each step of the processing. Our main steps are quality control on the feature and cell level, aggregation of the raw data into peptides and proteins, normalization, and batch correction. We validate our workflow using our ground truth data set. We illustrate how to use this modular, standardized framework and highlight some crucial steps.
Affiliation(s)
- Samuel Grégoire
- Computational Biology and Bioinformatics Unit, de Duve Institute, UCLouvain, Brussels, Belgium
- Christophe Vanderaa
- Computational Biology and Bioinformatics Unit, de Duve Institute, UCLouvain, Brussels, Belgium
- Christopher Kune
- Laboratory of Mass Spectrometry, MolSys Research Unit, University of Liège, Liège, Belgium
- Gabriel Mazzucchelli
- Laboratory of Mass Spectrometry, MolSys Research Unit, University of Liège, Liège, Belgium
- GIGA Proteomics Facility, University of Liège, Liège, Belgium
- Didier Vertommen
- Protein Phosphorylation Unit, de Duve Institute, UCLouvain, Brussels, Belgium
- Laurent Gatto
- Computational Biology and Bioinformatics Unit, de Duve Institute, UCLouvain, Brussels, Belgium
3
Abstract
Missing values are a notable challenge when analyzing mass spectrometry-based proteomics data. While the field is still actively debating the best practices, the challenge increased with the emergence of mass spectrometry-based single-cell proteomics and the dramatic increase in missing values. A popular approach to deal with missing values is to perform imputation. Imputation has several drawbacks for which alternatives exist, but currently, imputation is still a practical solution widely adopted in single-cell proteomics data analysis. This perspective discusses the advantages and drawbacks of imputation. We also highlight 5 main challenges linked to missing value management in single-cell proteomics. Future developments should aim to solve these challenges, whether it is through imputation or data modeling. The perspective concludes with recommendations for reporting missing values, for reporting methods that deal with missing values, and for proper encoding of missing values.
Affiliation(s)
- Christophe Vanderaa
- Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, UCLouvain, 1200 Brussels, Belgium
- Laurent Gatto
- Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, UCLouvain, 1200 Brussels, Belgium
4
Boekweg H, Payne SH. Challenges and opportunities for single cell computational proteomics. Mol Cell Proteomics 2023; 22:100518. [PMID: 36828128 PMCID: PMC10060113 DOI: 10.1016/j.mcpro.2023.100518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 02/15/2023] [Accepted: 02/17/2023] [Indexed: 02/25/2023] Open
Abstract
Single-cell proteomics is growing rapidly and has made several technological advancements. As most research has been focused on improving instrumentation and sample preparation methods, very little attention has been given to algorithms responsible for identifying and quantifying proteins. Given the inherent difference between bulk data and single-cell data, it's necessary to realize that current algorithms being employed on single-cell data were designed for bulk data, and have underlying assumptions that may not hold true for single-cell data. In order to develop and optimize algorithms for single-cell data, we need to characterize the differences between single-cell data and bulk data, and assess how current algorithms perform on single-cell data. Here, we present a review of algorithms responsible for identifying and quantifying peptides and proteins. We will give a review of how each type of algorithm works, assumptions it relies on, how it performs on single-cell data, and possible optimizations and solutions that could be used to address the differences in single-cell data.
Affiliation(s)
- Hannah Boekweg
- Biology Department, Brigham Young University, Provo, Utah, USA
- Samuel H Payne
- Biology Department, Brigham Young University, Provo, Utah, USA
5
Chion M, Carapito C, Bertrand F. Towards a More Accurate Differential Analysis of Multiple Imputed Proteomics Data with mi4limma. Methods Mol Biol 2023; 2426:131-140. [PMID: 36308688 DOI: 10.1007/978-1-0716-1967-4_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Imputing missing values is a common practice in label-free quantitative proteomics. Imputation replaces a missing value by a user-defined one. However, the imputation itself is not optimally considered downstream of the imputation process. In particular, imputed datasets are considered as if they had always been complete. The uncertainty due to the imputation is not properly taken into account. Hence, the mi4p package provides a more accurate statistical analysis of multiple-imputed datasets. A rigorous multiple imputation methodology is implemented, leading to a less biased estimation of parameters and their variability, thanks to Rubin's rules. The imputation-based peptide's intensities' variance estimator is then moderated using Bayesian hierarchical models. This estimator is finally included in moderated t-test statistics to provide differential analyses results.
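As an illustration of the Rubin's rules step mentioned in this abstract, the following is a minimal Python sketch of how per-imputation estimates are typically pooled. It is not the mi4p package's code or API (mi4p is an R package); the function name `pool_rubin` and the example numbers are hypothetical, and the Bayesian moderation of the variance described in the abstract is not shown.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool m per-imputation estimates with Rubin's rules (hypothetical helper).

    estimates        : one point estimate per imputed dataset
    within_variances : the estimated variance of each of those estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total


# Made-up example: three imputed datasets give three estimates of one parameter.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the uncertainty due to imputation; treating a single imputed dataset as complete amounts to dropping it.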
Affiliation(s)
- Marie Chion
- CNRS, University of Strasbourg, Strasbourg, France
6
O’Brien JJ, Gadzuk-Shea M, Seitzer PM, Rad R, McAllister FE, Schweppe DK. Conditional Fragment Ion Probabilities Improve Database Searching for Nonmonoisotopic Precursors. J Proteome Res 2022; 22:334-342. [PMID: 36414539 PMCID: PMC9903324 DOI: 10.1021/acs.jproteome.2c00247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
Abstract
Stochastic, intensity-based precursor isolation can result in isotopically enriched fragment ions. This problem is exacerbated for large peptides and stable isotope labeling experiments using deuterium or 15N. For stable isotope labeling experiments, incomplete and ubiquitous labeling strategies result in the isolation of peptide ions composed of many distinct structural isomers. Unfortunately, existing proteomics search algorithms do not account for this variability in isotopic incorporation, and thus often yield poor peptide and protein identification rates. We sought to resolve this shortcoming by deriving the expected isotopic distributions of each fragment ion and incorporating them into the theoretical mass spectra used for peptide-spectrum-matching. We adapted the Comet search platform to integrate a modified spectral prediction algorithm we term Conditional fragment Ion Distribution Search (CIDS). Comet-CIDS uses a traditional database searching strategy, but for each candidate peptide we compute the isotopic distribution of each fragment to better match the observed m/z distributions. Evaluating previously generated D2O and 15N labeled data sets, we found that Comet-CIDS identified more confident peptide spectral matches and higher protein sequence coverage compared to traditional theoretical spectra generation, with the magnitude of improvement largely determined by the amount of labeling in the sample.
Affiliation(s)
- Jonathon J. O’Brien
- Calico Laboratories, South San Francisco, California 94080, United States
- Phillip M. Seitzer
- Calico Laboratories, South San Francisco, California 94080, United States
- Ramin Rad
- Calico Laboratories, South San Francisco, California 94080, United States
- Devin K. Schweppe
- University of Washington, Seattle, Washington 98105, United States
7
Cupp-Sutton KA, Fang M, Wu S. Separation methods in single-cell proteomics: RPLC or CE? International Journal of Mass Spectrometry 2022; 481:116920. [PMID: 36211475 PMCID: PMC9542495 DOI: 10.1016/j.ijms.2022.116920] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/16/2023]
Abstract
Cellular heterogeneity is commonly investigated using single-cell genomics and transcriptomics to investigate biological questions such as disease mechanism, therapeutic screening, and genomic and transcriptomic diversity between cellular populations and subpopulations at the cellular level. Single-cell mass spectrometry (MS)-based proteomics enables the high-throughput examination of protein expression at the single-cell level with wide applicability, and with spatial and temporal resolution, applicable to the study of cellular development, disease, effect of treatment, etc. The study of single-cell proteomics has lagged behind genomics and transcriptomics largely because proteins from single-cell samples cannot be amplified as DNA and RNA can using well established techniques such as PCR. Therefore, analytical methods must be robust, reproducible, and sensitive enough to detect the very small amount of protein within a single cell. To this end, nearly every step of the proteomics process has been extensively altered and improved to facilitate the proteomics analysis of single cells including cell counting and sorting, lysis, protein digestion, sample cleanup, separation, MS data acquisition, and data analysis. Here, we have reviewed recent advances in single-cell protein separation using nano reversed phase liquid chromatography (nRPLC) and capillary electrophoresis (CE) to inform application driven selection of separation techniques in the laboratory setting.
Affiliation(s)
- Mulin Fang
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019
- Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019
8
MetaProClust-MS1: an MS1 Profiling Approach for Large-Scale Microbiome Screening. mSystems 2022; 7:e0038122. [PMID: 35950762 PMCID: PMC9426440 DOI: 10.1128/msystems.00381-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022] Open
Abstract
Metaproteomics is used to explore the functional dynamics of microbial communities. However, acquiring metaproteomic data by tandem mass spectrometry (MS/MS) is time-consuming and resource-intensive, and there is a demand for computational methods that can be used to reduce these resource requirements. We present MetaProClust-MS1, a computational framework for microbiome feature screening developed to prioritize samples for follow-up MS/MS. In this proof-of-concept study, we tested and compared MetaProClust-MS1 results on gut microbiome data, from fecal samples, acquired using short 15-min MS1-only chromatographic gradients and MS1 spectra from longer 60-min gradients to MS/MS-acquired data. We found that MetaProClust-MS1 identified robust gut microbiome responses caused by xenobiotics with significantly correlated cluster topologies of comparable data sets. We also used MetaProClust-MS1 to reanalyze data from both a clinical MS/MS diagnostic study of pediatric patients with inflammatory bowel disease and an experiment evaluating the therapeutic effects of a small molecule on the brain tissue of Alzheimer's disease mouse models. MetaProClust-MS1 clusters could distinguish between inflammatory bowel disease diagnoses (ulcerative colitis and Crohn's disease) using samples from mucosal luminal interface samples and identified hippocampal proteome shifts of Alzheimer's disease mouse models after small-molecule treatment. Therefore, we demonstrate that MetaProClust-MS1 can screen both microbiomes and single-species proteomes using only MS1 profiles, and our results suggest that this approach may be generalizable to any proteomics experiment. MetaProClust-MS1 may be especially useful for large-scale metaproteomic screening for the prioritization of samples for further metaproteomic characterization, using MS/MS, for instance, in addition to being a promising novel approach for clinical diagnostic screening. 
IMPORTANCE Growing evidence suggests that human gut microbiome composition and function are highly associated with health and disease. As such, high-throughput metaproteomic studies are becoming more common in gut microbiome research. However, using a conventional long liquid chromatography (LC)-MS/MS gradient metaproteomics approach as an initial screen in large-scale microbiome experiments can be slow and expensive. To combat this challenge, we introduce MetaProClust-MS1, a computational framework for microbiome screening using MS1-only profiles. In this proof-of-concept study, we show that MetaProClust-MS1 identifies clusters of gut microbiome treatments using MS1-only profiles similar to those identified using MS/MS. Our approach allows researchers to prioritize samples and treatments of interest for further metaproteomic analyses and may be generally applicable to any proteomic analysis. In particular, this approach may be especially useful for large-scale metaproteomic screening or in clinical settings where rapid diagnostic evidence is required.
9
Plancade S, Berland M, Blein-Nicolas M, Langella O, Bassignani A, Juste C. A combined test for feature selection on sparse metaproteomics data-an alternative to missing value imputation. PeerJ 2022; 10:e13525. [PMID: 35769140 PMCID: PMC9235818 DOI: 10.7717/peerj.13525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 05/11/2022] [Indexed: 01/18/2023] Open
Abstract
One of the difficulties encountered in the statistical analysis of metaproteomics data is the high proportion of missing values, which are usually treated by imputation. Nevertheless, imputation methods are based on restrictive assumptions regarding missingness mechanisms, namely "at random" or "not at random". To circumvent these limitations in the context of feature selection in a multi-class comparison, we propose a univariate selection method that combines a test of association between missingness and classes, and a test for difference of observed intensities between classes. This approach implicitly handles both missingness mechanisms. We performed a quantitative and qualitative comparison of our procedure with imputation-based feature selection methods on two experimental data sets, as well as simulated data with various scenarios regarding the missingness mechanisms and the nature of the difference of expression (differential intensity or differential presence). Whereas we observed similar performances in terms of prediction on the experimental data set, the feature ranking and selection from various imputation-based methods were strongly divergent. We showed that the combined test reaches a compromise by correlating reasonably with other methods, and remains efficient in all simulated scenarios unlike imputation-based feature selection methods.
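To make the idea of combining two tests concrete, here is a minimal Python sketch that merges a p-value from a missingness-versus-class test with a p-value from an observed-intensity test. The abstract does not say how the two tests are combined, so the use of Fisher's method here is an assumption chosen purely for illustration, as are the function name `fisher_combine_two` and the example p-values. Fisher's method also assumes the two p-values are independent under the null hypothesis.

```python
import math


def fisher_combine_two(p_missingness, p_intensity):
    """Combine two p-values with Fisher's method (illustrative assumption only).

    The statistic X = -2 * (ln p1 + ln p2) follows a chi-square distribution
    with 4 degrees of freedom under the null, whose survival function has the
    closed form exp(-x / 2) * (1 + x / 2).
    """
    for p in (p_missingness, p_intensity):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    x = -2.0 * (math.log(p_missingness) + math.log(p_intensity))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# Made-up example: a feature with moderate evidence from each of the two tests.
combined_p = fisher_combine_two(0.05, 0.05)
```

A feature that differs between classes only in presence or absence, or only in observed intensity, can still yield a small combined p-value, which is the intuition behind handling both missingness mechanisms without imputing.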
Affiliation(s)
- Sandra Plancade
- UR875 MIAT, Université fédérale de Toulouse, INRAE, Castanet-Tolosan, France
- Magali Berland
- Université Paris-Saclay, INRAE, MGP, Jouy en Josas, France
- Mélisande Blein-Nicolas
- Université Paris-Saclay, CNRS, INRAE, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette, France; Université Paris-Saclay, CNRS, INRAE, AgroParisTech, PAPPSO, Gif-sur-Yvette, France
- Olivier Langella
- Université Paris-Saclay, CNRS, INRAE, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette, France; Université Paris-Saclay, CNRS, INRAE, AgroParisTech, PAPPSO, Gif-sur-Yvette, France
- Ariane Bassignani
- Université Paris-Saclay, INRAE, MGP, Jouy en Josas, France; Université Paris-Saclay, CNRS, INRAE, AgroParisTech, PAPPSO, Gif-sur-Yvette, France
- Catherine Juste
- Micalis Institute, Université Paris-Saclay, INRAE, AgroParis Tech, Jouy-en-Josas, France
10
Smith TS, Andrejeva A, Christopher J, Crook OM, Elzek M, Lilley KS. Prior Signal Acquisition Software Versions for Orbitrap Underestimate Low Isobaric Mass Tag Intensities, Without Detriment to Differential Abundance Experiments. ACS Measurement Science Au 2022; 2:233-240. [PMID: 35726249 PMCID: PMC9204819 DOI: 10.1021/acsmeasuresciau.1c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Revised: 01/24/2022] [Accepted: 01/25/2022] [Indexed: 06/15/2023]
Abstract
Tandem mass tags (TMTs) enable simple and accurate quantitative proteomics for multiplexed samples by relative quantification of tag reporter ions. Orbitrap quantification of reporter ions has been associated with a characteristic notch region in intensity distribution, within which few reporter intensities are recorded. This has been resolved in version 3 of the instrument acquisition software Tune. However, 47% of Orbitrap Fusion, Lumos, or Eclipse submissions to PRIDE were generated using prior software versions. To quantify the impact of the notch on existing quantitative proteomics data, we generated a mixed species benchmark and acquired quantitative data using Tune versions 2 and 3. Intensities below the notch are predominantly underestimated with Tune version 2, leading to overestimation of the true differences in intensities between samples. However, when summarizing reporter ion intensities to higher-level features, such as peptides and proteins, few features are significantly affected. Targeted removal of spectra with reporter ion intensities below the notch is not beneficial for differential peptide or protein testing. Overall, we find that the systematic quantification bias associated with the notch is not detrimental for a typical proteomics experiment.
Affiliation(s)
- Tom S. Smith
- MRC Toxicology Unit, University of Cambridge, Cambridge CB2 1QR, U.K.
- Anna Andrejeva
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, U.K.
- Josie Christopher
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, U.K.
- Oliver M. Crook
- Department of Statistics, University of Oxford, Oxford OX1 3LB, U.K.
- Mohamed Elzek
- MRC Toxicology Unit, University of Cambridge, Cambridge CB2 1QR, U.K.
- Kathryn S. Lilley
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, U.K.
11
Crook OM, Chung CW, Deane CM. Challenges and Opportunities for Bayesian Statistics in Proteomics. J Proteome Res 2022; 21:849-864. [PMID: 35258980 PMCID: PMC8982455 DOI: 10.1021/acs.jproteome.1c00859] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/08/2021] [Indexed: 12/27/2022]
Abstract
Proteomics is a data-rich science with complex experimental designs and an intricate measurement process. To obtain insights from the large data sets produced, statistical methods, including machine learning, are routinely applied. For a quantity of interest, many of these approaches only produce a point estimate, such as a mean, leaving little room for more nuanced interpretations. By contrast, Bayesian statistics allows quantification of uncertainty through the use of probability distributions. These probability distributions enable scientists to ask complex questions of their proteomics data. Bayesian statistics also offers a modular framework for data analysis by making dependencies between data and parameters explicit. Hence, specifying complex hierarchies of parameter dependencies is straightforward in the Bayesian framework. This allows us to use a statistical methodology which equals, rather than neglects, the sophistication of experimental design and instrumentation present in proteomics. Here, we review Bayesian methods applied to proteomics, demonstrating their potential power, alongside the challenges posed by adopting this new statistical framework. To illustrate our review, we give a walk-through of the development of a Bayesian model for dynamic organic orthogonal phase-separation (OOPS) data.
Affiliation(s)
- Oliver M. Crook
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Chun-wa Chung
- Structural and Biophysical Sciences, GlaxoSmithKline R&D, Stevenage SG1 2NY, United Kingdom
- Charlotte M. Deane
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
12
On the maximal deviation of kernel regression estimators with NMAR response variables. Stat Pap (Berl) 2022. [DOI: 10.1007/s00362-022-01293-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
13
Ctortecka C, Stejskal K, Krššáková G, Mendjan S, Mechtler K. Quantitative Accuracy and Precision in Multiplexed Single-Cell Proteomics. Anal Chem 2021; 94:2434-2443. [PMID: 34967612 PMCID: PMC8829824 DOI: 10.1021/acs.analchem.1c04174] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/19/2022]
Abstract
Single-cell proteomics workflows have considerably improved in sensitivity and reproducibility to characterize as-yet unknown biological phenomena. With the emergence of multiplexed single-cell proteomics, studies increasingly present single-cell measurements in conjunction with an abundant congruent carrier to improve the precursor selection and enhance identifications. While these extreme carrier spikes are often >100× more abundant than the investigated samples, the total ion current undoubtably increases but the quantitative accuracy possibly is affected. We here focus on narrowly titrated carrier spikes (i.e., <20×) and assess their elimination for a comparable sensitivity with superior accuracy. We find that subtle changes in the carrier ratio can severely impact the measurement variability and describe alternative multiplexing strategies to evaluate data quality. Lastly, we demonstrate elevated replicate overlap while preserving acquisition throughput at an improved quantitative accuracy with DIA-TMT and discuss optimized experimental designs for multiplexed proteomics of trace samples. This comprehensive benchmarking gives an overview of currently available techniques and guides the conceptualization of the optimal single-cell proteomics experiment.
Affiliation(s)
- Claudia Ctortecka
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus Vienna Biocenter 1, 1030 Vienna, Austria
- Karel Stejskal
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus Vienna Biocenter 1, 1030 Vienna, Austria; Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria; The Gregor Mendel Institute of Molecular Plant Biology of the Austrian Academy of Sciences (GMI), Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Gabriela Krššáková
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus Vienna Biocenter 1, 1030 Vienna, Austria; Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria; The Gregor Mendel Institute of Molecular Plant Biology of the Austrian Academy of Sciences (GMI), Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Sasha Mendjan
- Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Karl Mechtler
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus Vienna Biocenter 1, 1030 Vienna, Austria; Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria; The Gregor Mendel Institute of Molecular Plant Biology of the Austrian Academy of Sciences (GMI), Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
14
Gardner ML, Freitas MA. Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics. Int J Mol Sci 2021; 22:ijms22179650. [PMID: 34502557 PMCID: PMC8431783 DOI: 10.3390/ijms22179650] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/20/2021] [Revised: 08/28/2021] [Accepted: 08/31/2021] [Indexed: 01/15/2023] Open
Abstract
Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values widely vary when performing comparisons across different sample treatments. For example, one would expect a consistent rate of “missing at random” (MAR) across batches of samples and varying rates of “missing not at random” (MNAR) depending on the inherent difference in sample treatments within the study. The missing value imputation strategy must thus be selected that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding the appropriate missing value imputation strategy: (1) when it is appropriate to impute data; (2) how to choose a method that reflects the combinatorial manner of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
Affiliation(s)
- Miranda L. Gardner
- Ohio State Biochemistry Program, Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA
- Cancer Biology and Genetics, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Michael A. Freitas
- Ohio State Biochemistry Program, Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA
- Cancer Biology and Genetics, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
15
Kalxdorf M, Müller T, Stegle O, Krijgsveld J. IceR improves proteome coverage and data completeness in global and single-cell proteomics. Nat Commun 2021; 12:4787. [PMID: 34373457 PMCID: PMC8352929 DOI: 10.1038/s41467-021-25077-6] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 11/07/2020] [Accepted: 07/21/2021] [Indexed: 11/10/2022] Open
Abstract
Label-free proteomics by data-dependent acquisition enables the unbiased quantification of thousands of proteins; however, it notoriously suffers from high rates of missing values, thus prohibiting consistent protein quantification across large sample cohorts. To solve this, we here present IceR (Ion current extraction Re-quantification), an efficient and user-friendly quantification workflow that combines high identification rates of data-dependent acquisition with low missing value rates similar to data-independent acquisition. Specifically, IceR uses ion current information for a hybrid peptide identification propagation approach with superior quantification precision, accuracy, reliability and data completeness compared to other quantitative workflows. Applied to plasma and single-cell proteomics data, IceR enhanced the number of reliably quantified proteins, improved discriminability between single-cell populations, and allowed reconstruction of a developmental trajectory. IceR will be useful to improve performance of large scale global as well as low-input proteomics applications, facilitated by its availability as an easy-to-use R-package.
Affiliation(s)
- Mathias Kalxdorf
- German Cancer Research Center, Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Torsten Müller
- German Cancer Research Center, Heidelberg, Germany
- Heidelberg University, Medical Faculty, Heidelberg, Germany
| | - Oliver Stegle
- German Cancer Research Center, Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Jeroen Krijgsveld
- German Cancer Research Center, Heidelberg, Germany
- Heidelberg University, Medical Faculty, Heidelberg, Germany
| |
|
16
|
|
17
|
Missing Value Monitoring to Address Missing Values in Quantitative Proteomics. Methods Mol Biol 2021. [PMID: 33950505 DOI: 10.1007/978-1-0716-1024-4_27] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Many classes of key functional proteins, such as transcription factors or cell cycle proteins, are present in the proteome at a very low concentration. These low-abundance proteins are almost entirely invisible to systematic quantitative analysis by classical data-dependent proteomics methods (DDA). Moreover, DDA runs in shotgun proteomics experiments contain many missing values among the replicates due to the stochastic nature of the acquisition method, which hampers the robustness of the quantitative analysis. Here, we have overcome these obstacles by designing a robust workflow named missing value monitoring (MvM) to follow the dynamics of low-abundance proteins.
|
18
|
Egert J, Brombacher E, Warscheid B, Kreutz C. DIMA: Data-Driven Selection of an Imputation Algorithm. J Proteome Res 2021; 20:3489-3496. [PMID: 34062065 DOI: 10.1021/acs.jproteome.1c00119] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Imputation is a prominent strategy when dealing with missing values (MVs) in proteomics data analysis pipelines. However, the performance of different imputation methods is difficult to assess and varies strongly depending on data characteristics. To overcome this issue, we present the concept of a data-driven selection of an imputation algorithm (DIMA). The performance and broad applicability of DIMA are demonstrated on 142 quantitative proteomics data sets from the PRoteomics IDEntifications (PRIDE) database and on simulated data consisting of 5-50% MVs with different proportions of missing not at random and missing completely at random values. DIMA reliably suggests a high-performing imputation algorithm, which is always among the three best algorithms and results in a root mean square error difference (ΔRMSE) ≤ 10% in 80% of the cases. The DIMA implementation is available in MATLAB at github.com/kreutz-lab/OmicsData and in R at github.com/kreutz-lab/DIMAR.
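The general idea of scoring imputation methods by RMSE on artificially hidden values can be illustrated with a minimal Python sketch. This is not the DIMA implementation (which the abstract says is distributed in MATLAB and R); the function names, the two toy imputers, and the simulated matrix are all assumptions made for illustration, and only missing-completely-at-random masking is simulated.

```python
import numpy as np


def mask_completely_at_random(data, fraction, rng):
    """Hide `fraction` of the entries at random; return the masked copy and the mask."""
    mask = rng.random(data.shape) < fraction
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask


def impute_row_mean(x):
    """Replace each missing value with the mean of the observed values in its row."""
    out = x.copy()
    missing = np.isnan(out)
    counts = (~missing).sum(axis=1, keepdims=True)
    sums = np.where(missing, 0.0, out).sum(axis=1, keepdims=True)
    # Rows without any observed value fall back to the global mean.
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), np.nanmean(out))
    out[missing] = np.broadcast_to(row_means, out.shape)[missing]
    return out


def impute_global_min(x):
    """Replace each missing value with the smallest observed value in the matrix."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmin(out)
    return out


def rmse_on_mask(truth, imputed, mask):
    """Root mean square error restricted to the artificially hidden entries."""
    diff = truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def rank_imputers(data, imputers, fraction=0.2, seed=0):
    """Mask known values, impute with each candidate, and sort candidates by RMSE."""
    rng = np.random.default_rng(seed)
    masked, mask = mask_completely_at_random(data, fraction, rng)
    scores = {name: rmse_on_mask(data, fn(masked), mask) for name, fn in imputers.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical log-intensity matrix: 50 proteins x 12 samples.
    protein_levels = rng.normal(20.0, 2.0, size=(50, 1))
    complete = protein_levels + rng.normal(0.0, 0.3, size=(50, 12))
    candidates = {"row_mean": impute_row_mean, "global_min": impute_global_min}
    for name, score in rank_imputers(complete, candidates):
        print(f"{name}: RMSE = {score:.3f}")
```

A real comparison would also simulate missing-not-at-random patterns, since the abstract notes that performance depends strongly on data characteristics.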
Affiliation(s)
- Janine Egert
- Institute of Medical Biometry and Statistics (IMBI), Institute of Medicine and Medical Center Freiburg, 79104 Freiburg im Breisgau, Germany; Centre for Integrative Biological Signalling Studies (CIBSS), Albert-Ludwigs-Universität Freiburg, 79104 Freiburg, Germany
| | - Eva Brombacher
- Institute of Medical Biometry and Statistics (IMBI), Institute of Medicine and Medical Center Freiburg, 79104 Freiburg im Breisgau, Germany; Centre for Integrative Biological Signalling Studies (CIBSS), Albert-Ludwigs-Universität Freiburg, 79104 Freiburg, Germany; Spemann Graduate School of Biology and Medicine (SGBM), Albert-Ludwigs-Universität Freiburg, 79104 Freiburg, Germany; Faculty of Biology, Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany
| | - Bettina Warscheid
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany; Signalling Research Centres BIOSS and CIBSS, Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany
| | - Clemens Kreutz
- Institute of Medical Biometry and Statistics (IMBI), Institute of Medicine and Medical Center Freiburg, 79104 Freiburg im Breisgau, Germany; Signalling Research Centres BIOSS and CIBSS, Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany; Center for Data Analysis and Modeling (FDM), Albert-Ludwigs-Universität Freiburg, 79104 Freiburg im Breisgau, Germany
| |
|
19
|
Dabke K, Kreimer S, Jones MR, Parker SJ. A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets. J Proteome Res 2021; 20:3214-3229. [PMID: 33939434 DOI: 10.1021/acs.jproteome.1c00070] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 01/07/2023]
Abstract
Missing values in proteomic data sets have real consequences for downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single imputation method is best suited for a diverse range of data sets, and no clear strategy exists for evaluating imputation methods for clinical DIA-MS data sets, especially at different levels of protein quantification. To navigate through the different imputation strategies available in the literature, we have established a strategy to assess imputation methods on clinical label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a clinical proteomic data set comparing paired tumor and stroma tissue. We found that imputation methods based on local structures within the data, like local least-squares (LLS) and random forest (RF), worked well in our dilution series data set, whereas imputation methods based on global structures within the data, like BPCA, performed well in the other two data sets. We also found that imputation at the most basic protein quantification level (the fragment level) improved accuracy and the number of proteins quantified. With this analytical framework, we quickly and cost-effectively evaluated different imputation methods using two smaller complementary data sets to narrow down the most accurate methods for the larger proteomic data set. This acquisition strategy allowed us to provide reproducible evidence of the accuracy of the imputation method, even in the absence of a ground truth. Overall, this study indicates that the most suitable imputation method depends on the overall structure of the data set and provides an example of an analytic framework that may assist in identifying the most appropriate imputation strategies for the differential analysis of proteins.
Affiliation(s)
- Kruttika Dabke
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States; Graduate Program in Biomedical Sciences, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
| | - Simion Kreimer
- Advanced Clinical Biosystems Research Institute, Smidt Heart Institute, Departments of Cardiology and Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
| | - Michelle R Jones
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
| | - Sarah J Parker
- Advanced Clinical Biosystems Research Institute, Smidt Heart Institute, Departments of Cardiology and Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
| |
|
20
|
Christopher JA, Stadler C, Martin CE, Morgenstern M, Pan Y, Betsinger CN, Rattray DG, Mahdessian D, Gingras AC, Warscheid B, Lehtiö J, Cristea IM, Foster LJ, Emili A, Lilley KS. Subcellular proteomics. Nat Rev Methods Primers 2021; 1:32. [PMID: 34549195 PMCID: PMC8451152 DOI: 10.1038/s43586-021-00029-y] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Accepted: 03/15/2021] [Indexed: 12/11/2022]
Abstract
The eukaryotic cell is compartmentalized into subcellular niches, including membrane-bound and membrane-less organelles. Proteins localize to these niches to fulfil their function, enabling discrete biological processes to occur in synchrony. Dynamic movement of proteins between niches is essential for cellular processes such as signalling, growth, proliferation, motility and programmed cell death, and mutations causing aberrant protein localization are associated with a wide range of diseases. Determining the location of proteins in different cell states and cell types and how proteins relocalize following perturbation is important for understanding their functions, related cellular processes and pathologies associated with their mislocalization. In this Primer, we cover the major spatial proteomics methods for determining the location, distribution and abundance of proteins within subcellular structures. These technologies include fluorescent imaging, protein proximity labelling, organelle purification and cell-wide biochemical fractionation. We describe their workflows, data outputs and applications in exploring different cell biological scenarios, and discuss their main limitations. Finally, we describe emerging technologies and identify areas that require technological innovation to allow better characterization of the spatial proteome.
Affiliation(s)
- Josie A. Christopher
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
| | - Charlotte Stadler
- Department of Protein Sciences, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
| | - Claire E. Martin
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
| | - Marcel Morgenstern
- Institute of Biology II, Biochemistry and Functional Proteomics, Faculty of Biology, University of Freiburg, Freiburg, Germany
| | - Yanbo Pan
- Department of Oncology and Pathology, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
| | - Cora N. Betsinger
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA
| | - David G. Rattray
- Department of Biochemistry & Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Diana Mahdessian
- Department of Protein Sciences, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
| | - Anne-Claude Gingras
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Bettina Warscheid
- Institute of Biology II, Biochemistry and Functional Proteomics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- BIOSS and CIBSS Signaling Research Centers, University of Freiburg, Freiburg, Germany
| | - Janne Lehtiö
- Department of Oncology and Pathology, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
| | - Ileana M. Cristea
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA
| | - Leonard J. Foster
- Department of Biochemistry & Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Andrew Emili
- Center for Network Systems Biology, Boston University, Boston, MA, USA
| | - Kathryn S. Lilley
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
| |
|
21
|
Brain organoid formation on decellularized porcine brain ECM hydrogels. PLoS One 2021; 16:e0245685. [PMID: 33507989 PMCID: PMC7842896 DOI: 10.1371/journal.pone.0245685] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 07/29/2020] [Accepted: 01/05/2021] [Indexed: 12/21/2022] Open
Abstract
Human brain tissue models such as cerebral organoids are essential tools for developmental and biomedical research. Current methods to generate cerebral organoids often utilize Matrigel as an external scaffold to provide structure and biologically relevant signals. Matrigel, however, is a nonspecific hydrogel of mouse tumor origin and does not represent the complexity of the brain protein environment. In this study, we investigated the application of a decellularized adult porcine brain extracellular matrix (B-ECM) which could be processed into a hydrogel (B-ECM hydrogel) to be used as a scaffold for human embryonic stem cell (hESC)-derived brain organoids. We decellularized pig brains with a novel detergent- and enzyme-based method and analyzed the biomaterial properties, including protein composition and content, DNA content, mechanical characteristics, surface structure, and antigen presence. Then, we compared the growth of human brain organoid models with the B-ECM hydrogel or Matrigel controls in vitro. We found that the native brain source material was successfully decellularized with little remaining DNA content, while mass spectrometry (MS) showed the loss of several brain-specific proteins, with mainly different collagen types remaining in the B-ECM. Rheological results revealed stable hydrogel formation, starting from B-ECM hydrogel concentrations of 5 mg/mL. hESCs cultured in B-ECM hydrogels showed gene expression and differentiation outcomes similar to those grown in Matrigel. These results indicate that B-ECM hydrogels can be used as an alternative scaffold for human cerebral organoid formation, and may be further optimized for improved organoid growth by improving the retention of proteins other than collagen after decellularization.
|
22
|
Gaun A, Lewis Hardell KN, Olsson N, O'Brien JJ, Gollapudi S, Smith M, McAlister G, Huguet R, Keyser R, Buffenstein R, McAllister FE. Automated 16-Plex Plasma Proteomics with Real-Time Search and Ion Mobility Mass Spectrometry Enables Large-Scale Profiling in Naked Mole-Rats and Mice. J Proteome Res 2021; 20:1280-1295. [PMID: 33499602 DOI: 10.1021/acs.jproteome.0c00681] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 12/23/2022]
Abstract
Performing large-scale plasma proteome profiling is challenging due to limitations imposed by lengthy preparation and instrument time. We present a fully automated multiplexed proteome profiling platform (AutoMP3) using the Hamilton Vantage liquid handling robot capable of preparing hundreds to thousands of samples. To maximize protein depth in single-shot runs, we combined 16-plex Tandem Mass Tags (TMTpro) with high-field asymmetric waveform ion mobility spectrometry (FAIMS Pro) and real-time search (RTS). We quantified over 40 proteins/min/sample, doubling the previously published rates. We applied AutoMP3 to investigate the naked mole-rat plasma proteome both as a function of the circadian cycle and in response to ultraviolet (UV) treatment. In keeping with the lack of synchronized circadian rhythms in naked mole-rats, we find few circadian patterns in plasma proteins over the course of 48 h. Furthermore, we quantify many disparate changes between mice and naked mole-rats at both 48 h and one week after UV exposure. These species differences in plasma protein temporal responses could contribute to the pronounced cancer resistance observed in naked mole-rats. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [1] partner repository with the dataset identifier PXD022891.
Affiliation(s)
- Aleksandr Gaun
- Calico Life Sciences LLC, South San Francisco, California 94080-7095, United States
| | - Kaitlyn N Lewis Hardell
- Calico Life Sciences LLC, South San Francisco, California 94080-7095, United States; Cancer Prevention Fellowship Program, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland 20892-7315, United States
| | - Niclas Olsson
- Calico Life Sciences LLC, South San Francisco, California 94080-7095, United States
| | - Jonathon J O'Brien
- Calico Life Sciences LLC, South San Francisco, California 94080-7095, United States
| | - Sudha Gollapudi
- Calico Life Sciences LLC, South San Francisco, California 94080-7095, United States
| | - Megan Smith
- Calico Life Sciences LLC, South San Francisco, California 94080-7095, United States
| | - Graeme McAlister
- Thermo Fisher Scientific, San Jose, California 95134, United States
| | - Romain Huguet
- Thermo Fisher Scientific, San Jose, California 95134, United States
| | - Robert Keyser
- Calico Life Sciences LLC, South San Francisco, California 94080-7095, United States
| | - Rochelle Buffenstein
- Calico Life Sciences LLC, South San Francisco, California 94080-7095, United States
| | - Fiona E McAllister
- Calico Life Sciences LLC, South San Francisco, California 94080-7095, United States
| |
|
23
|
Wang S, Li W, Hu L, Cheng J, Yang H, Liu Y. NAguideR: performing and prioritizing missing value imputations for consistent bottom-up proteomic analyses. Nucleic Acids Res 2020; 48:e83. [PMID: 32526036 PMCID: PMC7641313 DOI: 10.1093/nar/gkaa498] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 02/18/2020] [Revised: 04/20/2020] [Accepted: 06/08/2020] [Indexed: 02/05/2023] Open
Abstract
Mass spectrometry (MS)-based quantitative proteomics experiments frequently generate data with missing values, which may profoundly affect downstream analyses. A wide variety of imputation methods have been established to deal with the missing-value issue. To date, however, there is a scarcity of efficient, systematic, and easy-to-handle tools that are tailored for the proteomics community. Herein, we developed a user-friendly and powerful stand-alone software, NAguideR, to enable implementation and evaluation of different missing value methods offered by 23 widely used missing-value imputation algorithms. NAguideR further evaluates data imputation results through classic computational criteria and, unprecedentedly, proteomic empirical criteria, such as quantitative consistency between different charge states of the same peptide, different peptides belonging to the same proteins, and individual proteins participating in protein complexes and functional interactions. We applied NAguideR to three label-free proteomic datasets featuring peptide-level, protein-level, and phosphoproteomic variables respectively, all generated by data-independent acquisition mass spectrometry (DIA-MS) with substantial biological replicates. The results indicate that NAguideR is able to discriminate the optimal imputation methods that facilitate DIA-MS experiments from sub-optimal and low-performance algorithms. NAguideR further provides downloadable tables and figures supporting flexible data analysis and interpretation. NAguideR is freely available at http://www.omicsolution.org/wukong/NAguideR/ and the source code is available at https://github.com/wangshisheng/NAguideR/.
Affiliation(s)
- Shisheng Wang
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Wenxue Li
- Yale Cancer Biology Institute, Yale University, West Haven, CT 06516, USA
| | - Liqiang Hu
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Jingqiu Cheng
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Hao Yang
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Yansheng Liu
- Yale Cancer Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06520, USA
| |
|
24
|
Liu M, Dongre A. Proper imputation of missing values in proteomics datasets for differential expression analysis. Brief Bioinform 2020; 22:5855395. [PMID: 32520347 DOI: 10.1093/bib/bbaa112] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 02/25/2020] [Revised: 04/16/2020] [Accepted: 05/11/2020] [Indexed: 01/01/2023] Open
Abstract
Label-free shotgun proteomics is an important tool in biomedical research, where tandem mass spectrometry with data-dependent acquisition (DDA) is frequently used for protein identification and quantification. However, DDA datasets contain a significant number of missing values (MVs) that severely hinder proper analysis. Existing literature suggests that different imputation methods should be used for the two types of MVs: missing completely at random or missing not at random. However, the simulated or biased datasets utilized by most such studies offer few clues about the composition and thus the proper imputation of MVs in real-life proteomic datasets. Moreover, the impact of imputation methods on downstream differential expression analysis, a critical goal for many biomedical projects, is largely undetermined. In this study, we investigated public DDA datasets of various tissue/sample types to determine the composition of MVs in them. We then developed simulated datasets that imitate the MV profile of real-life datasets. Using such datasets, we compared the impact of various popular imputation methods on the analysis of differentially expressed proteins. Finally, we make recommendations on which imputation method(s) to use for proteomic data beyond just DDA datasets.
|
25
|
McKennan C, Ober C, Nicolae D. Estimation and inference in metabolomics with non-random missing data and latent factors. Ann Appl Stat 2020; 14:789-808. [PMID: 34221212 PMCID: PMC8248477 DOI: 10.1214/20-aoas1328] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023]
Abstract
High-throughput metabolomics data are fraught with both non-ignorable missing observations and unobserved factors that influence a metabolite's measured concentration, and it is well known that ignoring either of these complications can compromise estimators. However, current methods to analyze these data can only account for the missing data or unobserved factors, but not both. We therefore developed MetabMiss, a statistically rigorous method to account for both non-random missing data and latent factors in high-throughput metabolomics data. Our methodology does not require the practitioner to specify a likelihood for the missing data, and makes investigating the relationship between the metabolome and tens, or even hundreds, of phenotypes computationally tractable. We demonstrate the fidelity of MetabMiss's estimates using both simulated and real metabolomics data, and prove their asymptotic correctness when the sample size and number of metabolites grow to infinity.
|
26
|
Sinkeviciute D, Aspberg A, He Y, Bay-Jensen AC, Önnerfjord P. Characterization of the interleukin-17 effect on articular cartilage in a translational model: an explorative study. BMC Rheumatol 2020; 4:30. [PMID: 32426694 PMCID: PMC7216541 DOI: 10.1186/s41927-020-00122-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/13/2019] [Accepted: 03/06/2020] [Indexed: 12/29/2022] Open
Abstract
Background Osteoarthritis (OA) is a progressive, chronic disease characterized by articular cartilage destruction. The pro-inflammatory cytokine IL-17 levels have been reported elevated in serum and synovial fluid of OA patients and correlated with increased cartilage defects and bone remodeling. The aim of this study was to characterize an IL-17-mediated articular cartilage degradation ex-vivo model and to investigate IL-17 effect on cartilage extracellular matrix protein turnover.
Methods Full-depth bovine femoral condyle articular cartilage explants were cultured in serum-free medium for three weeks in the absence, or presence of cytokines: IL-17A (100 ng/ml or 25 ng/ml), or 10 ng OSM combined with 20 ng/ml TNFα (O + T). RNA isolation and PCR analysis were performed on tissue lysates to confirm IL-17 receptor expression. GAG and ECM-turnover biomarker release into conditioned media was assessed with dimethyl methylene blue and ELISA assays, respectively. Gelatin zymography was used for matrix metalloproteinase (MMP) 2 and MMP9 activity assessment in conditioned media, and shotgun LC-MS/MS for identification and label-free quantification of proteins and protein fragments in conditioned media. Western blotting was used to validate MS results.
Results IL-17RA mRNA was expressed in bovine full-depth articular cartilage and the treatment with IL-17A did not interfere with metabolic activity of the model. IL-17A induced cartilage breakdown; conditioned media GAG levels were 3.6-fold-elevated compared to untreated. IL-17A [100 ng/ml] induced ADAMTS-mediated aggrecan degradation fragment release (14-fold increase compared to untreated) and MMP-mediated type II collagen fragment release (6-fold-change compared to untreated). MS data analysis revealed 16 differentially expressed proteins in IL-17A conditioned media compared to untreated, and CHI3L1 upregulation in conditioned media in response to IL-17 was confirmed by Western blotting.
Conclusions We showed that IL-17A has cartilage-modulating potential. It induces collagen and aggrecan degradation, indicating an upregulation of MMPs. This was confirmed by zymography and mass spectrometry data. We also showed that the expression of other cytokines is induced by IL-17A, which provides further insight into the pathways that are active in response to IL-17A. This exploratory study confirms that IL-17A may play a role in cartilage pathology and that the applied model may be a good tool to further investigate it.
Affiliation(s)
- Dovile Sinkeviciute
- Nordic Bioscience, Biomarkers & Research, Herlev, Denmark; Rheumatology and Molecular Skeletal Biology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| | - Anders Aspberg
- Rheumatology and Molecular Skeletal Biology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| | - Yi He
- Nordic Bioscience, Biomarkers & Research, Herlev, Denmark
| | | | - Patrik Önnerfjord
- Rheumatology and Molecular Skeletal Biology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| |
|
27
|
Lim MY, Paulo JA, Gygi SP. Evaluating False Transfer Rates from the Match-between-Runs Algorithm with a Two-Proteome Model. J Proteome Res 2019; 18:4020-4026. [PMID: 31547658 PMCID: PMC7346880 DOI: 10.1021/acs.jproteome.9b00492] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Indexed: 12/16/2022]
Abstract
Stochasticity between independent LC-MS/MS runs is a challenging problem in the field of proteomics, resulting in significant missing values (i.e., abundance measurements) among observed peptides. To address this issue, several approaches have been developed including computational methods such as MaxQuant's match-between-runs (MBR) algorithm. Often dozens of runs are all considered at once by MBR, transferring identifications from any one run to any of the others. To evaluate the error associated with these transfer events, we created a two-sample/two-proteome approach. In this way, samples containing no yeast lysate (n = 20) were assessed for false identification transfers from samples containing yeast (n = 20). While MBR increased the total number of spectral identifications by ∼40%, we also found that 44% of all identified yeast proteins had identifications transferred to at least one sample without yeast. However, of these only 2.7% remained in the final data set after applying the MaxQuant LFQ algorithm. We conclude that false transfers by MBR are plentiful, but few are retained in the final data set.
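The two-proteome check described above amounts to counting how many yeast proteins show up in samples that contain no yeast. The following Python sketch illustrates that bookkeeping under stated assumptions: the function name, the sample labels, and the protein-to-samples mapping are hypothetical, and the sketch does not read MaxQuant output or reproduce the authors' pipeline.

```python
def false_transfer_fraction(yeast_proteins, samples_by_protein, yeast_free_samples):
    """Fraction of yeast proteins reported in at least one yeast-free sample.

    yeast_proteins: iterable of protein identifiers known to come from yeast.
    samples_by_protein: dict mapping a protein identifier to the samples in
        which it received an identification (hypothetical input format).
    yeast_free_samples: iterable of sample names that contain no yeast lysate.
    """
    yeast_proteins = set(yeast_proteins)
    yeast_free = set(yeast_free_samples)
    if not yeast_proteins:
        return 0.0
    # A yeast protein seen in a yeast-free sample can only be a false transfer.
    flagged = {
        protein
        for protein in yeast_proteins
        if set(samples_by_protein.get(protein, ())) & yeast_free
    }
    return len(flagged) / len(yeast_proteins)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    detections = {
        "Y1": {"yeast_01", "human_only_03"},
        "Y2": {"yeast_01", "yeast_02"},
        "Y3": {"human_only_01"},
        "Y4": {"yeast_02"},
    }
    fraction = false_transfer_fraction(
        ["Y1", "Y2", "Y3", "Y4"], detections, ["human_only_01", "human_only_03"]
    )
    print(f"Yeast proteins with a false transfer: {fraction:.0%}")
```

Running the same count before and after a downstream filtering step would show how many false transfers survive into the final data set.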
Affiliation(s)
- Matthew Y. Lim
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
| | - João A. Paulo
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
| | - Steven P. Gygi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
| |
|