1
Costa MM, Martin H, Estellon B, Dupé FX, Saby F, Benoit N, Tissot-Dupont H, Million M, Pradines B, Granjeaud S, Almeras L. Exploratory Study on Application of MALDI-TOF-MS to Detect SARS-CoV-2 Infection in Human Saliva. J Clin Med 2022; 11:295. [PMID: 35053990 PMCID: PMC8781148 DOI: 10.3390/jcm11020295] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/16/2021] [Revised: 12/16/2021] [Accepted: 12/31/2021] [Indexed: 12/24/2022]
Abstract
SARS-CoV-2 has caused a large outbreak since its emergence in December 2019. COVID-19 diagnosis became a priority in order to isolate and treat infected individuals and break the chain of transmission. Currently, the reference test for COVID-19 diagnosis is the molecular detection (RT-qPCR) of the virus from nasopharyngeal swab (NPS) samples. Although this sensitive and specific test remains the gold standard, it has several limitations, such as the invasive collection method, the relatively high cost and the duration of the test. Moreover, shortages of testing materials, caused by the gap between the high demand for tests and production capacity, put additional constraints on RT-qPCR. Here, we propose a PCR-free method for diagnosing SARS-CoV-2 based on matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) profiling and machine learning (ML) models from salivary samples. Kinetic saliva samples were collected at enrollment and ten and thirty days later (D0, D10 and D30) to assess the classification performance of the ML models compared to the molecular tests performed on NPS specimens. Spectra were generated using an optimized saliva collection protocol, and successive quality control steps were developed to ensure the reliability of the spectra. A total of 360 averaged spectra were included in the study. At D0, the comparison of MS spectra from SARS-CoV-2 positive patients (n = 105) with healthy healthcare controls (n = 51) revealed nine peaks that significantly distinguished the two groups. Among the five ML models tested, support vector machine with linear kernel (SVM-LK) provided the best performance on the training dataset (accuracy = 85.2%, sensitivity = 85.1%, specificity = 85.3%, F1-Score = 85.1%). The application of the SVM-LK model to independent datasets confirmed its performance, with 88.9% and 80.8% correct classification for samples collected at D0 and D30, respectively. Conversely, at D10, the proportion of correct classification fell to 64.3%. The analysis of saliva samples by MALDI-TOF MS and ML appears to be an interesting supplementary tool for COVID-19 diagnosis, despite the mixed results obtained for convalescent patients (D10).
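The performance measures quoted in this abstract (accuracy, sensitivity, specificity, F1-score) are all ratios of counts in a binary confusion matrix. The Python sketch below shows how they relate, assuming that "positive" means SARS-CoV-2 positive; the counts are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary confusion-matrix counts; 'positive' means SARS-CoV-2 positive (assumed)."""
    tp: int  # positive samples classified as positive
    fn: int  # positive samples classified as negative
    tn: int  # negative samples classified as negative
    fp: int  # negative samples classified as positive

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def sensitivity(self) -> float:
        # True-positive rate (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True-negative rate.
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN).
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Hypothetical counts, not taken from the study.
cm = ConfusionMatrix(tp=40, fn=10, tn=45, fp=5)
print(f"accuracy={cm.accuracy():.3f} sensitivity={cm.sensitivity():.3f} "
      f"specificity={cm.specificity():.3f} F1={cm.f1():.3f}")
# prints: accuracy=0.850 sensitivity=0.800 specificity=0.900 F1=0.842
```

The F1-score here is computed for the positive class only; a study may instead report a class-averaged F1, which this sketch does not attempt to reproduce.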
Affiliation(s)
- Monique Melo Costa
- Unité Parasitologie et Entomologie, Département Microbiologie et Maladies Infectieuses, Institut de Recherche Biomédicale des Armées, 91220 Marseille, France
- Aix-Marseille University, IRD, SSA, AP-HM, VITROME, 13005 Marseille, France
- IHU Méditerranée Infection, 13005 Marseille, France
- Hugo Martin
- Unité Parasitologie et Entomologie, Département Microbiologie et Maladies Infectieuses, Institut de Recherche Biomédicale des Armées, 91220 Marseille, France
- Aix-Marseille University, IRD, SSA, AP-HM, VITROME, 13005 Marseille, France
- IHU Méditerranée Infection, 13005 Marseille, France
- Bertrand Estellon
- Laboratoire d’Informatique et Systèmes, Aix-Marseille University, CNRS, Université de Toulon, 13013 Marseille, France
- François-Xavier Dupé
- Laboratoire d’Informatique et Systèmes, Aix-Marseille University, CNRS, Université de Toulon, 13013 Marseille, France
- Florian Saby
- Unité Parasitologie et Entomologie, Département Microbiologie et Maladies Infectieuses, Institut de Recherche Biomédicale des Armées, 91220 Marseille, France
- Aix-Marseille University, IRD, SSA, AP-HM, VITROME, 13005 Marseille, France
- IHU Méditerranée Infection, 13005 Marseille, France
- Nicolas Benoit
- Unité Parasitologie et Entomologie, Département Microbiologie et Maladies Infectieuses, Institut de Recherche Biomédicale des Armées, 91220 Marseille, France
- Aix-Marseille University, IRD, SSA, AP-HM, VITROME, 13005 Marseille, France
- IHU Méditerranée Infection, 13005 Marseille, France
- Centre National de Référence du Paludisme, 13005 Marseille, France
- Hervé Tissot-Dupont
- IHU Méditerranée Infection, 13005 Marseille, France
- Aix-Marseille University, IRD, AP-HM, MEPHI, 13005 Marseille, France
- Matthieu Million
- IHU Méditerranée Infection, 13005 Marseille, France
- Aix-Marseille University, IRD, AP-HM, MEPHI, 13005 Marseille, France
- Bruno Pradines
- Unité Parasitologie et Entomologie, Département Microbiologie et Maladies Infectieuses, Institut de Recherche Biomédicale des Armées, 91220 Marseille, France
- Aix-Marseille University, IRD, SSA, AP-HM, VITROME, 13005 Marseille, France
- IHU Méditerranée Infection, 13005 Marseille, France
- Centre National de Référence du Paludisme, 13005 Marseille, France
- Samuel Granjeaud
- CRCM Integrative Bioinformatics Platform, Centre de Recherche en Cancérologie de Marseille, INSERM, U1068, Institut Paoli-Calmettes, CNRS, UMR7258, Aix-Marseille Université UM 105, 13009 Marseille, France
- Lionel Almeras
- Unité Parasitologie et Entomologie, Département Microbiologie et Maladies Infectieuses, Institut de Recherche Biomédicale des Armées, 91220 Marseille, France
- Aix-Marseille University, IRD, SSA, AP-HM, VITROME, 13005 Marseille, France
- IHU Méditerranée Infection, 13005 Marseille, France
2
Diederen T, Delabrière A, Othman A, Reid ME, Zamboni N. Metabolomics. Metab Eng 2021. [DOI: 10.1002/9783527823468.ch9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
3
Surowiec I, Johansson E, Stenlund H, Rantapää-Dahlqvist S, Bergström S, Normark J, Trygg J. Quantification of run order effect on chromatography - mass spectrometry profiling data. J Chromatogr A 2018; 1568:229-234. [PMID: 30007791 DOI: 10.1016/j.chroma.2018.07.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/16/2018] [Revised: 05/31/2018] [Accepted: 07/04/2018] [Indexed: 12/23/2022]
Abstract
Chromatographic systems coupled with mass spectrometry detection are widely used in biological studies investigating how levels of biomolecules respond to different internal and external stimuli. Such changes are normally expected to be of low magnitude, and therefore all experimental factors that can influence the analysis need to be understood and minimized. The run order effect is commonly observed and constitutes a major challenge in chromatography-mass spectrometry based profiling studies; it needs to be addressed before the biological evaluation of the measured data is made. So far there is no established consensus, metric or method that quickly estimates the size of this effect. In this paper we demonstrate how orthogonal projections to latent structures (OPLS®) can be used for objective quantification of the run order effect in profiling studies. The quantification metric is expressed as the amount of variation in the experimental data that is correlated to the run order. One of the primary advantages of this approach is that it provides a fast way of quantifying the run order effect for all detected features, not only internal standards. Results obtained from quantification of the run order effect by OPLS can be used in the evaluation of data normalization, and can support the optimization of analytical protocols and the identification of compounds highly influenced by instrumental drift. The application of OPLS for quantification of run order is demonstrated on experimental data from plasma profiling performed on three analytical platforms: GCMS metabolomics, LCMS metabolomics and LCMS lipidomics.
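As a rough illustration of "variation correlated to the run order", the sketch below regresses each centred feature on the centred injection index and reports the share of variation explained, per feature and pooled. This is a deliberately simpler single-regressor least-squares stand-in, not the OPLS® method described in the paper; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def run_order_variation(X: np.ndarray, run_order: np.ndarray):
    """Fraction of variation in X that is linearly aligned with run order.

    X: (n_samples, n_features) intensities; run_order: (n_samples,) injection index.
    Returns (per-feature R^2, pooled fraction over all features).
    This is a single-regressor least-squares stand-in, not OPLS.
    """
    Xc = X - X.mean(axis=0)                 # centre each feature
    r = run_order - run_order.mean()        # centre the run-order vector
    r = r / np.linalg.norm(r)               # unit length, so r @ Xc gives projections
    proj = r @ Xc                           # (n_features,) component along run order
    ss_fit = proj ** 2                      # variation explained per feature
    ss_total = (Xc ** 2).sum(axis=0)        # total variation per feature
    per_feature = np.divide(ss_fit, ss_total,
                            out=np.zeros_like(ss_fit), where=ss_total > 0)
    pooled = float(ss_fit.sum() / ss_total.sum())
    return per_feature, pooled


# Synthetic example: feature 0 drifts with run order, feature 1 is pure noise.
rng = np.random.default_rng(0)
order = np.arange(40, dtype=float)
X = np.column_stack([0.5 * order + rng.normal(0, 1, 40),
                     rng.normal(0, 1, 40)])
per_feature, pooled = run_order_variation(X, order)
print(per_feature.round(3), round(pooled, 3))
```

Per feature, this quantity is simply the squared correlation with run order; OPLS additionally separates run-order-related variation from orthogonal structure, which this sketch does not attempt.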
Affiliation(s)
- Izabella Surowiec
- Computational Life Science Cluster (CLiC), Department of Chemistry, Umeå University, Linnaeus väg 10, 901 87 Umeå, Sweden
- Erik Johansson
- Sartorius Stedim Data Analytics, Tvistevägen 48, 907 36 Umeå, Sweden
- Hans Stenlund
- Swedish Metabolomics Centre, Linnaeus väg 6, 901 87 Umeå, Sweden
- Solbritt Rantapää-Dahlqvist
- Department of Public Health and Clinical Medicine, Rheumatology, Umeå University Hospital, 901 87 Umeå, Sweden
- Sven Bergström
- Department of Molecular Biology, Umeå University, 901 87 Umeå, Sweden
- Johan Normark
- Department of Molecular Biology, Umeå University, 901 87 Umeå, Sweden
- Johan Trygg
- Computational Life Science Cluster (CLiC), Department of Chemistry, Umeå University, Linnaeus väg 10, 901 87 Umeå, Sweden; Sartorius Stedim Data Analytics, Tvistevägen 48, 907 36 Umeå, Sweden
4
Surowiec I, Johansson E, Torell F, Idborg H, Gunnarsson I, Svenungsson E, Jakobsson PJ, Trygg J. Multivariate strategy for the sample selection and integration of multi-batch data in metabolomics. Metabolomics 2017; 13:114. [PMID: 28890672 PMCID: PMC5570768 DOI: 10.1007/s11306-017-1248-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/12/2016] [Accepted: 08/14/2017] [Indexed: 12/20/2022]
Abstract
INTRODUCTION The availability of large cohorts of samples with related metadata provides scientists with extensive material for studies. At the same time, the recent development of modern high-throughput 'omics' technologies, including metabolomics, has made it possible to analyze large numbers of samples. Representative subset selection becomes critical for selecting samples from larger cohorts and dividing them into analytical batches. This especially holds true when relative quantification of compound levels is used. OBJECTIVES We present a multivariate strategy for representative sample selection and integration of results from multi-batch experiments in metabolomics. METHODS Multivariate characterization was applied for design of experiment based sample selection and subsequent subdivision into four analytical batches, which were analyzed on different days by metabolomics profiling using gas-chromatography time-of-flight mass spectrometry (GC-TOF-MS). OPLS-DA® was applied to each batch, and the resulting p(corr) vectors were averaged to obtain a combined metabolic profile. Jackknifed standard errors were used to calculate confidence intervals for each metabolite in the average p(corr) profile. RESULTS A combined, representative metabolic profile describing differences between systemic lupus erythematosus (SLE) patients and controls was obtained and used for elucidation of metabolic pathways that could be disturbed in SLE. CONCLUSION Design of experiment based representative sample selection ensured diversity and minimized bias that could be introduced at this step. The combined metabolic profile enabled unified analysis and interpretation.
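The averaging and jackknife steps can be sketched in Python as follows, under two assumptions that the abstract does not state: that the per-batch p(corr) vectors are already available as rows of an array, and that the jackknife leaves out one batch at a time. The normal-approximation multiplier z = 1.96 (about 95% coverage) is also an assumption, and the numbers are made up.

```python
import numpy as np


def average_with_jackknife(pcorr: np.ndarray, z: float = 1.96):
    """Average per-batch loading vectors and attach jackknife confidence intervals.

    pcorr: (n_batches, n_metabolites), one p(corr)-like vector per batch.
    Assumption: leave-one-batch-out jackknife; z = 1.96 gives ~95% normal CIs.
    Returns (mean profile, lower bound, upper bound).
    """
    n = pcorr.shape[0]
    mean = pcorr.mean(axis=0)
    # Leave-one-out means: row i is the average of all batches except batch i.
    loo = (pcorr.sum(axis=0) - pcorr) / (n - 1)
    loo_mean = loo.mean(axis=0)
    # Standard jackknife variance: (n - 1) / n * sum of squared deviations.
    se = np.sqrt((n - 1) / n * ((loo - loo_mean) ** 2).sum(axis=0))
    return mean, mean - z * se, mean + z * se


# Hypothetical p(corr) values for 4 batches x 3 metabolites.
pcorr = np.array([[0.8, -0.2, 0.1],
                  [0.7, -0.1, 0.0],
                  [0.9, -0.3, 0.2],
                  [0.6, -0.2, 0.1]])
mean, lower, upper = average_with_jackknife(pcorr)
for m, lo, hi in zip(mean, lower, upper):
    print(f"{m:+.3f}  [{lo:+.3f}, {hi:+.3f}]")
```

For a plain average, the jackknife standard error reduces algebraically to the sample standard deviation divided by the square root of the number of batches; the jackknife form is shown because it generalizes to statistics other than the mean.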
Affiliation(s)
- Izabella Surowiec
- Computational Life Science Cluster (CLiC), Department of Chemistry, Umeå University, 901 81 Umeå, Sweden
- Frida Torell
- Computational Life Science Cluster (CLiC), Department of Chemistry, Umeå University, 901 81 Umeå, Sweden
- Helena Idborg
- Rheumatology Unit, Department of Medicine, Solna, Karolinska Institutet, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Iva Gunnarsson
- Rheumatology Unit, Department of Medicine, Solna, Karolinska Institutet, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Elisabet Svenungsson
- Rheumatology Unit, Department of Medicine, Solna, Karolinska Institutet, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Per-Johan Jakobsson
- Rheumatology Unit, Department of Medicine, Solna, Karolinska Institutet, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Johan Trygg
- Computational Life Science Cluster (CLiC), Department of Chemistry, Umeå University, 901 81 Umeå, Sweden
- Sartorius Stedim Data Analytics AB, 907 19 Umeå, Sweden
5
De Livera AM, Sysi-Aho M, Jacob L, Gagnon-Bartsch JA, Castillo S, Simpson JA, Speed TP. Statistical methods for handling unwanted variation in metabolomics data. Anal Chem 2015; 87:3606-15. [PMID: 25692814 DOI: 10.1021/ac502439y] [Citation(s) in RCA: 122] [Impact Index Per Article: 12.2] [Indexed: 12/12/2022]
Abstract
Metabolomics experiments are inevitably subject to a component of unwanted variation, due to factors such as batch effects, long runs of samples, and confounding biological variation. Although the removal of this unwanted variation is a vital step in the analysis of metabolomics data, it is considered a gray area in which there is a recognized need to develop a better understanding of the procedures and statistical methods required to achieve statistically relevant optimal biological outcomes. In this paper, we discuss the causes of unwanted variation in metabolomics experiments, review commonly used metabolomics approaches for handling this unwanted variation, and present a statistical approach for the removal of unwanted variation to obtain normalized metabolomics data. The advantages and performance of the approach relative to several widely used metabolomics normalization approaches are illustrated through two metabolomics studies, and recommendations are provided for choosing and assessing the most suitable normalization method for a given metabolomics experiment. Software for the approach is made freely available.
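One family of approaches for removing unwanted variation estimates the unwanted factors from control metabolites (for example internal standards) and then regresses them out. The sketch below is a simplified, hypothetical illustration of that idea, not the authors' released software: it takes the leading singular vectors of the centred control block as the unwanted factors W and projects them out of every metabolite. Because it ignores the biological design, it would also remove any biology correlated with W; a faithful RUV-2-style fit would estimate the biological and unwanted effects jointly.

```python
import numpy as np


def remove_unwanted_variation(Y: np.ndarray, control_idx, k: int) -> np.ndarray:
    """Project out k unwanted factors estimated from control metabolites.

    Y: (n_samples, n_metabolites) log-abundances.
    control_idx: columns assumed to carry only unwanted variation (e.g. internal standards).
    k: number of unwanted factors to remove.
    Simplified RUV-style sketch: the biological design is ignored when removing W.
    """
    means = Y.mean(axis=0)
    Yc = Y - means
    # Leading left singular vectors of the centred control block estimate W.
    U, _, _ = np.linalg.svd(Yc[:, control_idx], full_matrices=False)
    W = U[:, :k]                      # (n_samples, k) with orthonormal columns
    alpha = W.T @ Yc                  # least-squares loadings of every metabolite on W
    return Yc - W @ alpha + means     # remove the fitted unwanted part, restore means


# Synthetic batch effect b shared by three controls and one metabolite of interest.
rng = np.random.default_rng(1)
b = np.repeat([1.0, -1.0], 6)
controls = np.outer(b, [1.0, 0.5, 2.0]) + 0.01 * rng.standard_normal((12, 3))
target = 2.0 * b + 0.01 * rng.standard_normal(12)
Y = np.column_stack([controls, target]) + 10.0
Z = remove_unwanted_variation(Y, [0, 1, 2], k=1)
print(round(float(Y[:, 3].std()), 3), round(float(Z[:, 3].std()), 3))
```

In this toy example the variation in the metabolite of interest is entirely batch effect, so removing it is the intended outcome; choosing k and validating the control set are the hard parts in real data.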
Affiliation(s)
- Alysha M De Livera
- Biostatistics Unit, Centre for Epidemiology and Biostatistics, University of Melbourne, Melbourne, VIC 3800, Australia
- Marko Sysi-Aho
- Zora Biosciences Oy, FIN-02150 Espoo, Finland; VTT Technical Research Centre of Finland, P. O. Box 1000, FI-02044 VTT Espoo, Finland
- Laurent Jacob
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France
- Johann A Gagnon-Bartsch
- Department of Statistics, University of California, Berkeley, California 94720, United States
- Sandra Castillo
- VTT Technical Research Centre of Finland, P. O. Box 1000, FI-02044 VTT Espoo, Finland
- Julie A Simpson
- Biostatistics Unit, Centre for Epidemiology and Biostatistics, University of Melbourne, Melbourne, VIC 3800, Australia
- Terence P Speed
- Department of Statistics, University of California, Berkeley, California 94720, United States; Bioinformatics Division, Walter and Eliza Hall Institute, 1 G Royal Parade, Parkville, Victoria 3052, Australia; Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC 3800, Australia
6
Abstract
Systems biology has gained a tremendous amount of interest in the last few years. This is partly due to the realization that traditional approaches focusing only on a few molecules at a time cannot describe the impact of aberrant or modulated molecular environments across a whole system. Furthermore, a hypothesis-driven study aims to prove or disprove its postulations, whereas a hypothesis-free systems approach can yield an unbiased and novel testable hypothesis as its end result. This latter approach forgoes assumptions that predict how a biological system should react to an altered microenvironment within a cellular context, across a tissue or in distant organs. Additionally, re-use of existing data by systematic data mining and re-stratification, one of the cornerstones of integrative systems biology, is also gaining attention. While tremendous efforts using a systems methodology have already yielded excellent results, it is apparent that a lack of suitable analytic tools and purpose-built databases poses a major bottleneck in applying a systematic workflow. This review addresses the current approaches used in systems analysis and the obstacles often encountered in large-scale data analysis and integration, which tend to go unnoticed but have a direct impact on the final outcome of a systems approach. Its wide applicability, ranging from basic research and disease description to pharmacological studies and personalized medicine, makes this emerging approach well suited to address biological and medical questions where conventional methods are not ideal.
Affiliation(s)
- Scott W Robinson
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, BHF Glasgow Cardiovascular Research Centre, 126 University Place, Glasgow G12 8TA, UK
- Marco Fernandes
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, BHF Glasgow Cardiovascular Research Centre, 126 University Place, Glasgow G12 8TA, UK
- Holger Husi
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, BHF Glasgow Cardiovascular Research Centre, 126 University Place, Glasgow G12 8TA, UK
7
Lee W, Lazar IM. Endogenous Protein “Barcode” for Data Validation and Normalization in Quantitative MS Analysis. Anal Chem 2014; 86:6379-86. [DOI: 10.1021/ac500855q] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/23/2022]
Affiliation(s)
- Wooram Lee
- Department of Biological Sciences, Virginia Polytechnic Institute and State University, 1981 Kraft Drive, Blacksburg, Virginia 24061, United States
- Iulia M. Lazar
- Department of Biological Sciences, Virginia Polytechnic Institute and State University, 1981 Kraft Drive, Blacksburg, Virginia 24061, United States
8
Dupae J, Bohler S, Noben JP, Carpentier S, Vangronsveld J, Cuypers A. Problems inherent to a meta-analysis of proteomics data: a case study on the plants' response to Cd in different cultivation conditions. J Proteomics 2014; 108:30-54. [PMID: 24821411 DOI: 10.1016/j.jprot.2014.04.029] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 12/08/2013] [Revised: 03/07/2014] [Accepted: 04/15/2014] [Indexed: 01/14/2023]
Abstract
This meta-analysis focuses on plant-proteome responses to cadmium (Cd) stress. Initially, some general topics related to a proteomics meta-analysis are discussed: (1) obstacles encountered during data analysis, (2) a consensus in proteomic research, (3) validation and good reporting practices for protein identification and (4) guidelines for statistical analysis of differentially abundant proteins. In the second part, the Cd responses in leaves and roots obtained from a proteomics meta-analysis are discussed in (1) a time comparison (short-term versus long-term exposure) and (2) a culture comparison (hydroponics versus soil cultivation). Data from the meta-analysis confirmed the existence of an initial alarm phase upon Cd exposure. Whereas no metabolic equilibrium is established in hydroponically exposed plants, an equilibrium seems to be established in roots of plants grown in Cd-contaminated soil after long-term exposure. In leaves, carbohydrate metabolism is primarily affected, independent of the exposure time and the cultivation method. In addition, a metabolic shift from CO2 fixation towards respiration is observed, independent of the cultivation system. Finally, some ideas for improving proteomics setups and comparisons between studies are discussed. BIOLOGICAL SIGNIFICANCE This meta-analysis focuses on the plant responses to Cd stress in leaves and roots at the proteome level. It highlights the obstacles encountered when performing a proteomics meta-analysis, related both to the technologies themselves and to experimental setups. Furthermore, it addresses whether results obtained in hydroponic cultivation can be extrapolated to soil-grown plants.
Affiliation(s)
- Joke Dupae
- Environmental Biology, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium
- Sacha Bohler
- Environmental Biology, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium
- Jean-Paul Noben
- Biomedical Institute, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium
- Sebastien Carpentier
- Afdeling Plantenbiotechniek, Catholic University Leuven, Willem de Croylaan 42 - bus 2455, 3001 Leuven, Belgium
- Jaco Vangronsveld
- Environmental Biology, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium
- Ann Cuypers
- Environmental Biology, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium
9
Abstract
Statistical matters form an integral part of a metabolomics experiment. In this chapter we describe several important aspects in the analysis of metabolomics data such as the removal of unwanted variation and the identification of differentially abundant metabolites, along with a number of other essential statistical considerations.
Affiliation(s)
- Alysha M De Livera
- Metabolomics Australia, Bio21 Institute (Molecular Science and Biotechnology Institute), The University of Melbourne, Melbourne, Australia
10
The age of the "ome": genome, transcriptome and proteome data set collection and analysis. Brain Res Bull 2011; 88:294-301. [PMID: 22142972 DOI: 10.1016/j.brainresbull.2011.11.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/02/2011] [Revised: 06/30/2011] [Accepted: 11/14/2011] [Indexed: 12/14/2022]
Abstract
The current state of human genetic studies is both a marvel and a morass. It is a marvel in that, with the completion of the human genome sequence, projects that used to take years now take months or weeks; however, this creates a wealth of data alongside a black hole of meaning. To use a familiar analogy, the human genome sequence is a library in an ancient language with no Rosetta stone. Researchers have readily exploited the human genome map, and thousands of candidate gene studies have been performed for a multitude of diseases. However, many of those studies have found that the variants associated with disease risk are not obvious coding changes. The question now becomes: what do these associations mean? One approach to the downstream mapping of associations is to use additional information to determine which variant might truly cause the risk and what that risk variant is doing. This review summarizes the current state of both data set collection and analysis for understanding DNA variants and their downstream effects on transcripts and proteins. This article is part of a Special Issue entitled 'Transcriptome'.
11
Thompson D, Develter W, Cairns DA, Barrett JH, Perkins DA, Stanley AJ, Mooney A, Selby PJ, Banks RE. A pilot study to investigate the potential of mass spectrometry profiling in the discovery of novel serum markers in chronic renal disease. Proteomics Clin Appl 2011; 5:523-31. [DOI: 10.1002/prca.201100009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/25/2011] [Revised: 06/04/2011] [Accepted: 07/14/2011] [Indexed: 02/03/2023]
12
Tracy MB, Cooke WE, Gatlin CL, Cazares LH, Weaver DM, Semmes OJ, Tracy ER, Manos DM, Malyarenko DI. Improved signal processing and normalization for biomarker protein detection in broad-mass-range TOF mass spectra from clinical samples. Proteomics Clin Appl 2011; 5:440-7. [PMID: 21751409 DOI: 10.1002/prca.201000095] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/07/2010] [Revised: 05/01/2011] [Accepted: 05/30/2011] [Indexed: 11/11/2022]
Abstract
PURPOSE To demonstrate robust detection of biomarkers in broad-mass-range TOF-MS data. EXPERIMENTAL DESIGN Spectra were obtained for two serum protein profiling studies: (i) 2-200 kDa for 132 subjects, 67 healthy and 65 diagnosed with adult T-cell leukemia, and (ii) 2-100 kDa for 140 patients in 70 pairs, each pair matched for prostate-specific antigen (PSA) level and comprising one biopsy-confirmed benign diagnosis and one biopsy-confirmed prostate cancer diagnosis. Signal processing was performed on raw spectra, and peak data were normalized using four methods. Feature selection was performed using Bayesian Network Analysis, and a classifier was tested on withheld data. Identification of candidate biomarkers was pursued. RESULTS Integrated peak intensities were resolved over full spectra. Normalization using local noise values was superior to global methods in reducing peak correlations, reducing replicate variability and improving feature selection stability. For the leukemia data set, potential disease biomarkers were detected and were found to be predictive for withheld data. Preliminary assignments of protein IDs were consistent with published results and LC-MS/MS identification. No PSA-independent biomarkers were detected in the prostate cancer data set. CONCLUSIONS AND CLINICAL RELEVANCE Signal processing, local signal-to-noise ratio (SNR) normalization and Bayesian Network Analysis feature selection facilitate robust detection and identification of biomarker proteins in broad-mass-range clinical TOF-MS data.
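The contrast between local and global normalization can be illustrated with a small sketch that expresses each peak's height in units of the noise measured around it. The window size, the median baseline and the MAD-based noise estimate are assumptions chosen for illustration; the abstract does not specify how the local noise values were computed.

```python
import numpy as np


def local_snr(intensity: np.ndarray, peak_idx, half_window: int = 50) -> np.ndarray:
    """Express each peak's height in units of the local noise around it.

    Local baseline = median of the window; local noise = 1.4826 * MAD of the
    window (a robust stand-in for the standard deviation). Window size and
    estimator are illustrative assumptions.
    """
    out = []
    for i in peak_idx:
        lo, hi = max(0, i - half_window), min(len(intensity), i + half_window + 1)
        window = intensity[lo:hi]
        baseline = np.median(window)
        noise = 1.4826 * np.median(np.abs(window - baseline))
        noise = max(noise, np.finfo(float).eps)   # avoid division by zero on flat windows
        out.append((intensity[i] - baseline) / noise)
    return np.asarray(out)


# Two peaks of equal height sitting on regions with 1x and 10x noise.
rng = np.random.default_rng(2)
signal = np.concatenate([rng.normal(0, 1, 200), rng.normal(0, 10, 200)])
signal[100] += 100.0
signal[300] += 100.0
print(local_snr(signal, [100, 300]).round(1))
```

A global scheme (for example dividing by total intensity) would treat these two peaks identically, whereas the local measure down-weights the peak sitting in the noisier region by roughly the ratio of the noise levels.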
Affiliation(s)
- Maureen B Tracy
- William and Mary Research Institute (WMRI), College of William and Mary (CWM), Williamsburg, VA 23187-8795, USA
13
Sun CS, Markey MK. Recent advances in computational analysis of mass spectrometry for proteomic profiling. J Mass Spectrom 2011; 46:443-456. [PMID: 21500303 DOI: 10.1002/jms.1909] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/30/2023]
Abstract
The proteome, defined as an organism's proteins and their actions, is a highly complex end-effector of molecular and cellular events. Differing amounts of proteins in a sample can be indicators of an individual's health status; thus, it is valuable to identify key proteins that serve as 'biomarkers' for diseases. Since the proteome cannot be simply inferred from the genome due to pre- and posttranslational modifications, a direct approach toward mapping the proteome must be taken. The difficulty in evaluating a large number of individual proteins has been eased with the development of high-throughput methods based on mass spectrometry (MS) of peptide or protein mixtures, bypassing the time-consuming, laborious process of protein purification. However, proteomic profiling by MS requires extensive computational analysis. This article describes key issues and recent advances in computational analysis of mass spectra for biomarker identification.
Affiliation(s)
- Clement S Sun
- Department of Biomedical Engineering, The University of Texas at Austin, Texas 78712, USA
14
Roy P, Truntzer C, Maucort-Boulch D, Jouve T, Molinari N. Protein mass spectra data analysis for clinical biomarker discovery: a global review. Brief Bioinform 2010; 12:176-86. [PMID: 20534688 DOI: 10.1093/bib/bbq019] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 01/27/2023]
Abstract
The identification of new diagnostic or prognostic biomarkers is one of the main aims of clinical cancer research. In recent years there has been a growing interest in using high throughput technologies for the detection of such biomarkers. In particular, mass spectrometry appears as an exciting tool with great potential. However, to extract any benefit from the massive potential of clinical proteomic studies, appropriate methods, improvement and validation are required. To better understand the key statistical points involved with such studies, this review presents the main data analysis steps of protein mass spectra data analysis, from the pre-processing of the data to the identification and validation of biomarkers.
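Two of the pre-processing steps such reviews typically cover, baseline correction and normalization, plus a naive peak detector, can be sketched as below. The rolling-minimum baseline, total-ion-current (TIC) normalization and local-maximum peak picking are common simple choices assumed here for illustration, not the specific methods recommended in the review.

```python
import numpy as np


def preprocess(intensity: np.ndarray, baseline_window: int = 101) -> np.ndarray:
    """Rolling-minimum baseline subtraction followed by TIC normalization."""
    half = baseline_window // 2
    padded = np.pad(intensity, half, mode="edge")
    # Baseline at point i = minimum intensity within +/- half points.
    baseline = np.array([padded[i:i + 2 * half + 1].min() for i in range(len(intensity))])
    corrected = intensity - baseline
    total = corrected.sum()
    if total <= 0:
        raise ValueError("spectrum has no signal above baseline")
    return corrected / total        # total ion current (TIC) scaled to 1


def find_peaks(spectrum: np.ndarray, min_height: float) -> list[int]:
    """Indices of local maxima whose height exceeds min_height."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] > spectrum[i - 1]
            and spectrum[i] >= spectrum[i + 1]
            and spectrum[i] > min_height]


# Synthetic spectrum: sloping baseline plus one Gaussian peak at index 500.
x = np.arange(1000, dtype=float)
raw = 0.01 * x + 5.0 * np.exp(-((x - 500.0) ** 2) / (2 * 5.0 ** 2))
spec = preprocess(raw)
print(find_peaks(spec, 0.5 * spec.max()))
```

The rolling minimum is crude (it lags on sloped baselines and dips after tall peaks), which is one reason reviews of this kind stress validating each pre-processing choice before biomarker selection.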
Affiliation(s)
- Pascal Roy
- Hospices Civils de Lyon, Service de Biostatistique, Lyon, F-69003, France
15
Abstract
Mass spectrometric analysis of the low-molecular-weight (LMW) range of the serum/plasma proteome is revealing large numbers of previously unknown peptides and protein fragments, predicted to be derived from circulating low-abundance proteins. Genomics and proteomics are the primary discovery research tools, and recent innovations in high-throughput proteomics have become standard practice for biomarker and target discovery. Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry (MS) is the current mainstay for serum or plasma analysis, although other methods are emerging as alternative high-throughput approaches. From a proteomics perspective, bone cancers, such as myeloma, bony metastases of breast and prostate cancer, and osteosarcoma, are likely among the least studied. As recent advances in proteomic technology have thrust the bone cancer field into the era of proteomics, a review of the current status of the proteome as it relates to the skeletal consequences of malignancy seems timely.
Affiliation(s)
- Stephanie Byrum
- Department of Orthopaedic Surgery, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
16
Stitt M, Lunn J, Usadel B. Arabidopsis and primary photosynthetic metabolism - more than the icing on the cake. Plant J 2010; 61:1067-91. [PMID: 20409279 DOI: 10.1111/j.1365-313x.2010.04142.x] [Citation(s) in RCA: 203] [Impact Index Per Article: 13.5] [Indexed: 05/18/2023]
Abstract
Historically speaking, Arabidopsis was not the plant of choice for investigating photosynthesis, with physiologists and biochemists favouring other species such as Chlorella, spinach and pea. However, its inherent advantages for forward genetics rapidly led to its adoption for photosynthesis research. In the last ten years, the availability of the Arabidopsis genome sequence - still the gold-standard for plant genomes - and the rapid expansion of genetic and genomic resources have further increased its importance. Research in Arabidopsis has not only provided comprehensive information about the enzymes and other proteins involved in photosynthesis, but has also allowed transcriptional responses, protein levels and compartmentation to be analysed at a global level for the first time. Emerging technical and theoretical advances offer another leap forward in our understanding of post-translational regulation and the control of metabolism. To illustrate the impact of Arabidopsis, we provide a historical review of research in primary photosynthetic metabolism, highlighting the role of Arabidopsis in elucidation of the pathway of photorespiration and the regulation of RubisCO, as well as elucidation of the pathways of starch turnover and studies of the significance of starch for plant growth.
Affiliation(s)
- Mark Stitt
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm, Germany.
17
Borgaonkar SP, Hocker H, Shin H, Markey MK. Comparison of Normalization Methods for the Identification of Biomarkers Using MALDI-TOF and SELDI-TOF Mass Spectra. OMICS 2010; 14:115-26. [DOI: 10.1089/omi.2009.0082] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Affiliation(s)
- Harrison Hocker
- The University of Texas, Department of Biomedical Engineering, Austin, Texas
- Hyunjin Shin
- Harvard School of Public Health, Department of Biostatistics, Dana-Farber Cancer Institute, Boston, Massachusetts
- Mia K. Markey
- The University of Texas, Department of Biomedical Engineering, Austin, Texas
18
Challenges for biomarker discovery in body fluids using SELDI-TOF-MS. J Biomed Biotechnol 2009; 2010:906082. [PMID: 20029632 PMCID: PMC2793423 DOI: 10.1155/2010/906082] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Received: 06/27/2009] [Accepted: 09/01/2009] [Indexed: 01/17/2023]
Abstract
Protein profiling using SELDI-TOF-MS has attracted increasing interest in the field of biomarker discovery over the past few years. The technology presents great potential if some parameters, such as sample handling, SELDI settings, and data analysis, are strictly controlled. Practical considerations for setting up a robust and sensitive strategy for biomarker discovery are presented. This paper also reviews the biological fluids generally available, including a description of their specific properties and the preanalytical challenges inherent to sample collection and storage. Finally, some new insights into biomarker identification and the challenges of validation are provided.
19
Smith MPW, Banks RE, Wood SL, Lewington AJP, Selby PJ. Application of proteomic analysis to the study of renal diseases. Nat Rev Nephrol 2009; 5:701-12. [DOI: 10.1038/nrneph.2009.183] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022]
20
Penno MAS, Ernst M, Hoffmann P. Optimal preparation methods for automated matrix-assisted laser desorption/ionization time-of-flight mass spectrometry profiling of low molecular weight proteins and peptides. Rapid Commun Mass Spectrom 2009; 23:2656-62. [PMID: 19630030 DOI: 10.1002/rcm.4167] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 05/28/2023]
Abstract
Mass spectrometry (MS) profiling of the proteome and peptidome for disease-associated patterns is a new concept in clinical diagnostics. The technique, however, is highly sensitive to external sources of variation, leading to potentially unacceptable numbers of false positive and false negative results. Before MS profiling can be confidently implemented in a medical setting, standard experimental methods must be developed that minimize technical variance. Past studies of variance have focused largely on pre-analytical variation (i.e., sample collection, handling, etc.). Here, we examined how factors at the analytical stage, including the matrix and solid-phase extraction, influence MS profiling. Firstly, a standard peptide/protein sample was measured automatically by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS across five consecutive days using two different preparation methods (dried droplet and sample/matrix) for four types of matrix: alpha-cyano-4-hydroxycinnamic acid (HCCA), sinapinic acid (SA), 2,5-dihydroxybenzoic acid (DHB) and 2,5-dihydroxyacetophenone (DHAP). The results indicated that the matrix preparation greatly influenced a number of key parameters of the spectra, including repeatability (within-day variability), reproducibility (inter-day variability), resolution, signal strength, background intensity and detectability. Secondly, an investigation into the variance associated with C8 magnetic bead extraction of the standard sample prior to automated MS profiling demonstrated that the process did not adversely affect these same parameters. In fact, the spectra were generally more robust following extraction. Thirdly, the best-performing matrix preparations were evaluated using C8 magnetic bead-extracted human plasma. We conclude that DHAP prepared by the dried-droplet method is the most appropriate matrix to use when performing automated MS profiling.
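The abstract distinguishes repeatability (within-day variability) from reproducibility (inter-day variability). As a minimal sketch of how those two quantities might be summarised for a single peak, the Python below computes a coefficient of variation (CV) among same-day replicates and across the daily means. The input layout, the function names and the choice to average the per-day CVs are assumptions for illustration, not the procedure Penno et al. report.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


def within_and_between_day_cv(intensities_by_day):
    """Summarise variability of one peak across replicate spectra.

    intensities_by_day maps a day label to the replicate intensities of the
    same peak measured that day (hypothetical layout; each day needs at least
    two replicates for stdev). Returns (mean within-day CV, inter-day CV of
    the daily means), both in percent.
    """
    # Repeatability: spread among replicates acquired on the same day.
    within = [cv_percent(reps) for reps in intensities_by_day.values()]
    # Reproducibility: spread of the daily averages across days.
    daily_means = [mean(reps) for reps in intensities_by_day.values()]
    return mean(within), cv_percent(daily_means)


# Illustrative numbers only, not data from the study.
within_cv, between_cv = within_and_between_day_cv(
    {"day1": [100, 110, 90], "day2": [120, 130, 110], "day3": [80, 90, 70]}
)
print(f"within-day CV {within_cv:.1f}%, inter-day CV {between_cv:.1f}%")
# prints "within-day CV 10.3%, inter-day CV 20.0%"
```

Averaging per-day CVs is only one convention; a variance-components model would separate the two sources more formally.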
Affiliation(s)
- Megan A S Penno
- Adelaide Proteomics Centre, University of Adelaide, Adelaide, South Australia, Australia.
21
Ting L, Cowley MJ, Hoon SL, Guilhaus M, Raftery MJ, Cavicchioli R. Normalization and statistical analysis of quantitative proteomics data generated by metabolic labeling. Mol Cell Proteomics 2009; 8:2227-42. [PMID: 19605365 DOI: 10.1074/mcp.m800462-mcp200] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Indexed: 01/16/2023]
Abstract
Comparative proteomics is a powerful analytical method for learning about the responses of biological systems to changes in growth parameters. To make confident inferences about biological responses, proteomics approaches must incorporate appropriate statistical measures of quantitative data. In the present work we applied microarray-based normalization and statistical analysis (significance testing) methods to analyze quantitative proteomics data generated from the metabolic labeling of a marine bacterium (Sphingopyxis alaskensis). Quantitative data were generated for 1,172 proteins, representing 1,736 high confidence protein identifications (54% genome coverage). To test approaches for normalization, cells were grown at a single temperature, metabolically labeled with ¹⁴N or ¹⁵N, and combined in different ratios to give an artificially skewed data set. Inspection of ratio versus average (MA) plots determined that a fixed value median normalization was most suitable for the data. To determine an appropriate statistical method for assessing differential abundance, a fold-change approach, Student's t test, unmoderated t test, and empirical Bayes moderated t test were applied to proteomics data from cells grown at two temperatures. Inverse metabolic labeling was used with multiple technical and biological replicates, and proteomics was performed on cells that were combined based on equal optical density of cultures (providing skewed data) or on cell extracts that were combined to give equal amounts of protein (no skew). To account for arbitrarily complex experiment-specific parameters, a linear modeling approach was used to analyze the data using the limma package in R/Bioconductor. A high quality list of statistically significant differentially abundant proteins was obtained by using lowess normalization (after inspection of MA plots) and applying the empirical Bayes moderated t test. The approach also effectively controlled for the number of false discoveries and corrected for the multiple testing problem using the Storey-Tibshirani false discovery rate (Storey, J. D., and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U.S.A. 100, 9440-9445). The approach we have developed is generally applicable to quantitative proteomics analyses of diverse biological systems.
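Two steps named in this abstract lend themselves to a small illustration: fixed-value median normalization of log ratios, and false discovery rate control with Storey-Tibshirani q-values. The sketch below is a simplified, hypothetical Python version (function names are illustrative). It estimates the null proportion pi0 from a single tuning value `lam` rather than the smoothed estimate of the full method, and it is not the limma-based pipeline the authors describe.

```python
import math
from statistics import median


def median_normalize_log_ratios(ratios):
    """Fixed-value median normalization: convert heavy/light ratios to log2
    and subtract the median, so the median protein sits at log2 ratio 0."""
    logs = [math.log2(r) for r in ratios]
    centre = median(logs)
    return [x - centre for x in logs]


def storey_qvalues(pvalues, lam=0.5):
    """Simplified Storey-Tibshirani q-values.

    pi0 (the estimated fraction of truly unchanged proteins) is taken from a
    single tuning value lam; the full method smooths this over many lam values.
    With pi0 = 1 the result equals Benjamini-Hochberg adjusted p-values.
    """
    m = len(pvalues)
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


print(median_normalize_log_ratios([1.0, 2.0, 4.0]))  # [-1.0, 0.0, 1.0]
print(storey_qvalues([0.01, 0.04, 0.03, 0.9]))
```

The abstract notes that lowess normalization was ultimately preferred after inspecting MA plots; the median shift shown here corrects only a global offset, not intensity-dependent bias.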
Affiliation(s)
- Lily Ting
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
22
Cairns DA, Barrett JH, Billingham LJ, Stanley AJ, Xinarianos G, Field JK, Johnson PJ, Selby PJ, Banks RE. Sample size determination in clinical proteomic profiling experiments using mass spectrometry for class comparison. Proteomics 2009; 9:74-86. [PMID: 19053145 DOI: 10.1002/pmic.200800417] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 12/28/2022]
Abstract
Mass spectrometric profiling approaches such as MALDI-TOF and SELDI-TOF are increasingly being used in disease marker discovery, particularly in the lower molecular weight proteome. However, little consideration has been given to the issue of sample size in experimental design. The aim of this study was to develop a protocol for the use of sample size calculations in proteomic profiling studies using MS. These sample size calculations can be based on a simple linear mixed model which allows the inclusion of estimates of biological and technical variation inherent in the experiment. The use of a pilot experiment to estimate these components of variance is investigated and is shown to work well when compared with larger studies. Examination of data from a number of studies using different sample types and different chromatographic surfaces shows the need for sample- and preparation-specific sample size calculations.
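The idea that sample size should reflect both biological and technical variation can be sketched with the textbook normal-approximation formula for a two-group comparison of one peak. This is a simplification under stated assumptions, not Cairns et al.'s linear mixed model protocol: the per-subject variance is taken as sigma_bio^2 + sigma_tech^2 / r for r averaged technical replicates, and the per-group size is n = 2 (z(1 - alpha/2) + z(power))^2 * variance / delta^2. In practice the two variance components would be estimated from a pilot experiment, as the abstract suggests; the function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(delta, sigma_bio, sigma_tech, n_tech_reps=1,
                      alpha=0.05, power=0.8):
    """Per-group sample size to detect a mean difference `delta` in one peak
    between two groups (two-sided test, normal approximation).

    Per-subject variance = biological variance + technical variance divided
    by the number of technical replicates that are averaged.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    variance = sigma_bio ** 2 + sigma_tech ** 2 / n_tech_reps
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * variance / delta ** 2
    return math.ceil(n)


print(samples_per_group(1.0, 1.0, 0.0))                 # 16
print(samples_per_group(1.0, 1.0, 1.0))                 # 32
print(samples_per_group(1.0, 1.0, 1.0, n_tech_reps=4))  # 20
```

The last two calls show why the split between components matters: replicating each sample four times shrinks only the technical term, so the required number of subjects falls from 32 to 20 but never below the biological floor of 16.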
Affiliation(s)
- David A Cairns
- Clinical and Biomedical Proteomics Group, Cancer Research UK Clinical Centre, Leeds Institute of Molecular Medicine, St. James's University Hospital, Leeds, UK.
23
Integrated multi-level quality control for proteomic profiling studies using mass spectrometry. BMC Bioinformatics 2008; 9:519. [PMID: 19055809 PMCID: PMC2657802 DOI: 10.1186/1471-2105-9-519] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 07/21/2008] [Accepted: 12/04/2008] [Indexed: 01/28/2023]
Abstract
Background: Proteomic profiling using mass spectrometry (MS) is one of the most promising methods for the analysis of complex biological samples such as urine, serum and tissue for biomarker discovery. Such experiments are often conducted using MALDI-TOF (matrix-assisted laser desorption/ionisation time-of-flight) and SELDI-TOF (surface-enhanced laser desorption/ionisation time-of-flight) MS. Using such profiling methods it is possible to identify changes in protein expression that differentiate disease states, and individual proteins or patterns that may be useful as potential biomarkers. However, quality control (QC) processes that reliably identify low-quality spectra, and hence allow such data to be removed before further analysis, are often overlooked. In this paper we describe rigorous methods for assessing the quality of spectral data. These procedures are presented in a user-friendly, web-based program. The data obtained post-QC are then examined using variance components analysis to quantify the amount of variance due to some of the factors in the experimental design.
Results: Using data from a SELDI profiling study of serum from patients with different levels of renal function, we show how the algorithms described in this paper may be used to detect systematic variability within and between sample replicates, pooled samples and SELDI chips and spots. Manual inspection of the spectral data identified as being of poor quality confirmed the efficacy of the algorithms. Variance components analysis demonstrated the relatively small amount of technical variance attributable to day of profile generation and experimental array.
Conclusion: Using the techniques described in this paper it is possible to reliably detect poor-quality data within proteomic profiling experiments undertaken by MS. The removal of these spectra at the initial stages of the analysis substantially improves the confidence of putative biomarker identification and allows inter-experimental comparisons to be carried out with greater confidence.
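The abstract does not spell out the QC algorithms themselves. As one hypothetical example of a spectrum-level check in the same spirit, the sketch below flags spectra whose Pearson correlation with the point-wise median spectrum falls below a threshold. It assumes the spectra have already been binned onto a shared m/z grid; the 0.9 threshold and all function names are illustrative assumptions, not values or code from the paper.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation; a flat (constant) vector is treated as r = 0."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def flag_low_quality(spectra, min_corr=0.9):
    """Return indices of spectra that correlate poorly with the point-wise
    median spectrum. All spectra must share the same m/z bins."""
    reference = [median(column) for column in zip(*spectra)]
    return [i for i, s in enumerate(spectra) if pearson(s, reference) < min_corr]


spectra = [
    [1, 5, 2, 8, 1],
    [1, 6, 2, 9, 1],
    [2, 5, 3, 8, 1],
    [8, 1, 9, 1, 5],  # deliberately dissimilar profile
]
print(flag_low_quality(spectra))  # [3]
```

Using the median rather than the mean as the reference keeps a few bad spectra from pulling the reference toward themselves.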
24
The application of SELDI-TOF mass spectrometry to mammalian cell culture. Biotechnol Adv 2008; 27:177-84. [PMID: 19049820 DOI: 10.1016/j.biotechadv.2008.10.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 05/07/2008] [Revised: 10/28/2008] [Accepted: 10/30/2008] [Indexed: 11/20/2022]
Abstract
Surface Enhanced Laser Desorption/Ionisation Time-of-Flight Mass Spectrometry (SELDI-TOF MS) is a technique by which protein profiles can be rapidly produced from a wide variety of biological samples. By combining chromatographic surfaces with the specificity and reproducibility of mass spectrometry, it allows profiles from complex biological samples to be analysed. Profiling and biomarker identification have been employed widely throughout the biological sciences. To date, however, the benefits of SELDI-TOF MS have not been realised in the area of mammalian cell culture. The advantages of identifying markers for cell stress, apoptosis and other culture parameters mean that these tools could greatly enhance the monitoring and control of bioreaction processes and improve the production of therapeutics. Better characterisation of culture systems through proteome analysis will allow for improved productivity and better yields.