101
Abstract
Phenotypic expression of renal diseases encompasses a complex interaction between genetic, environmental, and local tissue factors. This level of complexity requires an integrated understanding of perturbations in the network of genes, proteins, and metabolites. Metabolomics attempts to systematically identify and quantify metabolites in biological samples. Because these small molecules represent the end products of the biological processes in a given cell, tissue, or organ, they are attractive candidates for understanding disease phenotypes. Metabolites form a diverse group of low-molecular-weight structures, including lipids, amino acids, peptides, nucleic acids, and organic acids, which makes comprehensive analysis a difficult analytical challenge. The recent rapid development of a variety of analytical platforms based on mass spectrometry and nuclear magnetic resonance has enabled the separation, characterization, detection, and quantification of such chemically diverse structures. Continued development of bioinformatic and analytical strategies will accelerate the widespread use of metabolomics and its integration into systems biology. Here, we discuss analytical and bioinformatic techniques and highlight recent studies that use metabolomics to understand the pathophysiology of disease processes.
Affiliation(s)
- Jeffrey H Wang
- Department of Internal Medicine, Hennepin County Medical Center and University of Minnesota School of Medicine, Minneapolis, MN, USA
102
Sandin M, Krogh M, Hansson K, Levander F. Generic workflow for quality assessment of quantitative label-free LC-MS analysis. Proteomics 2011; 11:1114-24. [PMID: 21298787 DOI: 10.1002/pmic.201000493] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 08/10/2010] [Revised: 10/11/2010] [Accepted: 11/08/2010] [Indexed: 11/10/2022]
Abstract
As high-resolution instruments become standard in proteomics laboratories, label-free quantification using precursor measurements is becoming a viable option and is consequently gaining popularity rapidly. Several software solutions have been presented for label-free analysis, but to our knowledge no conclusive studies of the sensitivity and reliability of each step of the analysis procedure have been described. Here, we use real complex samples to assess the reliability of label-free quantification with four different software solutions. A generic approach to quality-testing quantitative label-free LC-MS is introduced, with evaluation measures defined for feature detection, alignment, and quantification. All steps of the analysis were performed adequately by the software solutions tested, although differences and possibilities for improvement could be identified. The described method provides an effective testing procedure that can help users quickly pinpoint where in the workflow changes are needed.
Affiliation(s)
- Marianne Sandin
- Department of Immunotechnology, Lund University, BMC D13, Lund, Sweden
103
Voss B, Hanselmann M, Renard BY, Lindner MS, Köthe U, Kirchner M, Hamprecht FA. SIMA: simultaneous multiple alignment of LC/MS peak lists. Bioinformatics 2011; 27:987-93. [PMID: 21296750 DOI: 10.1093/bioinformatics/btr051] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION Alignment of multiple liquid chromatography/mass spectrometry (LC/MS) experiments is a necessity today, which arises from the need for biological and technical repeats. Due to limits in sampling frequency and poor reproducibility of retention times, current LC systems suffer from missing observations and non-linear distortions of the retention times across runs. Existing approaches for peak correspondence estimation focus almost exclusively on solving the pairwise alignment problem, yielding straightforward but suboptimal results for multiple alignment problems. RESULTS We propose SIMA, a novel automated procedure for alignment of peak lists from multiple LC/MS runs. SIMA combines hierarchical pairwise correspondence estimation with simultaneous alignment and global retention time correction. It employs a tailored multidimensional kernel function and a procedure based on maximum likelihood estimation to find the retention time distortion function that best fits the observed data. SIMA does not require a dedicated reference spectrum, is robust with regard to outliers, needs only two intuitive parameters and naturally incorporates incomplete correspondence information. In a comparison with seven alternative methods on four different datasets, we show that SIMA yields competitive and superior performance on real-world data. AVAILABILITY A C++ implementation of the SIMA algorithm is available from http://hci.iwr.uni-heidelberg.de/MIP/Software.
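The pairwise correspondence problem that SIMA goes beyond can be illustrated with a simple baseline. The Python sketch below is not SIMA, which uses a multidimensional kernel, maximum-likelihood retention time correction, and simultaneous multiple alignment; it only pairs peaks from two runs greedily when their m/z and retention times agree within tolerances. The function name `match_peaks` and the default tolerances (10 ppm, 30 s) are illustrative assumptions.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, retention time in seconds)

def match_peaks(run_a: List[Peak], run_b: List[Peak],
                ppm_tol: float = 10.0, rt_tol: float = 30.0) -> List[Tuple[int, int]]:
    """Greedy one-to-one correspondence between the peak lists of two runs.

    A pair is a candidate only if m/z agrees within ppm_tol and retention time
    within rt_tol; candidates are accepted from closest to farthest, and each
    peak is used at most once. Returns sorted (index_in_a, index_in_b) pairs.
    """
    candidates = []
    for i, (mz_a, rt_a) in enumerate(run_a):
        for j, (mz_b, rt_b) in enumerate(run_b):
            ppm = abs(mz_a - mz_b) / mz_a * 1e6
            drt = abs(rt_a - rt_b)
            if ppm <= ppm_tol and drt <= rt_tol:
                # Put both deviations on a common scale relative to their tolerances
                candidates.append(((ppm / ppm_tol) ** 2 + (drt / rt_tol) ** 2, i, j))
    candidates.sort()
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)
```

Because this matcher only ever compares two runs, aligning many runs with it requires a reference run or a chaining scheme, and it applies no retention time correction; these are the limitations that the abstract says SIMA addresses.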
Affiliation(s)
- Björn Voss
- Interdisciplinary Center for Scientific Computing, University of Heidelberg, Heidelberg, Germany
104
Titulaer MK, de Costa D, Stingl C, Dekker LJ, Sillevis Smitt PAE, Luider TM. Label-free peptide profiling of Orbitrap™ full mass spectra. BMC Res Notes 2011; 4:21. [PMID: 21272362 PMCID: PMC3042405 DOI: 10.1186/1756-0500-4-21] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 09/25/2010] [Accepted: 01/27/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND We developed a new version of the open source software package Peptrix that can compare large numbers of Orbitrap™ LC-MS datasets. The peptide profiling results of Peptrix on MS1 spectra were compared with those obtained from a small selection of open source and commercial software packages: msInspect, Sieve™, and Progenesis™. The properties compared were speed, total number of detected masses, redundancy of masses, reproducibility in numbers and CV of intensity, overlap of masses, and differences in peptide peak intensities. Reproducibility was measured for each MS1 software application by analyzing a complex peptide mixture of immunoglobulin in triplicate on the Orbitrap™ mass spectrometer. Peptide masses detected by peptide profiling in the high-intensity peaks of the MS1 spectra were verified against the masses of MS2-fragmented and sequenced peptides that yielded protein identifications with a significant score. FINDINGS Peptrix finds about the same number of peptide features as the other packages, but in some cases its peptide masses are approximately 5 to 10 times less redundant in the peptide profile matrix. In pairwise comparisons of detected masses between two software applications, the Peptrix profile matrix shows the largest overlap. For low-intensity peaks, the overlap of peptide masses between software packages is remarkably low, at about 50% of the masses detected by the individual packages. Peptrix does not differ from the other packages in detecting 96% of the masses that relate to highly abundant sequenced proteins. MS1 peak intensities vary non-linearly between the applications because they are not processed by the same method. CONCLUSIONS Peptrix is capable of peptide profiling of Orbitrap™ files and of finding differentially expressed peptides in body fluid and tissue samples.
The number of peptide masses detected in Orbitrap™ files can be increased by using several MS1 peptide profiling applications, including Peptrix, since the comparison of Peptrix with the other applications suggests that all software packages likely have a high false-negative rate for low-intensity peptide peaks (missing peptides).
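Two of the comparison measures named above, the overlap of masses between packages and the CV of intensity across triplicate runs, are simple to compute. The sketch below assumes that two masses "overlap" when they agree within a fixed ppm tolerance (5 ppm, chosen for illustration); the function names are hypothetical and this is not Peptrix code.

```python
import bisect
import statistics
from typing import List

def ppm_overlap(masses_a: List[float], masses_b: List[float], ppm_tol: float = 5.0) -> float:
    """Fraction of masses in masses_a with at least one partner in masses_b
    within ppm_tol parts per million (0.0 for an empty masses_a)."""
    if not masses_a:
        return 0.0
    sorted_b = sorted(masses_b)
    hits = 0
    for m in masses_a:
        tol = m * ppm_tol * 1e-6
        k = bisect.bisect_left(sorted_b, m - tol)  # first candidate >= m - tol
        if k < len(sorted_b) and sorted_b[k] <= m + tol:
            hits += 1
    return hits / len(masses_a)

def cv_percent(intensities: List[float]) -> float:
    """Coefficient of variation (sample standard deviation / mean) in percent,
    e.g. for one peptide's intensity across triplicate runs."""
    return statistics.stdev(intensities) / statistics.mean(intensities) * 100.0
```

Note that `ppm_overlap` is asymmetric: it reports the fraction of the first package's masses found by the second, so a pairwise comparison would normally report both directions.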
Affiliation(s)
- Mark K Titulaer
- Laboratory of Neuro-Oncology and Clinical and Cancer Proteomics, Department of Neurology, Erasmus University Medical Center, Dr. Molewaterplein 50, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
- Academic Medical Center, University of Amsterdam, Meibergdreef 9, P.O. Box 22660, 1100 DD Amsterdam, The Netherlands
- Dominique de Costa
- Department of Pulmonology, Erasmus University Medical Center, Dr. Molewaterplein 50, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
- Christoph Stingl
- Laboratory of Neuro-Oncology and Clinical and Cancer Proteomics, Department of Neurology, Erasmus University Medical Center, Dr. Molewaterplein 50, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
- Lennard J Dekker
- Laboratory of Neuro-Oncology and Clinical and Cancer Proteomics, Department of Neurology, Erasmus University Medical Center, Dr. Molewaterplein 50, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
- Peter AE Sillevis Smitt
- Laboratory of Neuro-Oncology and Clinical and Cancer Proteomics, Department of Neurology, Erasmus University Medical Center, Dr. Molewaterplein 50, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
- Theo M Luider
- Laboratory of Neuro-Oncology and Clinical and Cancer Proteomics, Department of Neurology, Erasmus University Medical Center, Dr. Molewaterplein 50, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
105
Abstract
A broad view of the state of biological systems cannot be complete without integrating proteomic and genomic data with metabolite measurements. By definition, metabolomics aims to quantify nothing less than the totality of small molecules present in a biofluid, tissue, organism, or any material beyond living systems. To cope with the complexity of this task, mass spectrometry (MS) is the most promising analytical environment for meeting the growing demand for a more accurate and larger view of the metabolome while providing sufficient data-generation throughput. Bioinformatics and associated disciplines naturally play a central role in bridging the gap between fast-evolving technology and domain experts. Here, we describe strategies to translate crude MS information into features characteristic of metabolites, and the resources available to guide scientists along the metabolomics pipeline. Particular emphasis is placed on pragmatic solutions for interpreting the outcome of metabolomics experiments at the levels of signal processing, statistical treatment, and biochemical understanding.
106
Data processing pipelines for comprehensive profiling of proteomics samples by label-free LC–MS for biomarker discovery. Talanta 2011; 83:1209-24. [DOI: 10.1016/j.talanta.2010.10.029] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 05/15/2010] [Revised: 10/18/2010] [Accepted: 10/21/2010] [Indexed: 01/30/2023]
107
Bielow C, Gröpl C, Kohlbacher O, Reinert K. Bioinformatics for qualitative and quantitative proteomics. Methods Mol Biol 2011; 719:331-49. [PMID: 21370091 DOI: 10.1007/978-1-61779-027-0_15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Mass spectrometry is today a key analytical technique for elucidating the amount and content of proteins expressed in a given cellular context. The degree of automation in proteomics has yet to reach that of genomic techniques, but even current technologies make manual inspection of the data infeasible. This article addresses the key algorithmic problems bioinformaticians face when handling modern proteomic samples and shows common solutions to them. We provide examples of how algorithms can be combined to build relatively complex analysis pipelines, point out certain pitfalls and aspects worth considering, and give a list of current state-of-the-art tools.
Affiliation(s)
- Chris Bielow
- AG Algorithmische Bioinformatik, Institut für Informatik, Freie Universität Berlin, Berlin, Germany.
108
Dowsey AW, English JA, Lisacek F, Morris JS, Yang GZ, Dunn MJ. Image analysis tools and emerging algorithms for expression proteomics. Proteomics 2010; 10:4226-57. [PMID: 21046614 PMCID: PMC3257807 DOI: 10.1002/pmic.200900635] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 09/01/2009] [Accepted: 08/28/2010] [Indexed: 11/11/2022]
Abstract
Since their origins in academic endeavours in the 1970s, computational analysis tools have matured into a number of established commercial packages that underpin research in expression proteomics. In this paper we describe the image analysis pipeline for the established 2-DE technique of protein separation, and, by first covering signal analysis for MS, we also explain the current image analysis workflow for the emerging high-throughput 'shotgun' proteomics platform of LC coupled to MS (LC/MS). The bioinformatics challenges of both methods are illustrated and compared, and existing commercial and academic packages and their workflows are described from both a user's and a technical perspective. Attention is given to the importance of sound statistical treatment of the resulting quantifications in the search for differential expression. Despite the wide availability of proteomics software, a number of challenges have yet to be overcome regarding algorithm accuracy, objectivity, and automation, generally because deterministic spot-centric approaches discard information early in the pipeline and thereby propagate errors. We review recent advances in signal and image analysis algorithms in 2-DE, MS, LC/MS, and Imaging MS. Particular attention is given to wavelet techniques, automated image-based alignment, and differential analysis in 2-DE; Bayesian peak mixture models and functional mixed modelling in MS; and group-wise consensus alignment methods for LC/MS.
Affiliation(s)
- Andrew W. Dowsey
- Institute of Biomedical Engineering, Imperial College London, South Kensington, London SW7 2AZ, U.K.
- Jane A. English
- Proteome Research Centre, UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Ireland
- Frederique Lisacek
- Proteome Informatics Group, Swiss Institute of Bioinformatics, CMU - 1, rue Michel Servet, CH-1211 Geneva, Switzerland
- Jeffrey S. Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030-4009, U.S.A.
- Guang-Zhong Yang
- Institute of Biomedical Engineering, Imperial College London, South Kensington, London SW7 2AZ, U.K.
- Michael J. Dunn
- Proteome Research Centre, UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Ireland
109
Koh Y, Pasikanti KK, Yap CW, Chan ECY. Comparative evaluation of software for retention time alignment of gas chromatography/time-of-flight mass spectrometry-based metabonomic data. J Chromatogr A 2010; 1217:8308-16. [PMID: 21081237 DOI: 10.1016/j.chroma.2010.10.101] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 05/10/2010] [Revised: 09/15/2010] [Accepted: 10/25/2010] [Indexed: 10/18/2022]
Abstract
In chromatography-based metabonomic research, retention time (RT) alignment of chromatographic peaks poses a challenge for the accurate profiling of biomarkers. Although a number of RT alignment software packages have been reported, their performance has not been comprehensively evaluated. This study aimed to evaluate the RT alignment accuracy of publicly available and commercial RT alignment software. Two gas chromatography/mass spectrometry (GC/MS) datasets, acquired from a mixture of standard metabolites and from human bladder cancer urine samples, were used to assess three publicly available software packages (MetAlign, MZmine, and TagFinder) and two commercial applications (the Calibration feature and Statistical Compare of the ChromaTOF software). The overall RT alignment accuracies for the standard compound mixture were 93, 92, 74, 73, and 42% for the Calibration feature, MZmine, MetAlign, Statistical Compare, and TagFinder, respectively. Additionally, each software package showed its own trends with respect to the experimental conditions governing the extent and direction of RT shifts. Performance on the human urine samples was inconsistent, suggesting that RT misalignments still occurred despite the use of RT alignment software. Although RT alignment remains an unavoidable step in data preprocessing, metabonomic researchers are advised to check the RT alignment of important biomarkers manually as part of their validation process.
Affiliation(s)
- Yueting Koh
- Department of Pharmacy, Faculty of Science, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore
110
Brodsky L, Moussaieff A, Shahaf N, Aharoni A, Rogachev I. Evaluation of Peak Picking Quality in LC-MS Metabolomics Data. Anal Chem 2010; 82:9177-87. [PMID: 20977194 DOI: 10.1021/ac101216e] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 11/28/2022]
Affiliation(s)
- Leonid Brodsky
- Department of Plant Sciences, Weizmann Institute of Science, P.O. Box 26, Rehovot 76100, Israel, and Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
- Arieh Moussaieff
- Department of Plant Sciences, Weizmann Institute of Science, P.O. Box 26, Rehovot 76100, Israel, and Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
- Nir Shahaf
- Department of Plant Sciences, Weizmann Institute of Science, P.O. Box 26, Rehovot 76100, Israel, and Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
- Asaph Aharoni
- Department of Plant Sciences, Weizmann Institute of Science, P.O. Box 26, Rehovot 76100, Israel, and Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
- Ilana Rogachev
- Department of Plant Sciences, Weizmann Institute of Science, P.O. Box 26, Rehovot 76100, Israel, and Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel
111
Neumann S, Böcker S. Computational mass spectrometry for metabolomics: identification of metabolites and small molecules. Anal Bioanal Chem 2010; 398:2779-88. [PMID: 20936272 DOI: 10.1007/s00216-010-4142-5] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.2] [Received: 06/14/2010] [Revised: 08/16/2010] [Accepted: 08/18/2010] [Indexed: 11/26/2022]
Abstract
The identification of compounds from mass spectrometry (MS) data is still seen as a major bottleneck in the interpretation of MS data. This is particularly the case for small compounds such as metabolites, for which little progress had been made until recently. Here we review the available approaches to the annotation and identification of chemical compounds based on electrospray ionization (ESI-MS) data. The methods are not limited to metabolomics applications but are applicable to any small compounds amenable to MS analysis. Starting with the definition of identification, we focus on the analysis of tandem mass and MS(n) spectra, which can provide a wealth of structural information. Searching libraries of reference spectra provides the most reliable source of identification, especially if the spectra were measured on comparable instruments. We review several choices of distance function. Identification without reference spectra is even more challenging, because it requires approaches that interpret tandem mass spectra with regard to the molecular structure. Both commercial and free tools are capable of mining general-purpose compound libraries and identifying candidate compounds. The holy grail of computational mass spectrometry is the de novo deduction of structure hypotheses for compounds, for which method development has only just begun. In a case study, we apply several of the available methods to three compounds, kaempferol, reserpine, and verapamil, and investigate whether they yield reliable identifications.
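Library search ranks reference spectra by a distance or similarity function. One common choice is the cosine (normalized dot product) score on binned spectra, sketched below; the bin width, the hypothetical spectra in any example, and the function names are assumptions for illustration, and the review covers other distance functions as well.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs

def binned(spectrum: Spectrum, bin_width: float = 0.01) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    bins: Dict[int, float] = defaultdict(float)
    for mz, intensity in spectrum:
        bins[round(mz / bin_width)] += intensity
    return bins

def cosine_score(query: Spectrum, reference: Spectrum, bin_width: float = 0.01) -> float:
    """Cosine similarity of two binned spectra: 1.0 for identical relative
    intensity patterns, 0.0 when no bins are shared."""
    q, r = binned(query, bin_width), binned(reference, bin_width)
    dot = sum(q[k] * r[k] for k in q.keys() & r.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0

def best_match(query: Spectrum, library: Dict[str, Spectrum]) -> Tuple[str, float]:
    """Return the library entry with the highest cosine score and that score."""
    name = max(library, key=lambda n: cosine_score(query, library[n]))
    return name, cosine_score(query, library[name])
```

The cosine score ignores overall intensity scale, which suits comparisons across instruments with different absolute responses; it does not model mass error beyond the fixed bin width.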
Affiliation(s)
- Steffen Neumann
- Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany.
112
Babushok VI, Zenkevich IG. Retention Characteristics of Peptides in RP-LC: Peptide Retention Prediction. Chromatographia 2010. [DOI: 10.1365/s10337-010-1721-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/05/2022]
113
Pluskal T, Castillo S, Villar-Briones A, Oresic M. MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics 2010; 11:395. [PMID: 20650010 PMCID: PMC2918584 DOI: 10.1186/1471-2105-11-395] [Citation(s) in RCA: 2514] [Impact Index Per Article: 179.6] [Received: 05/06/2010] [Accepted: 07/23/2010] [Indexed: 12/31/2022]
Abstract
Background Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoing challenge of these molecular profiling approaches, however, is the development of better data processing methods. Here we introduce a new generation of a popular open-source data processing toolbox, MZmine 2. Results A key concept of the MZmine 2 software design is the strict separation of core functionality and data processing modules, with emphasis on easy usability and support for high-resolution spectra processing. Data processing modules take advantage of embedded visualization tools, allowing for immediate previews of parameter settings. Newly introduced functionality includes the identification of peaks using online databases, MSn data support, improved isotope pattern support, scatter plot visualization, and a new method for peak list alignment based on the random sample consensus (RANSAC) algorithm. The performance of the RANSAC alignment was evaluated using synthetic datasets as well as actual experimental data, and the results were compared to those obtained using other alignment algorithms. Conclusions MZmine 2 is freely available under a GNU GPL license and can be obtained from the project website at: http://mzmine.sourceforge.net/. The current version of MZmine 2 is suitable for processing large batches of data and has been applied to both targeted and non-targeted metabolomic analyses.
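The RANSAC idea behind the new alignment method can be sketched as follows: repeatedly fit a retention time model to a random minimal subset of candidate peak matches, and keep the model supported by the most inliers. This is a minimal sketch assuming a linear RT model and a fixed inlier threshold in seconds; it is not MZmine 2's implementation, whose model and parameters may differ, and `ransac_rt_fit` is a hypothetical name.

```python
import random
from typing import List, Tuple

def ransac_rt_fit(pairs: List[Tuple[float, float]], n_iter: int = 200,
                  threshold: float = 5.0, seed: int = 0) -> Tuple[float, float, List[int]]:
    """Fit rt_b = slope * rt_a + intercept to candidate peak matches with RANSAC.

    pairs: (rt_a, rt_b) retention times of tentatively matched peaks, some wrong.
    threshold: maximum residual (same units as RT) for a pair to count as an inlier.
    Returns (slope, intercept, inlier indices).
    """
    rng = random.Random(seed)
    best: Tuple[float, float, List[int]] = (1.0, 0.0, [])
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(pairs, 2)  # minimal sample for a line
        if x1 == x2:
            continue  # degenerate sample: no unique line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [i for i, (x, y) in enumerate(pairs)
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best[2]):
            best = (slope, intercept, inliers)
    # Refine the winning model with ordinary least squares on its inliers
    if len(best[2]) >= 2:
        xs = [pairs[i][0] for i in best[2]]
        ys = [pairs[i][1] for i in best[2]]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx > 0:
            slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
            best = (slope, my - slope * mx, best[2])
    return best
```

Because wrong matches rarely agree with a common model, they fall outside the threshold and do not distort the fit, which is the robustness that motivates RANSAC over a plain least-squares fit to all candidate pairs.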
Affiliation(s)
- Tomás Pluskal
- G0 Cell Unit, Okinawa Institute of Science and Technology, Onna, Okinawa, Japan.
114
Di Lena P, Margara L. Optimal global alignment of signals by maximization of Pearson correlation. Inf Process Lett 2010. [DOI: 10.1016/j.ipl.2010.05.024] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 10/19/2022]
115
Torgrip RJO, Alm E, Åberg KM. Warping and alignment technologies for inter-sample feature correspondence in 1D H-NMR, chromatography-, and capillary electrophoresis-mass spectrometry data. ACTA ACUST UNITED AC 2010. [DOI: 10.1007/s12566-010-0008-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/19/2022]
116
Crutchfield CA, Lu W, Melamud E, Rabinowitz JD. Mass spectrometry-based metabolomics of yeast. Methods Enzymol 2010; 470:393-426. [PMID: 20946819 DOI: 10.1016/s0076-6879(10)70016-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 01/06/2023]
Abstract
Driven by the advent of metabolomics, recent years have seen renewed interest in the investigation of yeast metabolism. Here we provide a practical guide to metabolomic analysis of yeast using liquid chromatography-mass spectrometry (LC-MS). We begin with background on LC-MS and its utility in studying yeast metabolism. We then describe key issues involved at each step of a typical yeast metabolomics experiment: in experimental design, cell culture, metabolite extraction, LC-MS, and data processing and analysis. Throughout, we highlight interdependencies between the steps that are relevant to developing an integrated workflow which effectively leverages LC-MS to reveal yeast biology.
Affiliation(s)
- Christopher A Crutchfield
- Lewis-Sigler Institute for Integrative Genomics, Department of Chemistry, Princeton University, Princeton, New Jersey, USA
117
Christin C, Hoefsloot HCJ, Smilde AK, Suits F, Bischoff R, Horvatovich PL. Time Alignment Algorithms Based on Selected Mass Traces for Complex LC-MS Data. J Proteome Res 2010; 9:1483-95. [DOI: 10.1021/pr9010124] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Indexed: 01/28/2023]
Affiliation(s)
- Christin Christin
- Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands, Biosystem Data Analysis, Swammerdam Institute for Life Science, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands, and IBM T.J. Watson Research Centre, Yorktown Heights, New York 10598
- Huub C. J. Hoefsloot
- Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands, Biosystem Data Analysis, Swammerdam Institute for Life Science, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands, and IBM T.J. Watson Research Centre, Yorktown Heights, New York 10598
- Age K. Smilde
- Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands, Biosystem Data Analysis, Swammerdam Institute for Life Science, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands, and IBM T.J. Watson Research Centre, Yorktown Heights, New York 10598
- Frank Suits
- Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands, Biosystem Data Analysis, Swammerdam Institute for Life Science, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands, and IBM T.J. Watson Research Centre, Yorktown Heights, New York 10598
- Rainer Bischoff
- Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands, Biosystem Data Analysis, Swammerdam Institute for Life Science, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands, and IBM T.J. Watson Research Centre, Yorktown Heights, New York 10598
- Peter L. Horvatovich
- Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands, Biosystem Data Analysis, Swammerdam Institute for Life Science, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands, and IBM T.J. Watson Research Centre, Yorktown Heights, New York 10598
118
Junot C, Madalinski G, Tabet JC, Ezan E. Fourier transform mass spectrometry for metabolome analysis. Analyst 2010; 135:2203-19. [DOI: 10.1039/c0an00021c] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 12/24/2022]
119
Anderson PE, Raymer ML, Kelly BJ, Reo NV, DelRaso NJ, Doom TE. Characterization of 1H NMR spectroscopic data and the generation of synthetic validation sets. Bioinformatics 2009; 25:2992-3000. [PMID: 19759199 DOI: 10.1093/bioinformatics/btp540] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION Common contemporary practice within the nuclear magnetic resonance (NMR) metabolomics community is to evaluate and validate novel algorithms on empirical data or simplified simulated data. Empirical data capture the complex characteristics of experimental data, but the optimal or most correct analysis is unknown a priori; researchers are therefore forced to rely on indirect performance metrics, which are of limited value. A fair and complete analysis of competing techniques requires more exacting metrics. Thus, metabolomics researchers often evaluate their algorithms on simplified simulated data with a known answer. Unfortunately, conclusions obtained on simulated data are only of value if the data sets are complex enough for the results to generalize to true experimental data. Ideally, synthetic data should be indistinguishable from empirical data, yet retain a known best analysis. RESULTS We have developed a technique for creating realistic synthetic metabolomics validation sets based on NMR spectroscopic data. The validation sets are developed by characterizing the salient distributions in sets of empirical spectroscopic data. Using this technique, several validation sets are constructed with a variety of characteristics present in 'real' data. A case study is then presented that compares the relative accuracy of several alignment algorithms using the increased precision afforded by these synthetic data sets. AVAILABILITY These data sets are available for download at http://birg.cs.wright.edu/nmr_synthetic_data_sets.
Affiliation(s)
- Paul E Anderson
- Department of Computer Science and Engineering, Dayton, OH 45435, USA
120
Issaq HJ, Van QN, Waybright TJ, Muschik GM, Veenstra TD. Analytical and statistical approaches to metabolomics research. J Sep Sci 2009; 32:2183-99. [PMID: 19569098 DOI: 10.1002/jssc.200900152] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.8] [Indexed: 11/07/2022]
Abstract
Metabolomics, the global profiling of metabolites in different living systems, has experienced a rekindling of interest, partly due to the improved detection capabilities of the instrumental techniques currently used in this area of biomedical research. The analytical methods of choice for analyzing metabolites in the search for disease biomarkers in biological specimens, and for studying the various pathways of low-molecular-weight metabolites, include NMR spectroscopy, GC/MS, CE/MS, and HPLC/MS. Global metabolite analysis and profiling of two different sample sets produces a plethora of data that is difficult to manage or interpret manually because the differences between the sets are subtle. Multivariate statistical methods and pattern-recognition programs have been developed to handle the acquired data and to search for the features that discriminate between data from two sample sets, such as healthy and diseased. Metabolomics has been used in toxicology, plant physiology, and biomedical research. In this paper, we discuss various aspects of metabolomic research, including sample collection, handling, and storage, requirements for sample analysis, peak alignment, data interpretation using statistical approaches, metabolite identification, and finally recommendations for successful analysis.
Affiliation(s)
- Haleem J Issaq
- Laboratory of Proteomics and Analytical Technologies, Advanced Technology Program, SAIC-Frederick, Inc., NCI-Frederick, Frederick, MD, USA.
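The search for features that discriminate between two sample sets is often preceded by a simple univariate screen. The Python sketch below ranks features by Welch's t statistic and reports a log2 fold change; it is a hypothetical illustration, not one of the multivariate pattern-recognition methods the review discusses, the dictionary layout is assumed, and no multiple-testing correction is applied.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_features(healthy, diseased):
    """Rank features by |t|, largest first.

    healthy, diseased: dict mapping feature name -> list of intensities
    (hypothetical layout; both dicts must share the same keys).
    Returns a list of (feature, log2_fold_change, t), where positive values
    mean higher intensity in the diseased group.
    """
    results = []
    for name in healthy:
        h, d = healthy[name], diseased[name]
        log2_fc = math.log2(mean(d) / mean(h))
        results.append((name, log2_fc, welch_t(d, h)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Usage with two hypothetical features measured in three samples per group.
healthy = {"m1": [1.0, 1.1, 0.9], "m2": [5.0, 5.2, 4.8]}
diseased = {"m1": [2.0, 2.1, 1.9], "m2": [5.1, 4.9, 5.0]}
ranking = rank_features(healthy, diseased)
```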
121
Draper J, Enot DP, Parker D, Beckmann M, Snowdon S, Lin W, Zubair H. Metabolite signal identification in accurate mass metabolomics data with MZedDB, an interactive m/z annotation tool utilising predicted ionisation behaviour 'rules'. BMC Bioinformatics 2009; 10:227. [PMID: 19622150 PMCID: PMC2721842 DOI: 10.1186/1471-2105-10-227] [Citation(s) in RCA: 127] [Impact Index Per Article: 8.5] [Received: 02/20/2009] [Accepted: 07/21/2009] [Indexed: 01/21/2023]
Abstract
Background Metabolomics experiments using mass spectrometry (MS) technology measure the mass-to-charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high-dimensional metabolite 'fingerprint' or metabolite 'profile' data. High-resolution MS instruments routinely achieve a mass accuracy of < 5 ppm (parts per million), potentially providing a direct method for putative signal annotation using databases containing metabolite mass information. Most database interfaces support only simple queries, with the default assumption that molecules either gain or lose a single proton when ionised. In reality, the annotation process is confounded by the fact that many ionisation products are molecular isotopes, salt/solvent adducts, or neutral loss fragments of the original metabolites rather than simple protonated or deprotonated molecules. This report describes an annotation strategy that allows searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI). Results Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' derived from chemical information allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and to calculate, on the fly, the exact molecular weight of every potential ionisation product, providing targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals. An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data.
Conclusion We conclude that although instruments with ultra-high mass accuracy provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample greatly improves both the frequency and accuracy of potential annotation 'hits' in ESI-MS data.
Affiliation(s)
- John Draper
- Institute of Biological Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3DA, UK.
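The core idea of rule-based annotation, computing the exact m/z of every predicted ionisation product and matching observed signals within a ppm tolerance, can be sketched in Python. The adduct mass shifts below are standard values for a few common singly charged ESI adducts; the candidate dictionary and function names are hypothetical, and MZedDB's much larger rule set (including neutral losses and biological-source filtering) is not reproduced.

```python
# Mass shifts (Da) for a few common singly charged ESI adducts (standard values;
# this is a small illustrative subset, not MZedDB's rule set).
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M-H]-": -1.007276,
}
C13_SHIFT = 1.003355  # mass difference between 13C and 12C


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(observed_mz, candidates, tol_ppm=5.0):
    """Return (name, ion, ppm) for every predicted ion within tol_ppm of observed_mz.

    candidates: dict mapping metabolite name -> neutral monoisotopic mass
    (hypothetical input). Both the monoisotopic ion and its first 13C isotope
    peak are considered.
    """
    hits = []
    for name, mass in candidates.items():
        for ion, shift in ADDUCTS.items():
            for n13c in (0, 1):
                theoretical = mass + shift + n13c * C13_SHIFT
                err = ppm_error(observed_mz, theoretical)
                if abs(err) <= tol_ppm:
                    label = ion if n13c == 0 else ion + " 13C1"
                    hits.append((name, label, round(err, 2)))
    return hits


# Usage: glucose (C6H12O6, monoisotopic mass 180.063388) observed as a sodium adduct.
hits = annotate(203.0526, {"glucose": 180.063388})
```

A query that assumed only [M+H]+ would miss this signal entirely, which is the limitation of simple database interfaces described in the abstract.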
122
Challenges in applying chemometrics to LC–MS-based global metabolite profile data. Bioanalysis 2009; 1:805-19. [DOI: 10.4155/bio.09.64] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/06/2023]
Abstract
Metabolite profiling can provide insights into the metabolic status of complex living systems through the non-targeted analysis of metabolites in any biological sample. Metabolite profiling is complementary to genomics, transcriptomics and proteomics, and its applications span epidemiology, disease diagnosis, nutrition, pharmaceutical research, and toxicology. Metabolic phenotypes reflect an organism's environment, lifestyle, diet, and gut microfloral composition, and are also influenced by genetic factors, with important implications for genome-wide association studies. Specialized analytical platforms, such as NMR spectroscopy and MS, are required to interrogate such metabolic complexity. The increased sophistication of these techniques has led to a demand for improved data analysis approaches, including preprocessing and advanced chemometric techniques. This article discusses data generation, preprocessing, multivariate analysis and data interpretation for LC-MS-based metabolite profiling, focusing on challenges encountered and potential solutions.
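Two of the preprocessing steps named in this abstract, normalization and scaling before multivariate analysis, are sketched below in Python. The choice of total-signal normalization, a natural-log transform and Pareto scaling is an assumption made for illustration (they are common choices in LC-MS profiling, not necessarily the ones this article recommends), and the sample-by-feature list layout is hypothetical.

```python
import math
from statistics import mean, stdev


def normalize_total(sample):
    """Divide each intensity by the sample's total signal so the row sums to 1."""
    total = sum(sample)
    return [x / total for x in sample]


def pareto_scale(matrix):
    """Mean-center each feature (column) and divide by the square root of its SD."""
    scaled = []
    for col in zip(*matrix):
        m, s = mean(col), stdev(col)
        if s == 0:
            scaled.append([0.0] * len(col))  # a constant feature carries no information
        else:
            scaled.append([(x - m) / math.sqrt(s) for x in col])
    return [list(row) for row in zip(*scaled)]


def preprocess(matrix):
    """Rows are samples, columns are aligned features; intensities must be > 0."""
    normalized = [normalize_total(row) for row in matrix]
    logged = [[math.log(x) for x in row] for row in normalized]
    return pareto_scale(logged)


# Usage: three hypothetical samples with three aligned features each.
scaled = preprocess([[1.0, 2.0, 7.0], [2.0, 2.0, 6.0], [3.0, 3.0, 4.0]])
```

Pareto scaling sits between no scaling and unit-variance scaling: it reduces the dominance of very intense features without inflating noisy low-intensity ones as strongly.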
123
Kim YJ, Feild B, Fitzhugh W, Heidbrink JL, Duff JW, Heil J, Ruben SM, He T. Reference map for liquid chromatography-mass spectrometry-based quantitative proteomics. Anal Biochem 2009; 393:155-62. [PMID: 19538932 DOI: 10.1016/j.ab.2009.06.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/05/2009] [Revised: 05/29/2009] [Accepted: 06/11/2009] [Indexed: 10/20/2022]
Abstract
The accurate mass and time (AMT) tag strategy has been recognized as a powerful tool for high-throughput analysis in liquid chromatography-mass spectrometry (LC-MS)-based proteomics. Because of the complexity of the human proteome, this strategy requires highly accurate mass measurements for confident identifications. We have developed a method of building a reference map that allows relaxed criteria for mass errors yet delivers high confidence for peptide identifications. The samples used for generating the peptide database were produced by collecting cysteine-containing peptides from T47D cells and then fractionating the peptides using strong cation exchange (SCX) chromatography. LC-tandem mass spectrometry (MS/MS) data from the SCX fractions were combined to create a comprehensive reference map. Once the reference map was built, the SCX step could be skipped in further proteomic analyses. We found that reference-driven identification increases overall throughput and proteomic coverage by identifying peptides with low intensity or complex interference. The use of the reference map also facilitates quantitation by allowing extraction of peptide intensities of interest and incorporating models of theoretical isotope distribution.
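Matching LC-MS features against such a reference map amounts to requiring agreement in both mass and elution time. The Python sketch below shows how the extra time constraint lets a looser ppm window remain selective; the `RefPeptide` record, the normalized elution time (NET) scale, the tolerance defaults and the peptide entries are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RefPeptide:
    """Hypothetical reference-map entry."""
    sequence: str
    mass: float  # monoisotopic neutral mass (Da)
    net: float   # normalized elution time on a 0-1 scale


def match_feature(mass, net, reference, ppm_tol=20.0, net_tol=0.02):
    """Return (sequence, ppm) for reference entries within BOTH tolerances.

    A relaxed ppm_tol is compensated by the additional elution-time constraint.
    """
    hits = []
    for ref in reference:
        ppm = abs(mass - ref.mass) / ref.mass * 1e6
        if ppm <= ppm_tol and abs(net - ref.net) <= net_tol:
            hits.append((ref.sequence, round(ppm, 1)))
    return hits


# Usage: two reference peptides that a 20 ppm mass window alone cannot separate.
reference = [
    RefPeptide("PEPTIDEA", 1000.000, 0.30),
    RefPeptide("PEPTIDEB", 1000.015, 0.55),
]
hits = match_feature(1000.010, 0.31, reference)
```

Both hypothetical peptides lie within 20 ppm of the observed mass, so only the elution-time check resolves the ambiguity in favour of `PEPTIDEA`.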
124
Lommen A. MetAlign: interface-driven, versatile metabolomics tool for hyphenated full-scan mass spectrometry data preprocessing. Anal Chem 2009; 81:3079-86. [PMID: 19301908 DOI: 10.1021/ac900036d] [Citation(s) in RCA: 526] [Impact Index Per Article: 35.1] [Indexed: 11/30/2022]
Abstract
Hyphenated full-scan MS technology creates large amounts of data. A versatile, easy-to-handle automation tool that aids data analysis is very important for handling such a data stream. MetAlign software, as described in this manuscript, handles a broad range of accurate mass and nominal mass GC/MS and LC/MS data. It is capable of automatic format conversions, accurate mass calculations, baseline corrections, peak-picking, saturation and mass-peak artifact filtering, as well as alignment of up to 1000 data sets. A 100- to 1000-fold data reduction is achieved. MetAlign output is compatible with most multivariate statistics programs.
Affiliation(s)
- Arjen Lommen
- RIKILT-Institute of Food Safety, Wageningen UR, P.O. Box 230, 6700 AE Wageningen, The Netherlands.
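Baseline correction followed by peak picking is what turns a dense trace into a short peak list, the kind of data reduction reported for MetAlign. The Python sketch below uses a rolling-minimum baseline and a local-maximum test above a fixed threshold; this is a generic, assumed approach chosen for brevity, not MetAlign's actual algorithm, and the `half_window` and `threshold` parameters are hypothetical.

```python
def rolling_min_baseline(signal, half_window):
    """Estimate the baseline at each point as the minimum within +/- half_window points."""
    n = len(signal)
    return [
        min(signal[max(0, i - half_window): min(n, i + half_window + 1)])
        for i in range(n)
    ]


def pick_peaks(signal, half_window=5, threshold=10.0):
    """Return (index, corrected_height) for local maxima of the baseline-corrected trace.

    The baseline window must be wider than the peaks, otherwise the peaks
    themselves are absorbed into the baseline estimate.
    """
    baseline = rolling_min_baseline(signal, half_window)
    corrected = [s - b for s, b in zip(signal, baseline)]
    peaks = []
    for i in range(1, len(corrected) - 1):
        c = corrected[i]
        # strict '>' on the left and '>=' on the right report a flat top only once
        if c >= threshold and c > corrected[i - 1] and c >= corrected[i + 1]:
            peaks.append((i, c))
    return peaks


# Usage: a hypothetical 50-point trace with a constant background of 100 and one peak.
trace = [100.0] * 50
trace[18:23] = [120.0, 150.0, 200.0, 150.0, 120.0]
peaks = pick_peaks(trace)
```

Here 50 data points reduce to a single (index, height) pair, which illustrates on a toy scale why peak-picked output is far smaller than the raw full-scan data.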
125
Podwojski K, Fritsch A, Chamrad DC, Paul W, Sitek B, Stühler K, Mutzel P, Stephan C, Meyer HE, Urfer W, Ickstadt K, Rahnenführer J. Retention time alignment algorithms for LC/MS data must consider non-linear shifts. Bioinformatics 2009; 25:758-64. [PMID: 19176558 DOI: 10.1093/bioinformatics/btp052] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION Proteomics has become of particular interest for biomarker discovery and drug development. In particular, the combination of liquid chromatography and mass spectrometry (LC/MS) has proven to be a powerful technique for analyzing protein mixtures. Clinically oriented proteomic studies will have to compare hundreds of LC/MS runs at a time. To compare different runs, sophisticated preprocessing steps have to be performed. An important step is the retention time (rt) alignment of LC/MS runs. Non-linear shifts in rt between pairs of LC/MS runs, in particular, make this a crucial and non-trivial problem. RESULTS To demonstrate the particular importance of correcting non-linear rt shifts, we evaluate and compare different alignment algorithms. We present and analyze two versions of a new algorithm based on regression techniques: one assumes and estimates only linear shifts, and the other also allows non-linear shifts to be estimated. As an example of another type of alignment method, we use an established alignment algorithm based on shifting vectors, which we adapted to also correct non-linear shifts. In a simulation study, we show that rt alignment procedures that can estimate non-linear shifts yield clearly better alignments. This holds even under mild non-linear deviations. AVAILABILITY R code for the regression-based alignment methods and simulated datasets are available at http://www.statistik.tu-dortmund.de/genetik-publikationen-alignment.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Katharina Podwojski
- Fakultät Statistik, Technische Universität Dortmund, 44221 Dortmund, Germany.
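The contrast between linear and non-linear rt correction can be made concrete with least-squares regression on matched peak pairs. In the Python sketch below, a straight line stands in for the linear model and a quadratic for the non-linear one; the paper's own regression models and its shifting-vector method are not reproduced, and the matched retention times are invented for illustration.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns [c0, c1, ..., c_degree] for c0 + c1*t + ... . Adequate for low
    degrees and a modest rt range; not numerically robust in general.
    """
    k = degree + 1
    # Normal equations: A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i)
    A = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(A[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (b[i] - rest) / A[i][i]
    return coeffs


def predict(coeffs, t):
    """Map a retention time from the run being aligned onto the reference scale."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


def residual_ss(x, y, coeffs):
    """Sum of squared differences between reference rts and aligned rts."""
    return sum((yi - predict(coeffs, xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical matched peaks: rt in the run to align (x) vs. rt in the reference
# run (y), generated from y = 1 + t + 0.002 t^2, i.e. a mild non-linear shift.
rt_run = [10.0, 20.0, 30.0, 40.0, 50.0]
rt_ref = [11.2, 21.8, 32.8, 44.2, 56.0]
linear = polyfit(rt_run, rt_ref, 1)
quadratic = polyfit(rt_run, rt_ref, 2)
```

On this toy data the quadratic fit absorbs the curvature that the straight line cannot, leaving an essentially zero residual, which mirrors in miniature the abstract's point that even mild non-linear deviations favour alignment methods able to model them.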