1. Comparison of Bi- and Tri-Linear PLS Models for Variable Selection in Metabolomic Time-Series Experiments. Metabolites 2019; 9:metabo9050092. [PMID: 31075899 PMCID: PMC6571821 DOI: 10.3390/metabo9050092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/03/2019] [Revised: 05/07/2019] [Accepted: 05/08/2019] [Indexed: 11/25/2022] Open
Abstract
Metabolomic studies with a time-series design are widely used for discovery and validation of biomarkers. In such studies, changes of metabolic profiles over time under different conditions (e.g., control and intervention) are compared, and metabolites responding differently between the conditions are identified as putative biomarkers. To incorporate time-series information into the variable (biomarker) selection in partial least squares regression (PLS) models, we created PLS models with different combinations of bilinear/trilinear X and group/time response dummy Y. In total, five PLS models were evaluated on two real datasets, and also on simulated datasets with varying characteristics (number of subjects, number of variables, inter-individual variability, intra-individual variability and number of time points). Variables showing specific temporal patterns observed visually and determined statistically were labelled as discriminating variables. Bootstrapped VIP scores were calculated for variable selection, and the variable selection performance of the five PLS models was assessed based on their capacity to correctly select the discriminating variables. The results showed that the bilinear PLS model with group × time response as dummy Y provided the highest recall (true positive rate) of 83–95% with high precision, independent of most characteristics of the datasets. Trilinear PLS models tend to select a small number of variables with high precision but a relatively high false negative rate (lower power). They are also less affected by noise than bilinear PLS models. In datasets with high inter-individual variability, bilinear PLS models tend to provide higher recall while trilinear models tend to provide higher precision. Overall, we recommend bilinear PLS with group × time response Y for variable selection applications in metabolomics intervention time-series studies.
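The abstract above refers to a group × time dummy Y and to scoring variable selection by precision and recall. A minimal Python sketch of both ideas follows. The one-hot coding of each observed (group, time) combination is only one plausible encoding and is an assumption, since the abstract does not give the exact coding. The function names, sample labels and metabolite identifiers are hypothetical.

```python
def group_time_dummy(groups, times):
    """One-hot encode each sample's (group, time) combination.

    Assumed coding: one Y column per observed group x time level.
    """
    levels = sorted(set(zip(groups, times)))
    column = {level: k for k, level in enumerate(levels)}
    Y = [[0] * len(levels) for _ in groups]
    for row, key in zip(Y, zip(groups, times)):
        row[column[key]] = 1
    return levels, Y


def precision_recall(selected, discriminating):
    """Score a variable selection against the known discriminating variables."""
    selected, discriminating = set(selected), set(discriminating)
    true_positives = len(selected & discriminating)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(discriminating) if discriminating else 0.0
    return precision, recall


# Hypothetical design: two groups, each sampled at two time points.
groups = ["control", "control", "intervention", "intervention"]
times = [0, 1, 0, 1]
levels, Y = group_time_dummy(groups, times)

# Hypothetical selection result: 3 variables selected, 4 truly discriminating.
precision, recall = precision_recall({"m1", "m2", "m5"}, {"m1", "m2", "m3", "m4"})
```

In this toy case two of the three selected variables are truly discriminating, so precision is 2/3 and recall is 2/4.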
2. A weighted relative difference accumulation algorithm for dynamic metabolomics data: long-term elevated bile acids are risk factors for hepatocellular carcinoma. Sci Rep 2015; 5:8984. [PMID: 25757957 PMCID: PMC4355672 DOI: 10.1038/srep08984] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 10/06/2014] [Accepted: 02/09/2015] [Indexed: 12/14/2022] Open
Abstract
Dynamic metabolomics studies can provide a systematic view of the metabolic trajectory during disease development and drug treatment and reveal the nature of biological processes at metabolic level. To extract important information in a systematic time dimension rather than at isolated time points, a weighted method based on the means and variations along the time points was proposed and first applied to previously published rat model data. The method was subsequently extended and applied to prospective metabolomics data analysis of hepatocellular carcinoma (HCC). Permutation was employed for noise filtering and false discovery rate (FDR) was used for parameter optimization during the feature selection. Long-term elevated serum bile acids were identified as risk factors for HCC development.
3. Multi-way PLS regression: Monotony convergence of tri-linear PLS2 and optimality of parameters. Comput Stat Data Anal 2015. [DOI: 10.1016/j.csda.2014.10.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
4. van der Greef J, van Wietmarschen H, van Ommen B, Verheij E. Looking back into the future: 30 years of metabolomics at TNO. Mass Spectrometry Reviews 2013; 32:399-415. [PMID: 23630115 DOI: 10.1002/mas.21370] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 06/26/2012] [Revised: 11/21/2012] [Accepted: 11/21/2012] [Indexed: 06/02/2023]
Abstract
Metabolites have played an essential role in our understanding of life, health, and disease for thousands of years. This domain became much more important after the concept of metabolism was discovered. In the 1950s, the coupling of mass spectrometry to chromatography made the technique more application-oriented and allowed the development of new profiling technologies. Since 1980, TNO has performed system-based metabolic profiling of body fluids, which, combined with pattern recognition, has led to many discoveries and contributed to the fields known as metabolomics and systems biology. This review describes the development of related concepts and applications at TNO in the biomedical, pharmaceutical, nutritional, and microbiological fields, and provides an outlook for the future.
5. Characterising and correcting batch variation in an automated direct infusion mass spectrometry (DIMS) metabolomics workflow. Anal Bioanal Chem 2013; 405:5147-57. [DOI: 10.1007/s00216-013-6856-7] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Received: 11/02/2012] [Revised: 02/19/2013] [Accepted: 02/19/2013] [Indexed: 01/01/2023]
6. Li J, Huang C, Zheng D, Wang Y, Yuan Z. CcpA-Mediated Enhancement of Sugar and Amino Acid Metabolism in Lysinibacillus sphaericus by NMR-Based Metabolomics. J Proteome Res 2012; 11:4654-61. [DOI: 10.1021/pr300469v] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Affiliation(s)
- Jing Li
  - Center for Applied and Environmental Microbiology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, People’s Republic of China
  - Graduate School of the Chinese Academy of Sciences, Beijing 100039, People’s Republic of China
- Chongyang Huang
  - Wuhan Center of Magnetic Resonance, State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan 430071, People’s Republic of China
  - Graduate School of the Chinese Academy of Sciences, Beijing 100039, People’s Republic of China
- Dasheng Zheng
  - Center for Applied and Environmental Microbiology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, People’s Republic of China
- Yulan Wang
  - Wuhan Center of Magnetic Resonance, State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics, Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan 430071, People’s Republic of China
- Zhiming Yuan
  - Center for Applied and Environmental Microbiology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, People’s Republic of China
7. Xia J, Mandal R, Sinelnikov IV, Broadhurst D, Wishart DS. MetaboAnalyst 2.0--a comprehensive server for metabolomic data analysis. Nucleic Acids Res 2012; 40:W127-33. [PMID: 22553367 PMCID: PMC3394314 DOI: 10.1093/nar/gks374] [Citation(s) in RCA: 872] [Impact Index Per Article: 72.7] [Received: 02/13/2012] [Revised: 04/03/2012] [Accepted: 04/12/2012] [Indexed: 01/22/2023] Open
Abstract
First released in 2009, MetaboAnalyst (www.metaboanalyst.ca) was a relatively simple web server designed to facilitate metabolomic data processing and statistical analysis. With continuing advances in metabolomics along with constant user feedback, it became clear that a substantial upgrade to the original server was necessary. MetaboAnalyst 2.0, which is the successor to MetaboAnalyst, represents just such an upgrade. MetaboAnalyst 2.0 now contains dozens of new features and functions including new procedures for data filtering, data editing and data normalization. It also supports multi-group data analysis, two-factor analysis as well as time-series data analysis. These new functions have also been supplemented with: (i) a quality-control module that allows users to evaluate their data quality before conducting any analysis, (ii) a functional enrichment analysis module that allows users to identify biologically meaningful patterns using metabolite set enrichment analysis and (iii) a metabolic pathway analysis module that allows users to perform pathway analysis and visualization for 15 different model organisms. In developing MetaboAnalyst 2.0 we have also substantially improved its graphical presentation tools. All images are now generated using anti-aliasing and are available over a range of resolutions, sizes and formats (PNG, TIFF, PDF, PostScript, or SVG). To improve its performance, MetaboAnalyst 2.0 is now hosted on a much more powerful server with substantially modified code to take advantage of the server's multi-core CPUs for computationally intensive tasks. MetaboAnalyst 2.0 also maintains a collection of 50 or more FAQs and more than a dozen tutorials compiled from user queries and requests. A downloadable version of MetaboAnalyst 2.0, along with detailed instructions for local installation, is now available as well.
Affiliation(s)
- Jianguo Xia
  - Department of Biological Sciences, Department of Computing Science, Department of Medicine and National Research Council, National Institute for Nanotechnology (NINT), Edmonton, AB, Canada T6G 2E8
- Rupasri Mandal
  - Department of Biological Sciences, Department of Computing Science, Department of Medicine and National Research Council, National Institute for Nanotechnology (NINT), Edmonton, AB, Canada T6G 2E8
- Igor V. Sinelnikov
  - Department of Biological Sciences, Department of Computing Science, Department of Medicine and National Research Council, National Institute for Nanotechnology (NINT), Edmonton, AB, Canada T6G 2E8
- David Broadhurst
  - Department of Biological Sciences, Department of Computing Science, Department of Medicine and National Research Council, National Institute for Nanotechnology (NINT), Edmonton, AB, Canada T6G 2E8
- David S. Wishart
  - Department of Biological Sciences, Department of Computing Science, Department of Medicine and National Research Council, National Institute for Nanotechnology (NINT), Edmonton, AB, Canada T6G 2E8
8. Ng JWY, Barrett LM, Wong A, Kuh D, Smith GD, Relton CL. The role of longitudinal cohort studies in epigenetic epidemiology: challenges and opportunities. Genome Biol 2012; 13:246. [PMID: 22747597 PMCID: PMC3446311 DOI: 10.1186/gb-2012-13-6-246] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 11/12/2022] Open
Abstract
Longitudinal cohort studies are ideal for investigating how epigenetic patterns change over time and relate to changing exposure patterns and the development of disease. We highlight the challenges and opportunities in this approach.
9. Ng JWY, Barrett LM, Wong A, Kuh D, Smith G, Relton CL. The role of longitudinal cohort studies in epigenetic epidemiology: challenges and opportunities. Genome Biol 2012. [DOI: 10.1186/gb4029] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 12/24/2022] Open
10. Boccard J, Badoud F, Grata E, Ouertani S, Hanafi M, Mazerolles G, Lantéri P, Veuthey JL, Saugy M, Rudaz S. A steroidomic approach for biomarkers discovery in doping control. Forensic Sci Int 2011; 213:85-94. [DOI: 10.1016/j.forsciint.2011.07.023] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 04/19/2011] [Revised: 07/08/2011] [Accepted: 07/12/2011] [Indexed: 11/24/2022]
11. Koek MM, Jellema RH, van der Greef J, Tas AC, Hankemeier T. Quantitative metabolomics based on gas chromatography mass spectrometry: status and perspectives. Metabolomics 2011; 7:307-328. [PMID: 21949491 PMCID: PMC3155681 DOI: 10.1007/s11306-010-0254-3] [Citation(s) in RCA: 222] [Impact Index Per Article: 17.1] [Received: 03/14/2010] [Accepted: 10/25/2010] [Indexed: 01/17/2023]
Abstract
Metabolomics involves the unbiased quantitative and qualitative analysis of the complete set of metabolites present in cells, body fluids and tissues (the metabolome). By analyzing differences between metabolomes using biostatistics (multivariate data analysis; pattern recognition), metabolites relevant to a specific phenotypic characteristic can be identified. However, the reliability of the analytical data is a prerequisite for correct biological interpretation in metabolomics analysis. In this review the challenges in quantitative metabolomics analysis with regard to analytical as well as data preprocessing steps are discussed. Recommendations are given on how to optimize and validate comprehensive silylation-based methods from sample extraction and derivatization up to data preprocessing and how to perform quality control during metabolomics studies. The current state of the method validation and data preprocessing methods used in published literature is discussed, and a perspective on the future research necessary to obtain accurate quantitative data from comprehensive GC-MS data is provided.
Affiliation(s)
- Maud M. Koek
  - Analytical Research Department, TNO Quality of Life, Utrechtseweg 48, P.O. Box 360, 3700 AJ Zeist, The Netherlands
- Renger H. Jellema
  - DSM Biotechnology Center, Alexander Fleminglaan 1, P.O. Box 1, 2600 MA Delft, The Netherlands
- Jan van der Greef
  - Division of Analytical Biosciences, Leiden/Amsterdam Center for Drug Research (LACDR), Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
  - SU BioMedicine and TNO Quality of Life, Utrechtseweg 48, P.O. Box 360, 3700 AJ Zeist, The Netherlands
- Albert C. Tas
  - Analytical Research Department, TNO Quality of Life, Utrechtseweg 48, P.O. Box 360, 3700 AJ Zeist, The Netherlands
- Thomas Hankemeier
  - Division of Analytical Biosciences, Leiden/Amsterdam Center for Drug Research (LACDR), Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
  - Netherlands Metabolomics Centre, Einsteinweg 55, 2333 CC Leiden, The Netherlands
12. Van Batenburg MF, Coulier L, van Eeuwijk F, Smilde AK, Westerhuis JA. New figures of merit for comprehensive functional genomics data: the metabolomics case. Anal Chem 2011; 83:3267-74. [PMID: 21391558 DOI: 10.1021/ac102374c] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
In the field of metabolomics, hundreds of metabolites are measured simultaneously by analytical platforms such as gas chromatography/mass spectrometry (GC/MS), liquid chromatography/mass spectrometry (LC/MS) and NMR to obtain their concentration levels in a reliable way. Analytical repeatability (intrabatch precision) is a common figure of merit for the measurement error of metabolites repeatedly measured in one batch on one platform. This measurement error, however, is not constant as its value may depend on the concentration level of the metabolite. Moreover, measurement errors may be correlated between metabolites. In this work, we introduce new figures of merit for comprehensive measurements that can detect these nonconstant correlated errors. Furthermore, for the metabolomics case we identified that these nonconstant correlated errors can result from sample instability between repeated analyses, instrumental noise generated by the analytical platform, or bias that results from data pretreatment.
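Analytical repeatability, the common figure of merit mentioned above, is often reported as a relative standard deviation (RSD) over replicate measurements. The Python sketch below shows only that conventional calculation, not the new figures of merit the paper introduces. The metabolite names and intensities are invented for illustration, and the function name is hypothetical.

```python
from statistics import mean, stdev


def relative_standard_deviation(replicates):
    """Return the RSD (%) of repeated measurements of one metabolite."""
    m = mean(replicates)
    if m == 0:
        raise ValueError("RSD is undefined for a zero mean")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return 100.0 * stdev(replicates) / m


# Invented replicate intensities measured in one batch on one platform.
replicate_intensities = {
    "metabolite_A": [100.0, 102.0, 98.0],  # high level, small spread
    "metabolite_B": [10.0, 14.0, 6.0],     # low level, large relative spread
}

rsd = {
    name: relative_standard_deviation(values)
    for name, values in replicate_intensities.items()
}
```

With these invented numbers the low-level metabolite has a far larger RSD (40%) than the high-level one (2%), a toy instance of measurement error that depends on concentration level.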
Affiliation(s)
- Marinus F Van Batenburg
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, The Netherlands
13. Hendrickx DM, Hendriks MMWB, Eilers PHC, Smilde AK, Hoefsloot HCJ. Reverse engineering of metabolic networks, a critical assessment. Molecular BioSystems 2010; 7:511-20. [PMID: 21069230 DOI: 10.1039/c0mb00083c] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]
Abstract
Inferring metabolic networks from metabolite concentration data is a central topic in systems biology. Mathematical techniques to extract information about the network from data have been proposed in the literature. This paper presents a critical assessment of the feasibility of reverse engineering of metabolic networks, illustrated with a selection of methods. Appropriate data are simulated to study the performance of four representative methods. An overview of sampling and measurement methods currently in use for generating time-resolved metabolomics data is given and contrasted with the needs of the discussed reverse engineering methods. The results of this assessment show that if full inference of a real-world metabolic network is the goal there is a large discrepancy between the requirements of reverse engineering of metabolic networks and contemporary measurement practice. Recommendations for improved time-resolved experimental designs are given.
Affiliation(s)
- Diana M Hendrickx
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, The Netherlands.
14. Braaksma M, Bijlsma S, Coulier L, Punt PJ, van der Werf MJ. Metabolomics as a tool for target identification in strain improvement: the influence of phenotype definition. Microbiology-SGM 2010; 157:147-159. [PMID: 20847006 DOI: 10.1099/mic.0.041244-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
For the optimization of microbial production processes, the choice of the quantitative phenotype to be optimized is crucial. For instance, for the optimization of product formation, either product concentration or productivity can be pursued, potentially resulting in different targets for strain improvement. The choice of a quantitative phenotype is highly relevant for classical improvement approaches, and even more so for modern systems biology approaches. In this study, the information content of a metabolomics dataset was determined with respect to different quantitative phenotypes related to the formation of specific products. To this end, the production of two industrially relevant products by Aspergillus niger was evaluated: (i) the enzyme glucoamylase, and (ii) the more complex product group of secreted proteases, consisting of multiple enzymes. For both products, six quantitative phenotypes associated with activity and productivity were defined, also taking into account different time points of sampling during the fermentation. Both linear and nonlinear relationships between the metabolome data and the different quantitative phenotypes were considered. The multivariate data analysis tool partial least-squares (PLS) was used to evaluate the information content of the datasets for all the different quantitative phenotypes defined. Depending on the product studied, different quantitative phenotypes were found to have the highest information content in specific metabolomics datasets. A detailed analysis of the metabolites that showed strong correlation with these quantitative phenotypes revealed that various sugar derivatives correlated with glucoamylase activity. For the reduction of protease activity, mainly as-yet-unidentified compounds correlated.
Affiliation(s)
- Machtelt Braaksma
  - Kluyver Centre for Genomics of Industrial Fermentation, PO Box 5057, 2600 GA Delft, The Netherlands
  - TNO Quality of Life, PO Box 360, 3700 AJ Zeist, The Netherlands
- Sabina Bijlsma
  - TNO Quality of Life, PO Box 360, 3700 AJ Zeist, The Netherlands
- Leon Coulier
  - TNO Quality of Life, PO Box 360, 3700 AJ Zeist, The Netherlands
- Peter J Punt
  - Kluyver Centre for Genomics of Industrial Fermentation, PO Box 5057, 2600 GA Delft, The Netherlands
  - TNO Quality of Life, PO Box 360, 3700 AJ Zeist, The Netherlands
- Mariët J van der Werf
  - Kluyver Centre for Genomics of Industrial Fermentation, PO Box 5057, 2600 GA Delft, The Netherlands
  - TNO Quality of Life, PO Box 360, 3700 AJ Zeist, The Netherlands
15. Smilde AK, Westerhuis JA, Hoefsloot HCJ, Bijlsma S, Rubingh CM, Vis DJ, Jellema RH, Pijl H, Roelfsema F, van der Greef J. Dynamic metabolomic data analysis: a tutorial review. Metabolomics 2010; 6:3-17. [PMID: 20339444 PMCID: PMC2834778 DOI: 10.1007/s11306-009-0191-1] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.4] [Received: 01/28/2009] [Accepted: 11/09/2009] [Indexed: 12/23/2022]
Abstract
In metabolomics, time-resolved, dynamic or temporal data are increasingly collected. The number of methods to analyze such data, however, is very limited, and in most cases the dynamic nature of the data is not even taken into account. This paper reviews current methods in use for analyzing dynamic metabolomic data. Moreover, some methods from other fields of science that may be of use to analyze such dynamic metabolomics data are described in some detail. The methods are put in a general framework after providing a formal definition of what constitutes a 'dynamic' method. Some of the methods are illustrated with real-life metabolomics examples.
Affiliation(s)
- A. K. Smilde
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands
- J. A. Westerhuis
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands
- H. C. J. Hoefsloot
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands
- S. Bijlsma
  - TNO Quality of Life, Utrechtseweg 48, 3704 HE Zeist, The Netherlands
- C. M. Rubingh
  - TNO Quality of Life, Utrechtseweg 48, 3704 HE Zeist, The Netherlands
- D. J. Vis
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands
- R. H. Jellema
  - TNO Quality of Life, Utrechtseweg 48, 3704 HE Zeist, The Netherlands
- H. Pijl
  - Department of Endocrinology and Metabolic Diseases, Leiden University Medical Center, Leiden, The Netherlands
- F. Roelfsema
  - Department of Endocrinology and Metabolic Diseases, Leiden University Medical Center, Leiden, The Netherlands
- J. van der Greef
  - TNO Quality of Life, Utrechtseweg 48, 3704 HE Zeist, The Netherlands
16. Peters S, Janssen HG, Vivó-Truyols G. Trend analysis of time-series data: A novel method for untargeted metabolite discovery. Anal Chim Acta 2010; 663:98-104. [PMID: 20172103 DOI: 10.1016/j.aca.2010.01.038] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 11/23/2009] [Revised: 01/19/2010] [Accepted: 01/19/2010] [Indexed: 10/19/2022]
Abstract
A new strategy for biomarker discovery is presented that uses time-series metabolomics data. Data sets from samples analysed at different time points after an intervention are searched for compounds that show a meaningful trend following the intervention. This requires new data-analytical tools to distinguish such compounds from those showing only random variation. Two univariate methods, autocorrelation and curve-fitting, are used either as stand-alone methods or in combination to discover unknown metabolites in data sets originating from target-compound analysis. Both techniques reduce the long list of detected compounds in the kinetic sample set to include only those having a pre-defined interesting time profile. Thus, new metabolites may be discovered within data structures that are usually only used for target-compound analysis. The new strategy is tested on a sample set obtained from a gut fermentation study of a polyphenol-rich diet. For this study, the initial list of over 9000 potentially interesting features was reduced to fewer than 150, thus significantly reducing the expensive and time-consuming manual examination.
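To illustrate how autocorrelation can separate a trending time profile from random variation, here is a minimal Python sketch using the standard lag-1 sample autocorrelation. The abstract does not state the exact statistic or threshold the authors used, so this choice is an assumption. The function name and the two example profiles are hypothetical.

```python
def lag1_autocorrelation(profile):
    """Lag-1 sample autocorrelation of one feature's time profile."""
    n = len(profile)
    centre = sum(profile) / n
    deviations = [x - centre for x in profile]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        return 0.0  # a flat profile carries no trend information
    numerator = sum(deviations[t] * deviations[t + 1] for t in range(n - 1))
    return numerator / denominator


# Hypothetical profiles: a steady increase versus an alternating signal.
trend = [1.0, 2.0, 3.0, 4.0, 5.0]
noise_like = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

r_trend = lag1_autocorrelation(trend)
r_noise = lag1_autocorrelation(noise_like)
```

Successive points of the steadily increasing profile stay on the same side of the mean, which gives a positive value, while the alternating profile gives a strongly negative one. Features could then be ranked or filtered on this value, with the cut-off left to the analyst.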
Affiliation(s)
- Sonja Peters
  - Unilever Research and Development, Advanced Measurement and Data Modelling, P.O. Box 114, 3130 AC Vlaardingen, The Netherlands.
17. van den Berg RA, Van Mechelen I, Wilderjans TF, Van Deun K, Kiers HAL, Smilde AK. Integrating functional genomics data using maximum likelihood based simultaneous component analysis. BMC Bioinformatics 2009; 10:340. [PMID: 19835617 PMCID: PMC2771021 DOI: 10.1186/1471-2105-10-340] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 07/23/2009] [Accepted: 10/16/2009] [Indexed: 12/02/2022] Open
Abstract
Background: In contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities that are collected with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of collecting this type of data is to discover biological mechanisms that underlie the behavior of the variables in the different data blocks. The simultaneous component analysis (SCA) family of data analysis methods is suited for this task. However, an SCA may be hampered by the data blocks being subjected to different amounts of measurement error, or noise. To unveil the true mechanisms underlying the data, it could be fruitful to take noise heterogeneity into consideration in the data analysis. Maximum likelihood based SCA (MxLSCA-P) was developed for this purpose. In a previous simulation study it outperformed normal SCA-P. This previous study, however, did not mimic typical functional genomics data sets in many respects, such as data blocks coupled via the experimental mode, more variables than experimental units, and medium to high correlations between variables. Here, we present a new simulation study in which the usefulness of MxLSCA-P compared to ordinary SCA-P is evaluated within a typical functional genomics setting. Subsequently, the performance of the two methods is evaluated by analysis of a real-life Escherichia coli metabolomics data set.
Results: In the simulation study, MxLSCA-P outperforms SCA-P in terms of recovery of the true underlying scores of the common mode and of the true values underlying the data entries. MxLSCA-P performed especially well when the simulated data blocks were subject to different noise levels. In the analysis of an E. coli metabolomics data set, MxLSCA-P provided a slightly better and more consistent interpretation.
Conclusion: MxLSCA-P is a promising addition to the SCA family. The analysis of coupled functional genomics data blocks could benefit from its ability to take different noise levels per data block into consideration and improve the recovery of the true patterns underlying the data. Moreover, the maximum likelihood based approach underlying MxLSCA-P could be extended to custom-made solutions to specific problems encountered.