1
|
Ricci M, Gasperi F, Betta E, Menghi L, Endrizzi I, Cliceri D, Franceschi P, Aprea E. Multivariate data analysis strategy to monitor Trentingrana cheese real-scale production through volatile organic compounds profiling. Lebensm Wiss Technol 2022. [DOI: 10.1016/j.lwt.2022.114364]
|
2
|
Jarmund AH, Madssen TS, Giskeødegård GF. ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods. Front Mol Biosci 2022; 9:962431. [PMID: 36387276 PMCID: PMC9645785 DOI: 10.3389/fmolb.2022.962431] [Received: 06/06/2022] [Accepted: 09/20/2022]
Abstract
The increasing availability of multivariate data within biomedical research calls for appropriate statistical methods that can describe and model complex relationships between variables. The extended ANOVA simultaneous component analysis (ASCA+) framework combines general linear models and principal component analysis (PCA) to decompose and visualize the separate effects of experimental factors. It has recently been demonstrated how linear mixed models can be included in the framework to analyze data from longitudinal experimental designs with repeated measurements (RM-ASCA+). The ALASCA package for R makes the ASCA+ framework accessible for general use and includes multiple methods for validation and visualization. The package is especially useful for longitudinal data and the ability to easily adjust for covariates is an important strength. This paper demonstrates how the ALASCA package can be applied to gain insights into multivariate data from interventional as well as observational designs. Publicly available data sets from four studies are used to demonstrate the methods available (proteomics, metabolomics, and transcriptomics).
Affiliation(s)
- Anders Hagen Jarmund
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim, Norway; Centre of Molecular Inflammation Research (CEMIR), NTNU, Trondheim, Norway. Correspondence: Anders Hagen Jarmund
- Guro F. Giskeødegård
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Trondheim, Norway
|
3
|
Effect of Dairy, Season, and Sampling Position on Physical Properties of Trentingrana Cheese: Application of an LMM-ASCA Model. Foods 2022; 11:127. [PMID: 35010253 PMCID: PMC8750008 DOI: 10.3390/foods11010127] [Received: 11/29/2021] [Revised: 12/22/2021] [Accepted: 01/01/2022]
Abstract
Trentingrana hard cheese is a geographic specification of the PDO Grana Padano. It is produced according to an internal regulation by many cooperative dairy factories in the Trentino region (northern Italy), using a semi-artisanal process (the only allowed ingredients are milk, salt, and rennet). Within the PSR project TRENTINGRANA, colorimetric and textural measurements have been collected from 317 cheese wheels, which were sampled bi-monthly from all the consortium dairies (n = 15) within the timeframe of two years, to estimate the effect on physical properties related to the season of the year and the dairy factory implant. To estimate the effect of the dairy and the time of the year, considering the internal variability of each cheese wheel, a linear mixed-effect model combined with a simultaneous component analysis (LMM-ASCA) is proposed. Results show that all the factors have a significant effect on the colorimetric and textural properties of the cheese. There are five clusters of dairies producing cheese with similar properties, three different couples of months of the year when the cheese produced is significantly different from all the others, and the effect of the geometry of the cheese wheel is reported as well.
|
4
|
Madssen TS, Giskeødegård GF, Smilde AK, Westerhuis JA. Repeated measures ASCA+ for analysis of longitudinal intervention studies with multivariate outcome data. PLoS Comput Biol 2021; 17:e1009585. [PMID: 34752455 PMCID: PMC8604364 DOI: 10.1371/journal.pcbi.1009585] [Received: 12/08/2020] [Revised: 11/19/2021] [Accepted: 10/22/2021]
Abstract
Longitudinal intervention studies with repeated measurements over time are an important type of experimental design in biomedical research. Due to the advent of "omics"-sciences (genomics, transcriptomics, proteomics, metabolomics), longitudinal studies generate increasingly multivariate outcome data. Analysis of such data must take both the longitudinal intervention structure and multivariate nature of the data into account. The ASCA+-framework combines general linear models with principal component analysis and can be used to separate and visualize the multivariate effect of different experimental factors. However, this methodology has not yet been developed for the more complex designs often found in longitudinal intervention studies, which may be unbalanced, involve randomized interventions, and have substantial missing data. Here we describe a new methodology, repeated measures ASCA+ (RM-ASCA+), and show how it can be used to model metabolic changes over time, and compare metabolic changes between groups, in both randomized and non-randomized intervention studies. Tools for both visualization and model validation are discussed. This approach can facilitate easier interpretation of data from longitudinal clinical trials with multivariate outcomes.
Affiliation(s)
- Torfinn S. Madssen
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway
- Guro F. Giskeødegård
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway
- Age K. Smilde
- Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Johan A. Westerhuis
- Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
|
5
|
Bertinetto C, Engel J, Jansen J. ANOVA simultaneous component analysis: A tutorial review. Anal Chim Acta X 2020; 6:100061. [PMID: 33392497 PMCID: PMC7772684 DOI: 10.1016/j.acax.2020.100061] [Received: 06/22/2020] [Revised: 09/29/2020] [Accepted: 10/02/2020]
Abstract
When analyzing experimental chemical data, it is often necessary to incorporate the structure of the study design into the chemometric/statistical models to effectively address the research questions of interest. ANOVA-Simultaneous Component Analysis (ASCA) is one of the most prominent methods to include such information in the quantitative analysis of multivariate data, especially when the number of variables is large. This tutorial review intends to explain in a simple way how ASCA works, how it is operated and how to correctly interpret ASCA results, with approachable mathematical and visual descriptions. Two examples are given: the first, a simulated chemical reaction, serves to illustrate the ASCA steps and the second, from a real chemical ecology data set, the interpretation of results. An overview of methods closely related to ASCA is also provided, pointing out their differences and scope, to give a wide-ranging picture of the available options to build multivariate models that take experimental design into account. ASCA is a multivariate method for analysis of multi-factor data. An overview of the (mathematical) principles of ASCA is presented. Key aspects for practical application of ASCA are discussed. Detailed explanation of ASCA output in terms of score and loading plots is given. Literature review of other multivariate techniques for analysis of multi-factor data.
Affiliation(s)
- Carlo Bertinetto
- Department of Analytical Chemistry, Institute of Molecular Materials, Radboud University, the Netherlands
- Jasper Engel
- Biometris, Wageningen UR, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
- Jeroen Jansen
- Department of Analytical Chemistry, Institute of Molecular Materials, Radboud University, the Netherlands
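
The abstract of this tutorial describes ASCA as a way to build the study design into a multivariate model. As a rough illustration only, the following Python sketch shows the classical two-step idea for a balanced two-factor design: split the centered data into effect matrices of level means, then run PCA on each effect matrix. The function names (`level_means`, `asca_balanced`) and the dictionary layout are hypothetical and are not taken from the paper or from any ASCA package; unbalanced designs need the general linear model formulation (ASCA+) instead.

```python
import numpy as np


def level_means(X, labels):
    """Replace each row of X by the mean of all rows that share its label."""
    labels = np.asarray(labels)
    out = np.zeros_like(X, dtype=float)
    for level in np.unique(labels):
        mask = labels == level
        out[mask] = X[mask].mean(axis=0)
    return out


def asca_balanced(X, a, b, n_components=2):
    """Sketch of ASCA for a balanced design with factors a and b.

    Assumption: every (a, b) cell holds the same number of samples, so the
    effect matrices are mutually orthogonal.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # remove the overall mean
    XA = level_means(Xc, a)                      # main effect of factor a
    XB = level_means(Xc, b)                      # main effect of factor b
    cells = np.array([f"{i}|{j}" for i, j in zip(a, b)])
    XAB = level_means(Xc, cells) - XA - XB       # interaction effect
    E = Xc - XA - XB - XAB                       # residuals

    result = {"residuals": E}
    for name, M in {"A": XA, "B": XB, "AB": XAB}.items():
        # PCA of an effect matrix via singular value decomposition
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        k = n_components
        result[name] = {
            "scores": U[:, :k] * s[:k],
            "loadings": Vt[:k].T,
            "ssq": float((M ** 2).sum()),        # variation carried by the effect
        }
    return result
```

Score plots per effect can then be drawn from `result["A"]["scores"]` and so on; significance testing (for example by permutation) is not covered by this sketch.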
|
6
|
Removal of batch effects using stratified subsampling of metabolomic data for in vitro endocrine disruptors screening. Talanta 2019; 195:77-86. [DOI: 10.1016/j.talanta.2018.11.019] [Received: 09/21/2018] [Revised: 10/31/2018] [Accepted: 11/05/2018]
|
7
|
Combining ANOVA-PCA with POCHEMON to analyse micro-organism development in a polymicrobial environment. Anal Chim Acta 2017; 963:1-16. [DOI: 10.1016/j.aca.2017.01.064] [Received: 09/28/2016] [Revised: 01/26/2017] [Accepted: 01/31/2017]
|
8
|
Boccard J, Rudaz S. Exploring Omics data from designed experiments using analysis of variance multiblock Orthogonal Partial Least Squares. Anal Chim Acta 2016; 920:18-28. [PMID: 27114219 DOI: 10.1016/j.aca.2016.03.042] [Received: 10/15/2015] [Revised: 03/22/2016] [Accepted: 03/23/2016]
Abstract
Many experimental factors may have an impact on chemical or biological systems. A thorough investigation of the potential effects and interactions between the factors is made possible by rationally planning the trials using systematic procedures, i.e. design of experiments. However, assessing factors' influences remains often a challenging task when dealing with hundreds to thousands of correlated variables, whereas only a limited number of samples is available. In that context, most of the existing strategies involve the ANOVA-based partitioning of sources of variation and the separate analysis of ANOVA submatrices using multivariate methods, to account for both the intrinsic characteristics of the data and the study design. However, these approaches lack the ability to summarise the data using a single model and remain somewhat limited for detecting and interpreting subtle perturbations hidden in complex Omics datasets. In the present work, a supervised multiblock algorithm based on the Orthogonal Partial Least Squares (OPLS) framework, is proposed for the joint analysis of ANOVA submatrices. This strategy has several advantages: (i) the evaluation of a unique multiblock model accounting for all sources of variation; (ii) the computation of a robust estimator (goodness of fit) for assessing the ANOVA decomposition reliability; (iii) the investigation of an effect-to-residuals ratio to quickly evaluate the relative importance of each effect and (iv) an easy interpretation of the model with appropriate outputs. Case studies from metabolomics and transcriptomics, highlighting the ability of the method to handle Omics data obtained from fixed-effects full factorial designs, are proposed for illustration purposes. Signal variations are easily related to main effects or interaction terms, while relevant biochemical information can be derived from the models.
Affiliation(s)
- Julien Boccard
- School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland
- Serge Rudaz
- School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland
|
9
|
Engel J, Blanchet L, Bloemen B, van den Heuvel LP, Engelke UHF, Wevers RA, Buydens LMC. Regularized MANOVA (rMANOVA) in untargeted metabolomics. Anal Chim Acta 2015; 899:1-12. [PMID: 26547490 DOI: 10.1016/j.aca.2015.06.042] [Received: 03/12/2015] [Revised: 06/09/2015] [Accepted: 06/11/2015]
Abstract
Many advanced metabolomics experiments currently lead to data where a large number of response variables were measured while one or several factors were changed. Often the number of response variables vastly exceeds the sample size and well-established techniques such as multivariate analysis of variance (MANOVA) cannot be used to analyze the data. ANOVA simultaneous component analysis (ASCA) is an alternative to MANOVA for analysis of metabolomics data from an experimental design. In this paper, we show that ASCA assumes that none of the metabolites are correlated and that they all have the same variance. Because of these assumptions, ASCA may relate the wrong variables to a factor. This reduces the power of the method and hampers interpretation. We propose an improved model that is essentially a weighted average of the ASCA and MANOVA models. The optimal weight is determined in a data-driven fashion. Compared to ASCA, this method assumes that variables can correlate, leading to a more realistic view of the data. Compared to MANOVA, the model is also applicable when the number of samples is (much) smaller than the number of variables. These advantages are demonstrated by means of simulated and real data examples. The source code of the method is available from the first author upon request, and at the following github repository: https://github.com/JasperE/regularized-MANOVA.
Affiliation(s)
- J Engel
- Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands; Translational Metabolic Laboratory at the Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
- L Blanchet
- Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands; Department of Biochemistry, Nijmegen Centre for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
- B Bloemen
- Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands
- L P van den Heuvel
- Translational Metabolic Laboratory at the Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
- U H F Engelke
- Translational Metabolic Laboratory at the Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
- R A Wevers
- Translational Metabolic Laboratory at the Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein 10, Nijmegen, The Netherlands
- L M C Buydens
- Radboud University Nijmegen, Institute for Molecules and Materials, Heyendaalseweg 135, Nijmegen, The Netherlands
|
10
|
Timmerman ME, Hoefsloot HCJ, Smilde AK, Ceulemans E. Scaling in ANOVA-simultaneous component analysis. Metabolomics 2015; 11:1265-1276. [PMID: 26366136 PMCID: PMC4559107 DOI: 10.1007/s11306-015-0785-8] [Received: 10/31/2014] [Accepted: 02/05/2015]
Abstract
In omics research often high-dimensional data is collected according to an experimental design. Typically, the manipulations involved yield differential effects on subsets of variables. An effective approach to identify those effects is ANOVA-simultaneous component analysis (ASCA), which combines analysis of variance with principal component analysis. So far, pre-treatment in ASCA received hardly any attention, whereas its effects can be huge. In this paper, we describe various strategies for scaling, and identify a rational approach. We present the approaches in matrix algebra terms and illustrate them with an insightful simulated example. We show that scaling directly influences which data aspects are stressed in the analysis, and hence become apparent in the solution. Therefore, the cornerstone for proper scaling is to use a scaling factor that is free from the effect of interest. This implies that proper scaling depends on the effect(s) of interest, and that different types of scaling may be proper for the different effect matrices. We illustrate that different scaling approaches can greatly affect the ASCA interpretation with a real-life example from nutritional research. The principle that scaling factors should be free from the effect of interest generalizes to other statistical methods that involve scaling, as classification methods.
Affiliation(s)
- Marieke E. Timmerman
- University of Groningen, Grote Kruisstraat 2/1, 9712TS Groningen, The Netherlands
- Huub C. J. Hoefsloot
- Biosystems Data Analysis, Faculty of Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Age K. Smilde
- Biosystems Data Analysis, Faculty of Sciences, University of Amsterdam, Amsterdam, The Netherlands
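
The principle stated in this abstract, that a scaling factor should be free from the effect of interest, can be made concrete with a small Python sketch. It assumes a single grouping factor and scales each variable by its pooled within-group standard deviation, so that differences between group means do not inflate the scaling factor. This is only one possible reading of the principle; the helper names (`within_group_sd`, `scale_free_of_effect`) are hypothetical, and the paper itself compares several scaling strategies that are not reproduced here.

```python
import numpy as np


def within_group_sd(X, groups):
    """Pooled within-group standard deviation of each column of X.

    Group means are removed before the spread is computed, so a shift in
    group means (the effect of interest) leaves the result unchanged.
    Assumes there are more samples than groups.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    resid = X.copy()
    levels = np.unique(groups)
    for level in levels:
        mask = groups == level
        resid[mask] -= X[mask].mean(axis=0)
    dof = X.shape[0] - levels.size               # residual degrees of freedom
    return np.sqrt((resid ** 2).sum(axis=0) / dof)


def scale_free_of_effect(X, groups):
    """Center X, then divide each variable by its within-group SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / within_group_sd(X, groups)
```

By contrast, ordinary autoscaling divides by the overall standard deviation, which grows with the size of the group effect and therefore shrinks exactly the variables on which the effect is largest.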
|
11
|
High-dimensional nested analysis of variance to assess the effect of production season, quality grade and steam pasteurization on the phenolic composition of fermented rooibos herbal tea. Talanta 2013; 115:590-9. [DOI: 10.1016/j.talanta.2013.06.023] [Received: 02/27/2013] [Revised: 06/10/2013] [Accepted: 06/14/2013]
|
12
|
De Roover K, Timmerman ME, Mesquita B, Ceulemans E. Common and cluster-specific simultaneous component analysis. PLoS One 2013; 8:e62280. [PMID: 23667463 PMCID: PMC3648553 DOI: 10.1371/journal.pone.0062280] [Received: 11/09/2012] [Accepted: 03/19/2013]
Abstract
In many fields of research, so-called 'multiblock' data are collected, i.e., data containing multivariate observations that are nested within higher-level research units (e.g., inhabitants of different countries). Each higher-level unit (e.g., country) then corresponds to a 'data block'. For such data, it may be interesting to investigate the extent to which the correlation structure of the variables differs between the data blocks. More specifically, when capturing the correlation structure by means of component analysis, one may want to explore which components are common across all data blocks and which components differ across the data blocks. This paper presents a common and cluster-specific simultaneous component method which clusters the data blocks according to their correlation structure and allows for common and cluster-specific components. Model estimation and model selection procedures are described and simulation results validate their performance. Also, the method is applied to data from cross-cultural values research to illustrate its empirical value.
Affiliation(s)
- Kim De Roover
- Methodology of Educational Sciences Research Unit, KU Leuven, Leuven, Belgium.
|