1. Das S, West FD, Park C. Sparse multiway canonical correlation analysis for multimodal stroke recovery data. Biom J 2024; 66:e2300037. [PMID: 38368275 DOI: 10.1002/bimj.202300037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 10/09/2023] [Accepted: 10/26/2023] [Indexed: 02/19/2024]
Abstract
Conventional canonical correlation analysis (CCA) measures the association between two datasets and identifies relevant contributors. However, it encounters issues with execution and interpretation when the sample size is smaller than the number of variables or there are more than two datasets. Our motivating example is a stroke-related clinical study on pigs. The data are multimodal and consist of measurements taken at multiple time points and have many more variables than observations. This study aims to uncover important biomarkers and stroke recovery patterns based on physiological changes. To address the issues in the data, we develop two sparse CCA methods for multiple datasets. Various simulated examples are used to illustrate and contrast the performance of the proposed methods with that of the existing methods. In analyzing the pig stroke data, we apply the proposed sparse CCA methods along with dimension reduction techniques, interpret the recovery patterns, and identify influential variables in recovery.
Affiliation(s)
- Subham Das
  - Department of Statistics, University of Georgia, Athens, Georgia, USA
- Franklin D West
  - Department of Animal & Dairy Science, University of Georgia, Athens, Georgia, USA
- Cheolwoo Park
  - Department of Mathematical Sciences, KAIST, Daejeon, South Korea

2. Gajewicz-Skretna A, Wyrzykowska E, Gromelski M. Quantitative multi-species toxicity modeling: Does a multi-species, machine learning model provide better performance than a single-species model for the evaluation of acute aquatic toxicity by organic pollutants? Sci Total Environ 2023; 861:160590. [PMID: 36473653 DOI: 10.1016/j.scitotenv.2022.160590] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/29/2022] [Revised: 11/25/2022] [Accepted: 11/26/2022] [Indexed: 06/17/2023]
Abstract
The toxicological profile of any chemical is defined by multiple endpoints and testing procedures, including representative test species from different trophic levels. While computer-aided methods play an increasingly important role in supporting ecotoxicology research and chemical hazard assessment, most of the recently developed machine learning models are directed towards a single, specific endpoint. To overcome this limitation and accelerate the process of identifying potentially hazardous environmental pollutants, we are introducing an effective approach for quantitative, multi-species modeling. The proposed approach is based on canonical correlation analysis, which finds one or more pairs of uncorrelated linear combinations of the original variables that best define the overall variability within and between multiple biological responses and predictor variables. Its effectiveness was confirmed by the machine learning model for estimating acute toxicity of diverse organic pollutants in aquatic species from three trophic levels: algae (Pseudokirchneriella subcapitata), daphnia (Daphnia magna), and fish (Oryzias latipes). The multi-species model achieved a favorable predictive performance that was in line with predictive models derived for the aquatic organisms individually. The chemical bioavailability and reactivity parameters (n-octanol/water partition coefficient, chemical potential, and molecular size and volume) were important to accurately predict acute ecotoxicity to the three aquatic organisms. To facilitate the use of this approach, an open-source, Python-based script named qMTM (quantitative Multi-species Toxicity Modeling) has been provided.
Affiliation(s)
- Agnieszka Gajewicz-Skretna
  - Laboratory of Environmental Chemoinformatics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Ewelina Wyrzykowska
  - Laboratory of Environmental Chemoinformatics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Maciej Gromelski
  - Laboratory of Environmental Chemoinformatics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland

3. Sheng J, Wang L, Cheng H, Zhang Q, Zhou R, Shi Y. Strategies for multivariate analyses of imaging genetics study in Alzheimer's disease. Neurosci Lett 2021; 762:136147. [PMID: 34332030 DOI: 10.1016/j.neulet.2021.136147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/29/2020] [Revised: 03/27/2021] [Accepted: 07/26/2021] [Indexed: 11/16/2022]
Abstract
Alzheimer's disease (AD) is an incurable neurodegenerative disease primarily affecting the elderly population. Early diagnosis of AD is critical for the management of this disease. Imaging genetics examines the influence of genetic variants (i.e., single nucleotide polymorphisms (SNPs)) on brain structure and function, and many novel imaging genetics approaches have been proposed for studying AD. We review and synthesize the Alzheimer's Disease Neuroimaging Initiative (ADNI) genetic associations with quantitative disease endophenotypes including structural and functional neuroimaging, diffusion tensor imaging (DTI), positron emission tomography (PET), and fluid biomarker assays. In this review, we survey recent publications using neuroimaging and genetic data of AD, with a focus on methods that capture multivariate effects and accommodate the large number of variables in both imaging data and genetic data. We review methods focused on bridging the imaging and genetic data by establishing genotype-phenotype association, including sparse canonical correlation analysis, parallel independent component analysis, sparse reduced rank regression, sparse partial least squares, genome-wide association study, and so on. The broad availability and wide scope of ADNI genetic and phenotypic data have advanced our understanding of the genetic basis of AD and have nominated novel targets for future pharmaceutical therapy and biomarker development.
Affiliation(s)
- Jinhua Sheng
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China
- Luyun Wang
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China; College of Information Engineering, Hangzhou Vocational & Technical College, Hangzhou, Zhejiang 310018, China
- Hu Cheng
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA
- Rougang Zhou
  - Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China; School of Mechanical Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Mstar Technologies Inc., Hangzhou, Zhejiang 310018, China
- Yuchen Shi
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China

4. Poythress JC, Park C, Ahn J. Dimension-wise sparse low-rank approximation of a matrix with application to variable selection in high-dimensional integrative analyzes of association. J Appl Stat 2021; 49:3889-3907. [DOI: 10.1080/02664763.2021.1967892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- J. C. Poythress
  - Department of Mathematics and Statistics, University of New Hampshire, Durham, NH, USA
- Cheolwoo Park
  - Department of Mathematical Sciences, KAIST, Daejeon, The Republic of Korea
- Jeongyoun Ahn
  - Department of Industrial and Systems Engineering, KAIST, Daejeon, The Republic of Korea

5. Sparse estimations in kink regression model. Soft Comput 2021. [DOI: 10.1007/s00500-021-05797-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

6. Zhong Y, Chalise P, He J. Nested cross-validation with ensemble feature selection and classification model for high-dimensional biological data. Commun Stat Simul Comput 2020. [DOI: 10.1080/03610918.2020.1850790] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
- Yi Zhong
  - Department of Biostatistics and Data Science, University of Kansas Medical Center, Kansas City, KS, USA
- Prabhakar Chalise
  - Department of Biostatistics and Data Science, University of Kansas Medical Center, Kansas City, KS, USA
- Jianghua He
  - Department of Biostatistics and Data Science, University of Kansas Medical Center, Kansas City, KS, USA

7. Rodosthenous T, Shahrezaei V, Evangelou M. Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study. Bioinformatics 2020; 36:4616-4625. [PMID: 32437529 PMCID: PMC7750936 DOI: 10.1093/bioinformatics/btaa530] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 12/05/2019] [Revised: 04/22/2020] [Accepted: 05/16/2020] [Indexed: 01/08/2023] Open
Abstract
Motivation: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be through the analysis of each OMIC dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p≫n) data, such as OMICS. The sparse variant of canonical correlation analysis (CCA) approach is a promising one that seeks to penalize the canonical variables for producing sparse latent variables while achieving maximal correlation between the datasets. Over the last years, a number of approaches for implementing sparse CCA (sCCA) have been proposed, where they differ on their objective functions, iterative algorithm for obtaining the sparse latent variables and make different assumptions about the original datasets.
Results: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al., penalized matrix decomposition CCA proposed by Witten and Tibshirani and its extension proposed by Suo et al. The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding of the in-between relationships, we have twisted the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were extended to allow for multiple (more than two) datasets where the trait was included as one of the input datasets. Both ways have shown improvement over conventional predictive models that include one or multiple datasets.
Availability and implementation: https://github.com/theorod93/sCCA.
Supplementary information: Supplementary data are available at Bioinformatics online.
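The idea of penalizing canonical vectors to obtain sparse latent variables can be illustrated with a small sketch. The code below is not the implementation from the linked repository or from any of the cited papers; it is a simplified alternating soft-thresholding scheme in the spirit of penalized-matrix-decomposition approaches, assuming standardized columns, a fixed threshold in place of a tuned l1 bound, and a uniform starting vector. The function names `soft_threshold` and `sparse_cca_sketch` and the parameters `lam_u`, `lam_v` are hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink each entry of x toward zero by lam; entries below lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def _unit(x):
    """Scale x to unit Euclidean norm, leaving an all-zero vector unchanged."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def sparse_cca_sketch(X, Y, lam_u=0.3, lam_v=0.3, n_iter=50):
    """First pair of sparse canonical vectors (illustrative simplification).

    Assumption: within-set covariances are treated as identity, so only the
    cross-correlation matrix C = X^T Y / n drives the updates.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y / X.shape[0]
    # Assumed initialization: equal weight on every variable of Y.
    v = np.ones(C.shape[1]) / np.sqrt(C.shape[1])
    u = np.zeros(C.shape[0])
    for _ in range(n_iter):
        u = _unit(soft_threshold(C @ v, lam_u))    # update weights for X
        v = _unit(soft_threshold(C.T @ u, lam_v))  # update weights for Y
    return u, v
```

Larger thresholds zero out more coefficients; in practice the amount of shrinkage would be chosen by cross-validation or a similar criterion rather than fixed in advance as it is here.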
Affiliation(s)
- Vahid Shahrezaei
  - Department of Mathematics, Imperial College London, London SW7 2AZ, UK
- Marina Evangelou
  - Department of Mathematics, Imperial College London, London SW7 2AZ, UK

8. Safo SE, Ahn J, Jeon Y, Jung S. Sparse generalized eigenvalue problem with application to canonical correlation analysis for integrative analysis of methylation and gene expression data. Biometrics 2018; 74:1362-1371. [DOI: 10.1111/biom.12886] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/01/2016] [Revised: 03/01/2018] [Accepted: 03/01/2018] [Indexed: 11/29/2022]
Affiliation(s)
- Sandra E. Safo
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, U.S.A
- Jeongyoun Ahn
  - Department of Statistics, University of Georgia, Athens, Georgia, U.S.A
- Yongho Jeon
  - Department of Applied Statistics, Yonsei University, Seoul, South Korea
- Sungkyu Jung
  - Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, U.S.A

9. Safo SE, Li S, Long Q. Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information. Biometrics 2018; 74:300-312. [PMID: 28482123 PMCID: PMC5677597 DOI: 10.1111/biom.12715] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 04/01/2016] [Revised: 03/01/2017] [Accepted: 04/01/2017] [Indexed: 01/09/2023]
Abstract
Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach.
Affiliation(s)
- Sandra E Safo
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, U.S.A
- Shuzhao Li
  - Department of Medicine, Division of Pulmonary, Allergy and Critical Care Medicine, Emory University, Atlanta, Georgia, U.S.A
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, U.S.A

10. Szefer E, Lu D, Nathoo F, Beg MF, Graham J. Multivariate association between single-nucleotide polymorphisms in Alzgene linkage regions and structural changes in the brain: discovery, refinement and validation. Stat Appl Genet Mol Biol 2017; 16:349-365. [PMID: 29091582 PMCID: PMC9008768 DOI: 10.1515/sagmb-2016-0077] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023]
Abstract
Using publicly-available data from the Alzheimer’s Disease Neuroimaging Initiative, we investigate the joint association between single-nucleotide polymorphisms (SNPs) in previously established linkage regions for Alzheimer’s disease (AD) and rates of decline in brain structure. In an initial, discovery stage of analysis, we applied a weighted
Affiliation(s)
- Elena Szefer
  - Department of Statistics and Actuarial Science, Simon Fraser University, 8888 University Dr, Burnaby, BC V5A 1S6, Canada
- Donghuan Lu
  - School of Engineering Science, Simon Fraser University, 8888 University Dr, Burnaby, BC V5A 1S6, Canada
- Farouk Nathoo
  - Department of Mathematics and Statistics, University of Victoria, PO Box 1700 STN CSC Victoria, BC V8W 2Y2, Canada
- Mirza Faisal Beg
  - School of Engineering Science, Simon Fraser University, 8888 University Dr, Burnaby, BC V5A 1S6, Canada
- Jinko Graham (corresponding author)
  - Department of Statistics and Actuarial Science, Simon Fraser University, 8888 University Dr, Burnaby, BC V5A 1S6, Canada

11. Robust sparse canonical correlation analysis. BMC Syst Biol 2016; 10:72. [PMID: 27516087 PMCID: PMC4982144 DOI: 10.1186/s12918-016-0317-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 08/28/2015] [Accepted: 07/11/2016] [Indexed: 11/15/2022]
Abstract
Background: Canonical correlation analysis (CCA) is a multivariate statistical method which describes the associations between two sets of variables. The objective is to find linear combinations of the variables in each data set having maximal correlation. In genomics, CCA has become increasingly important to estimate the associations between gene expression data and DNA copy number change data. The identification of such associations might help to increase our understanding of the development of diseases such as cancer. However, these data sets are typically high-dimensional, containing a lot of variables relative to the number of objects. Moreover, the data sets might contain atypical observations since it is likely that objects react differently to treatments. We discuss a method for Robust Sparse CCA, thereby providing a solution to both issues. Sparse estimation produces canonical vectors with some of their elements estimated as exactly zero. As such, their interpretability is improved. Robust methods can cope with atypical observations in the data.
Results: We illustrate the good performance of the Robust Sparse CCA method by several simulation studies and three biometric examples. Robust Sparse CCA considerably outperforms its main alternatives in (1) correctly detecting the main associations between the data sets, (2) accurately estimating these associations, and (3) detecting outliers.
Conclusions: Robust Sparse CCA delivers interpretable canonical vectors, while at the same time coping with outlying observations. The proposed method is able to describe the associations between high-dimensional data sets, which are nowadays commonplace in genomics. Furthermore, the Robust Sparse CCA method allows outliers to be characterized.
Electronic supplementary material: The online version of this article (doi:10.1186/s12918-016-0317-9) contains supplementary material, which is available to authorized users.
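The objective stated in the Background, finding linear combinations of each variable set with maximal correlation, can be made concrete with a short sketch of classical (non-robust, non-sparse) CCA. This is not the Robust Sparse CCA method of the article; it is the textbook whitening-plus-SVD construction, assuming more observations than variables. The function name `classical_cca` and the small `ridge` stabilizer are assumptions introduced for illustration.

```python
import numpy as np


def classical_cca(X, Y, ridge=1e-8):
    """Return the first canonical vectors (a, b) and canonical correlation.

    Assumes n > p and n > q so that the sample covariances are invertible;
    `ridge` is a tiny diagonal term added only for numerical stability.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical
    # correlations; singular vectors map back to canonical vectors.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    a = Kx @ U[:, 0]
    b = Ky @ Vt[0]
    return a, b, s[0]
```

When the number of variables exceeds the number of observations, the covariance matrices above become singular, which is the situation that motivates the sparse variants discussed throughout this list.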

12. Wilms I, Croux C. Sparse canonical correlation analysis from a predictive point of view. Biom J 2015; 57:834-51. [DOI: 10.1002/bimj.201400226] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 10/15/2014] [Revised: 04/02/2015] [Accepted: 04/21/2015] [Indexed: 11/05/2022]
Affiliation(s)
- Ines Wilms
  - Leuven Statistics Research Centre (LStat), KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium
- Christophe Croux
  - Leuven Statistics Research Centre (LStat), KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium

13. Holzinger ER, Dudek SM, Frase AT, Pendergrass SA, Ritchie MD. ATHENA: the analysis tool for heritable and environmental network associations. Bioinformatics 2014; 30:698-705. [PMID: 24149050 PMCID: PMC3933870 DOI: 10.1093/bioinformatics/btt572] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 01/23/2013] [Revised: 09/03/2013] [Accepted: 09/26/2013] [Indexed: 11/13/2022] Open
Abstract
Motivation: Advancements in high-throughput technology have allowed researchers to examine the genetic etiology of complex human traits in a robust fashion. Although genome-wide association studies have identified many novel variants associated with hundreds of traits, a large proportion of the estimated trait heritability remains unexplained. One hypothesis is that the commonly used statistical techniques and study designs are not robust to the complex etiology that may underlie these human traits. This etiology could include non-linear gene × gene or gene × environment interactions. Additionally, other levels of biological regulation may play a large role in trait variability.
Results: To address the need for computational tools that can explore enormous datasets to detect complex susceptibility models, we have developed a software package called the Analysis Tool for Heritable and Environmental Network Associations (ATHENA). ATHENA combines various variable filtering methods with machine learning techniques to analyze high-throughput categorical (i.e. single nucleotide polymorphisms) and quantitative (i.e. gene expression levels) predictor variables to generate multivariable models that predict either a categorical (i.e. disease status) or quantitative (i.e. cholesterol levels) outcome. The goal of this article is to demonstrate the utility of ATHENA using simulated and biological datasets that consist of both single nucleotide polymorphisms and gene expression variables to identify complex prediction models. Importantly, this method is flexible and can be expanded to include other types of high-throughput data (i.e. RNA-seq data and biomarker measurements).
Availability: ATHENA is freely available for download. The software, user manual and tutorial can be downloaded from http://ritchielab.psu.edu/ritchielab/software.
Affiliation(s)
- Emily R Holzinger
  - Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD, USA and Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, PA, USA

14. Lin D, Zhang J, Li J, Calhoun VD, Deng HW, Wang YP. Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics 2013; 14:245. [PMID: 23937249 PMCID: PMC3751310 DOI: 10.1186/1471-2105-14-245] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 04/01/2013] [Accepted: 08/08/2013] [Indexed: 11/30/2022] Open
Abstract
Background: The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNP), and copy number variation (CNV)) has greatly enhanced our understanding of the interplay of these genomic factors as well as their influences on complex diseases. It is challenging to explore the relationship between these different types of genomic data sets. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA), for this problem. The conventional CCA method does not work effectively if the number of data samples is significantly less than that of biomarkers, which is a typical case for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome this difficulty, mostly using penalizations with the l-1 norm (CCA-l1) or the combination of the l-1 and l-2 norms (CCA-elastic net). However, they overlook the structural or group effects within genomic data, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group).
Results: We propose a new group sparse CCA method (CCA-sparse group) along with an effective numerical algorithm to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulation and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristic of the selected features. Pathway analysis is further performed for biological interpretation of those features.
Conclusions: The CCA-sparse group method incorporates group effects of features into the correlation analysis while simultaneously performing individual feature selection. It outperforms the two sCCA methods (CCA-l1 and CCA-group) by identifying the correlated features with more true positives while controlling total discordance at a lower level on the simulated data, even if the group effect does not exist or there are irrelevant features grouped with true correlated features. Compared with our proposed CCA-group sparse models, CCA-l1 tends to select fewer true correlated features while CCA-group tends to select more redundant features.
Affiliation(s)
- Dongdong Lin
  - Biomedical Engineering Department, Tulane University, New Orleans, LA, USA
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Jigang Zhang
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Jingyao Li
  - Biomedical Engineering Department, Tulane University, New Orleans, LA, USA
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Vince D Calhoun
  - The Mind Research Network, Albuquerque, NM, 87131, USA
  - Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, 87131, USA
- Hong-Wen Deng
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Yu-Ping Wang
  - Biomedical Engineering Department, Tulane University, New Orleans, LA, USA
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA

15. Jafarpour A, Barnes G, Fuentemilla L, Duzel E, Penny WD. Population level inference for multivariate MEG analysis. PLoS One 2013; 8:e71305. [PMID: 23940738 PMCID: PMC3734032 DOI: 10.1371/journal.pone.0071305] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/27/2013] [Accepted: 06/25/2013] [Indexed: 11/19/2022] Open
Abstract
Multivariate analysis is a very general and powerful technique for analysing Magnetoencephalography (MEG) data. An outstanding problem however is how to make inferences that are consistent over a group of subjects as to whether there are condition-specific differences in data features, and what are those features that maximise these differences. Here we propose a solution based on Canonical Variates Analysis (CVA) model scoring at the subject level and random effects Bayesian model selection at the group level. We apply this approach to beamformer reconstructed MEG data in source space. CVA estimates those multivariate patterns of activation that correlate most highly with the experimental design; the order of a CVA model is then determined by the number of significant canonical vectors. Random effects Bayesian model comparison then provides machinery for inferring the optimal order over the group of subjects. Absence of a multivariate dependence is indicated by the null model being the most likely. This approach can also be applied to CVA models with a fixed number of canonical vectors but supplied with different feature sets. We illustrate the method by identifying feature sets based on variable-dimension MEG power spectra in the primary visual cortex and fusiform gyrus that are maximally discriminative of data epochs before versus after visual stimulation.
Affiliation(s)
- Anna Jafarpour
  - Institute of Cognitive Neuroscience, University College London, London, United Kingdom