1. Wu Q, Zhang Y, Huang X, Ma T, Hong LE, Kochunov P, Chen S. A multivariate to multivariate approach for voxel-wise genome-wide association analysis. Stat Med 2024. [PMID: 38922949 DOI: 10.1002/sim.10101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 03/02/2024] [Accepted: 04/24/2024] [Indexed: 06/28/2024]
Abstract
The joint analysis of imaging-genetics data facilitates the systematic investigation of genetic effects on brain structures and functions with spatial specificity. We focus on voxel-wise genome-wide association analysis, which may involve trillions of single nucleotide polymorphism (SNP)-voxel pairs. We attempt to identify underlying organized association patterns of SNP-voxel pairs and understand the polygenic and pleiotropic networks on brain imaging traits. We propose a bi-clique graph structure (ie, a set of SNPs highly correlated with a cluster of voxels) for the systematic association pattern. Next, we develop computational strategies to detect latent SNP-voxel bi-cliques and an inference model for statistical testing. We further provide theoretical results to guarantee the accuracy of our computational algorithms and statistical inference. We validate our method by extensive simulation studies, and then apply it to the whole genome genetic and voxel-level white matter integrity data collected from 1052 participants of the Human Connectome Project. The results demonstrate multiple genetic loci influencing white matter integrity measures in the splenium and genu of the corpus callosum.
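As a rough illustration of the SNP-voxel association pattern described in this abstract, the sketch below computes absolute Pearson correlations for every SNP-voxel pair on toy data and measures how densely a candidate block of SNPs and voxels is connected. It is not the authors' detection algorithm: the use of Pearson correlation as the association measure, the 0.3 threshold, the toy data, and the function names `association_matrix` and `block_density` are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def association_matrix(snps, voxels):
    """|correlation| for every SNP-voxel pair; inputs are lists of columns."""
    return [[abs(pearson(s, v)) for v in voxels] for s in snps]


def block_density(assoc, snp_idx, voxel_idx, threshold):
    """Fraction of pairs in a candidate block whose |r| exceeds the threshold.

    A density of 1.0 means the block is a complete bi-clique at this threshold.
    """
    hits = sum(assoc[i][j] > threshold for i in snp_idx for j in voxel_idx)
    return hits / (len(snp_idx) * len(voxel_idx))


# Toy data (hypothetical): 400 subjects, 6 SNPs coded 0/1/2, 5 voxels.
rng = random.Random(0)
n_subjects = 400
snps = [[rng.choice([0, 1, 2]) for _ in range(n_subjects)] for _ in range(6)]
voxels = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(5)]

# Plant a block: SNPs 0 and 1 both shift voxels 0, 1 and 2.
for j in range(3):
    voxels[j] = [
        v + 1.5 * (snps[0][k] + snps[1][k]) for k, v in enumerate(voxels[j])
    ]

assoc = association_matrix(snps, voxels)
planted = block_density(assoc, [0, 1], [0, 1, 2], threshold=0.3)
background = block_density(assoc, [2, 3, 4, 5], [3, 4], threshold=0.3)
```

With real data the full pairwise matrix is far too large to enumerate this way, which is the computational problem the abstract points to; the sketch only shows what a dense SNP-voxel block looks like once associations are thresholded.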
Affiliation(s)
- Qiong Wu
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yuan Zhang
  - Department of Statistics, Ohio State University, Columbus, Ohio, USA
- Xiaoqi Huang
  - Department of Mathematics, Louisiana State University, Baton Rouge, Louisiana, USA
- Tianzhou Ma
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, Maryland, USA
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, USA
- L Elliot Hong
  - Faillace Department of Psychiatry and Behavioral Sciences at McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Peter Kochunov
  - Faillace Department of Psychiatry and Behavioral Sciences at McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Shuo Chen
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, USA
  - Faillace Department of Psychiatry and Behavioral Sciences at McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland, Baltimore, Maryland, USA
  - The University of Maryland Institute for Health Computing, University of Maryland, North Bethesda, USA
2. Cheek CL, Lindner P, Grigorenko EL. Statistical and Machine Learning Analysis in Brain-Imaging Genetics: A Review of Methods. Behav Genet 2024; 54:233-251. [PMID: 38336922 DOI: 10.1007/s10519-024-10177-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 01/24/2024] [Indexed: 02/12/2024]
Abstract
Brain-imaging-genetic analysis is an emerging field of research that aims at aggregating data from neuroimaging modalities, which characterize brain structure or function, and genetic data, which capture the structure and function of the genome, to explain or predict normal (or abnormal) brain performance. Brain-imaging-genetic studies offer great potential for understanding complex brain-related diseases/disorders of genetic etiology. Still, a combined brain-wide genome-wide analysis is difficult to perform as typical datasets fuse multiple modalities, each with high dimensionality, unique correlational landscapes, and often low statistical signal-to-noise ratios. In this review, we outline the progress in brain-imaging-genetic methodologies starting from early massive univariate to current deep learning approaches, highlighting each approach's strengths and weaknesses and relating it to the field's development. We conclude by discussing selected remaining challenges and prospects for the field.
Affiliation(s)
- Connor L Cheek
  - Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
  - Department of Physics, University of Houston, Houston, TX, USA
- Peggy Lindner
  - Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
  - Department of Information Science Technology, University of Houston, Houston, TX, USA
- Elena L Grigorenko
  - Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
  - Department of Psychology, University of Houston, Houston, TX, USA
  - Baylor College of Medicine, Houston, TX, USA
  - Sirius University of Science and Technology, Sochi, Russia
3. Tan X, Wang W, Zeng D, Liu GF, Diao G, Jafari N, Alt EM, Ibrahim JG. Safety signal detection with control of latent factors. Stat Med 2024; 43:1397-1418. [PMID: 38297431 DOI: 10.1002/sim.10015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 10/26/2023] [Accepted: 12/27/2023] [Indexed: 02/02/2024]
Abstract
Postmarket drug safety databases such as the Vaccine Adverse Event Reporting System (VAERS) collect thousands of spontaneous reports annually, with each report recording occurrences of any adverse events (AEs) and use of vaccines. We hope to identify signal vaccine-AE pairs, for which certain vaccines are statistically associated with certain AEs, using such data. Thus, the outcomes of interest are multiple AEs, which are binary outcomes and could be correlated because they might share certain latent factors; and the primary covariates are vaccines. Appropriately accounting for the complex correlation among AEs could improve the sensitivity and specificity of identifying signal vaccine-AE pairs. We propose a two-step approach in which we first estimate the shared latent factors among AEs using a working multivariate logistic regression model, and then use a univariate logistic regression model to examine the vaccine-AE associations after controlling for the latent factors. Our simulation studies show that this approach outperforms current approaches in terms of sensitivity and specificity. We apply our approach in analyzing VAERS data and report our findings.
Affiliation(s)
- Xianming Tan
  - Department of Biostatistics at Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- William Wang
  - Merck and Co., Inc., North Wales, Pennsylvania, USA
- Donglin Zeng
  - Department of Biostatistics at Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Guoqing Diao
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
- Ethan M Alt
  - Department of Biostatistics at Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Joseph G Ibrahim
  - Department of Biostatistics at Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
4. Jin Z, Kang J, Yu T. Bayesian nonparametric method for genetic dissection of brain activation region. Front Neurosci 2023; 17:1235321. [PMID: 37920300 PMCID: PMC10618557 DOI: 10.3389/fnins.2023.1235321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 09/26/2023] [Indexed: 11/04/2023] [Open Access]
Abstract
Biological evidence indicates that brain atrophy can be involved at the onset of neuropathological pathways of Alzheimer's disease. However, there is a lack of formal statistical methods to perform genetic dissection of brain activation phenotypes such as shape and intensity. To this end, we propose a Bayesian hierarchical model which consists of two levels of hierarchy. At level 1, we develop a Bayesian nonparametric level set (BNLS) model for studying the brain activation region shape. At level 2, we construct a regression model to select genetic variants that are strongly associated with the brain activation intensity, where a spike-and-slab prior and a Gaussian prior are chosen for feature selection. We develop efficient posterior computation algorithms based on the Markov chain Monte Carlo (MCMC) method. We demonstrate the advantages of the proposed method via extensive simulation studies and analyses of imaging genetics data in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study.
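To make the spike-and-slab idea mentioned in this abstract concrete, here is a minimal sketch in the simplest textbook setting, not the hierarchical model of the paper. It assumes a single observed effect y with Gaussian noise, a point-mass spike at zero, and a zero-mean Gaussian slab; the function names and the parameter values in the usage lines are hypothetical.

```python
import math


def normal_pdf(y, var):
    """Density of a zero-mean normal with variance `var`, evaluated at y."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)


def inclusion_probability(y, prior_incl, noise_var, slab_var):
    """Posterior probability that the effect is nonzero.

    Assumed model: y ~ N(beta, noise_var), with beta = 0 (spike) with
    probability 1 - prior_incl and beta ~ N(0, slab_var) (slab) otherwise.
    Marginally, y ~ N(0, noise_var) under the spike and
    y ~ N(0, noise_var + slab_var) under the slab.
    """
    slab = prior_incl * normal_pdf(y, noise_var + slab_var)
    spike = (1.0 - prior_incl) * normal_pdf(y, noise_var)
    return slab / (slab + spike)


# A weak signal is shrunk toward exclusion, a strong one toward inclusion.
weak = inclusion_probability(0.0, prior_incl=0.5, noise_var=1.0, slab_var=9.0)
strong = inclusion_probability(5.0, prior_incl=0.5, noise_var=1.0, slab_var=9.0)
```

In a regression with many variants the same comparison between spike and slab is made for each coefficient inside the MCMC sampler, conditional on the others.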
Affiliation(s)
- Zhuxuan Jin
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
- Jian Kang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Tianwei Yu
  - School of Data Science, Chinese University of Hong Kong - Shenzhen, Shenzhen, China
  - Guangdong Provincial Key Laboratory of Big Data Computing, Shenzhen, China
5. Wang Z, Bai Y, Härdle WK, Tian M. Smoothed quantile regression for partially functional linear models in high dimensions. Biom J 2023; 65:e2200060. [PMID: 37147793 DOI: 10.1002/bimj.202200060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2022] [Revised: 11/21/2022] [Accepted: 12/11/2022] [Indexed: 05/07/2023]
Abstract
Practitioners of current data analysis are regularly confronted with the situation where the heavy-tailed skewed response is related to both multiple functional predictors and high-dimensional scalar covariates. We propose a new class of partially functional penalized convolution-type smoothed quantile regression to characterize the conditional quantile level between a scalar response and predictors of both functional and scalar types. The new approach overcomes the lack of smoothness and severe convexity of the standard quantile empirical loss, considerably improving the computing efficiency of partially functional quantile regression. We investigate a folded concave penalized estimator for simultaneous variable selection and estimation by the modified local adaptive majorize-minimization (LAMM) algorithm. The functional predictors can be dense or sparse and are approximated by the principal component basis. Under mild conditions, the consistency and oracle properties of the resulting estimators are established. Simulation studies demonstrate a competitive performance against the partially functional standard penalized quantile regression. A real application using Alzheimer's Disease Neuroimaging Initiative data is utilized to illustrate the practicality of the proposed model.
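The smoothing step in this abstract can be illustrated with a short sketch. Convolving the quantile check loss with a kernel yields a differentiable surrogate; the version below assumes a Gaussian kernel with bandwidth h, for which the convolution has the closed form h·φ(u/h) + u·(τ − Φ(−u/h)). The kernel choice, the bandwidth values, and the function names are assumptions, since the abstract does not specify them.

```python
import math


def check_loss(u, tau):
    """Standard quantile (check) loss; not differentiable at u = 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothed_loss(u, tau, h):
    """Check loss convolved with a Gaussian kernel of bandwidth h (assumed kernel)."""
    return h * norm_pdf(u / h) + u * (tau - norm_cdf(-u / h))


def smoothed_grad(u, tau, h):
    """Derivative of smoothed_loss in u; continuous everywhere."""
    return tau - norm_cdf(-u / h)


# At u = 0 the check loss has a kink; the smoothed gradient is tau - 1/2 there.
grad_at_zero = smoothed_grad(0.0, tau=0.3, h=0.5)
```

As h shrinks the surrogate approaches the check loss, so the bandwidth trades smoothness against fidelity to the original quantile objective.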
Affiliation(s)
- Zhihao Wang
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, P. R. China
  - School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi, P. R. China
- Yongxin Bai
  - School of Science, Beijing Information Science and Technology University, Beijing, P. R. China
- Wolfgang K Härdle
  - School of Business and Economics, Humboldt-Universität zu Berlin, Berlin, Germany
  - Department of Information Management and Finance, National Yang Ming Chiao Tung University (NYCU), Hsinchu City, Taiwan
- Maozai Tian
  - Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, P. R. China
  - School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi, P. R. China
6. Beaulac C, Wu S, Gibson E, Miranda MF, Cao J, Rocha L, Beg MF, Nathoo FS. Neuroimaging feature extraction using a neural network classifier for imaging genetics. BMC Bioinformatics 2023; 24:271. [PMID: 37391692 DOI: 10.1186/s12859-023-05394-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 06/21/2023] [Indexed: 07/02/2023] [Open Access]
Abstract
BACKGROUND Dealing with the high dimension of both neuroimaging data and genetic data is a difficult problem in the association of genetic data to neuroimaging. In this article, we tackle the latter problem with an eye toward developing solutions that are relevant for disease prediction. Supported by a vast literature on the predictive power of neural networks, our proposed solution uses neural networks to extract from neuroimaging data features that are relevant for predicting Alzheimer's Disease (AD) for subsequent relation to genetics. The neuroimaging-genetic pipeline we propose comprises image processing, neuroimaging feature extraction and genetic association steps. We present a neural network classifier for extracting neuroimaging features that are related to the disease. The proposed method is data-driven and requires no expert advice or a priori selection of regions of interest. We further propose a multivariate regression with priors specified in the Bayesian framework that allows for group sparsity at multiple levels including SNPs and genes. RESULTS We find the features extracted with our proposed method are better predictors of AD than features used previously in the literature, suggesting that single nucleotide polymorphisms (SNPs) related to the features extracted by our proposed method are also more relevant for AD. Our neuroimaging-genetic pipeline led to the identification of some overlapping and, more importantly, some different SNPs when compared to those identified with previously used features. CONCLUSIONS The pipeline we propose combines machine learning and statistical methods to benefit from the strong predictive performance of blackbox models to extract relevant features while preserving the interpretation provided by Bayesian models for genetic association. Finally, we argue in favour of using automatic feature extraction, such as the method we propose, in addition to ROI or voxelwise analysis to find potentially novel disease-relevant SNPs that may not be detected when using ROIs or voxels alone.
Affiliation(s)
- Cédric Beaulac
  - School of Engineering Science, Simon Fraser University, Burnaby, Canada
  - Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
- Sidi Wu
  - Department of Statistics and Actuarial Sciences, Simon Fraser University, Burnaby, Canada
- Erin Gibson
  - School of Engineering Science, Simon Fraser University, Burnaby, Canada
- Michelle F Miranda
  - Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
- Jiguo Cao
  - Department of Statistics and Actuarial Sciences, Simon Fraser University, Burnaby, Canada
- Leno Rocha
  - Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
- Mirza Faisal Beg
  - School of Engineering Science, Simon Fraser University, Burnaby, Canada
- Farouk S Nathoo
  - Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
7. Xin Y, Sheng J, Miao M, Wang L, Yang Z, Huang H. A review of imaging genetics in Alzheimer's disease. J Clin Neurosci 2022; 100:155-163. [PMID: 35487021 DOI: 10.1016/j.jocn.2022.04.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/01/2022] [Revised: 03/01/2022] [Accepted: 04/15/2022] [Indexed: 01/18/2023]
Abstract
Determining the association between genetic variation and phenotype is a key step in studying the mechanism of Alzheimer's disease (AD), laying the foundation for studying drug therapies and biomarkers. AD is the most common type of dementia in the aged population. At present, three early-onset AD genes (APP, PSEN1, PSEN2) and one late-onset AD susceptibility gene, apolipoprotein E (APOE), have been determined. However, the pathogenesis of AD remains unknown. Imaging genetics, an emerging interdisciplinary field, is able to reveal the complex mechanisms from the genetic level to human cognition and mental disorders via macroscopic intermediates. This paper reviews methods for establishing genotype-phenotype correlations, including sparse canonical correlation analysis, sparse reduced rank regression, sparse partial least squares and so on. We found that most research work did poorly in supervised learning and in exploring the nonlinear relationship between SNPs and quantitative traits (QTs).
Affiliation(s)
- Yu Xin
  - College of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China
- Jinhua Sheng
  - College of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China
- Miao Miao
  - Beijing Hospital, Beijing 100730, China; National Center of Gerontology, Beijing 100730, China; Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing 100730, China
- Luyun Wang
  - College of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China; Hangzhou Vocational & Technical College, Hangzhou, Zhejiang 310018, China
- Ze Yang
  - College of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China
- He Huang
  - College of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; Key Laboratory of Intelligent Image Analysis for Sensory and Cognitive Health, Ministry of Industry and Information Technology of China, Hangzhou, Zhejiang 310018, China
8. Dai X, Li L. Orthogonalized Kernel Debiased Machine Learning for Multimodal Data Analysis. J Am Stat Assoc 2022; 118:1796-1810. [PMID: 37771509 PMCID: PMC10530774 DOI: 10.1080/01621459.2021.2013851] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/08/2021] [Accepted: 11/23/2021] [Indexed: 10/19/2022]
Abstract
Multimodal imaging has transformed neuroscience research. While it presents unprecedented opportunities, it also imposes serious challenges. Particularly, it is difficult to combine the merits of the interpretability attributed to a simple association model with the flexibility achieved by a highly adaptive nonlinear model. In this article, we propose an orthogonalized kernel debiased machine learning approach, which is built upon the Neyman orthogonality and a form of decomposition orthogonality, for multimodal data analysis. We target the setting that naturally arises in almost all multimodal studies, where there is a primary modality of interest, plus additional auxiliary modalities. We establish the root-N-consistency and asymptotic normality of the estimated primary parameter, the semi-parametric estimation efficiency, and the asymptotic validity of the confidence band of the predicted primary modality effect. Our proposal enjoys, to a good extent, both model interpretability and model flexibility. It is also considerably different from the existing statistical methods for multimodal data integration, as well as the orthogonality-based methods for high-dimensional inferences. We demonstrate the efficacy of our method through both simulations and an application to a multimodal neuroimaging study of Alzheimer's disease.
Affiliation(s)
- Lexin Li
  - University of California at Berkeley
9. Vilor-Tejedor N, Garrido-Martín D, Rodriguez-Fernandez B, Lamballais S, Guigó R, Gispert JD. Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO! Comput Struct Biotechnol J 2021; 19:5800-5810. [PMID: 34765095 PMCID: PMC8567328 DOI: 10.1016/j.csbj.2021.10.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/22/2021] [Revised: 10/08/2021] [Accepted: 10/12/2021] [Indexed: 12/01/2022] [Open Access]
Abstract
Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, given the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes, the development and application of multivariate methods become crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.
Affiliation(s)
- Natalia Vilor-Tejedor
  - Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
  - Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
  - Universitat Pompeu Fabra, Barcelona, Spain
- Diego Garrido-Martín
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Sander Lamballais
  - Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Juan Domingo Gispert
  - Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
  - Centro de Investigación Biomédica en Red Bioingeniería, Biomateriales y Nanomedicina, Madrid, Spain
10. Wen C, Ba H, Pan W, Huang M. Co-sparse reduced-rank regression for association analysis between imaging phenotypes and genetic variants. Bioinformatics 2021; 36:5214-5222. [PMID: 32683450 DOI: 10.1093/bioinformatics/btaa650] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/01/2019] [Revised: 05/22/2020] [Accepted: 07/14/2020] [Indexed: 11/13/2022] [Open Access]
Abstract
MOTIVATION Association analysis between genetic variants and imaging phenotypes is needed to understand inherited neuropsychiatric disorders via imaging genetic studies. Given the high dimensionality in imaging and genetic data, traditional methods based on massive univariate regression entail large computational cost and disregard many-to-many correlations between phenotypes and genetic variants. Several multivariate imaging genetic methods have been proposed to alleviate the above problems. However, most of these methods are based on the l1 penalty, which might cause the over-selection of variables and thus mislead scientists in analyzing data from the field of neuroimaging genetics. RESULTS To address these challenges in both statistics and computation, we propose a novel co-sparse reduced-rank regression model that identifies complex correlations in a dimensional reduction manner. We developed an iterative algorithm based on a group primal dual-active set formulation to detect simultaneously important genetic variants and imaging phenotypes efficiently and precisely via non-convex penalty. The simulation studies showed that our method achieved accurate and stable performance in parameter estimation and variable selection. In a real application, the proposed approach successfully detected several novel Alzheimer's disease-related genetic variants and regions of interest, which indicates that our method may be a valuable statistical toolbox for imaging genetic studies. AVAILABILITY AND IMPLEMENTATION The R package csrrr and the code for experiments in this article are available on GitHub: https://github.com/hailongba/csrrr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Canhong Wen
  - International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
- Hailong Ba
  - International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
- Wenliang Pan
  - Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou 510275, China
- Meiyan Huang
  - School of Biomedical Engineering, Guangzhou 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
11. Song Y, Ge S, Cao J, Wang L, Nathoo FS. A Bayesian spatial model for imaging genetics. Biometrics 2021; 78:742-753. [PMID: 33765325 DOI: 10.1111/biom.13460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2019] [Revised: 02/08/2021] [Accepted: 02/24/2021] [Indexed: 11/29/2022]
Abstract
We develop a Bayesian bivariate spatial model for multivariate regression analysis applicable to studies examining the influence of genetic variation on brain structure. Our model is motivated by an imaging genetics study of the Alzheimer's Disease Neuroimaging Initiative (ADNI), where the objective is to examine the association between images of volumetric and cortical thickness values summarizing the structure of the brain as measured by magnetic resonance imaging (MRI) and a set of 486 single nucleotide polymorphisms (SNPs) from 33 Alzheimer's disease (AD) candidate genes obtained from 632 subjects. A bivariate spatial process model is developed to accommodate the correlation structures typically seen in structural brain imaging data. First, we allow for spatial correlation on a graph structure in the imaging phenotypes obtained from a neighborhood matrix for measures on the same hemisphere of the brain. Second, we allow for correlation in the same measures obtained from different hemispheres (left/right) of the brain. We develop a mean-field variational Bayes algorithm and a Gibbs sampling algorithm to fit the model. We also incorporate Bayesian false discovery rate (FDR) procedures to select SNPs. We implement the methodology in a new release of the R package bgsmtr. We show that the new spatial model demonstrates superior performance over a standard model in our application. Data used in the preparation of this article were obtained from the ADNI database (https://adni.loni.usc.edu).
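The Bayesian FDR selection step mentioned in this abstract can be sketched with one common rule: rank SNPs by posterior probability of association and keep the largest set whose average posterior probability of being null stays at or below a target level. This is a generic rule, not necessarily the exact procedure in the paper or in bgsmtr; the function name `bayesian_fdr_select`, the target level, and the example probabilities are hypothetical.

```python
def bayesian_fdr_select(post_prob, alpha=0.05):
    """Indices of selected SNPs under a posterior-expected-FDR rule.

    post_prob[j] is the posterior probability that SNP j is associated, so
    1 - post_prob[j] is its posterior probability of being a false discovery.
    """
    # Most convincing SNPs first.
    order = sorted(range(len(post_prob)), key=lambda j: post_prob[j], reverse=True)
    selected, running_null = [], 0.0
    for k, j in enumerate(order, start=1):
        running_null += 1.0 - post_prob[j]
        # The running mean is nondecreasing, so stop at the first violation.
        if running_null / k > alpha:
            break
        selected.append(j)
    return selected


# Hypothetical posterior probabilities for five SNPs.
probs = [0.99, 0.20, 0.97, 0.60, 0.95]
chosen = bayesian_fdr_select(probs, alpha=0.05)
```

Because the rule averages over the selected set, a SNP with posterior probability slightly below 1 − alpha can still be kept when the others in the set are very convincing.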
Affiliation(s)
- Yin Song
  - Department of Mathematics and Statistics, University of Victoria, British Columbia, Canada
- Shufei Ge
  - Institute of Mathematical Sciences, ShanghaiTech University, Shanghai, China
- Jiguo Cao
  - Statistics and Actuarial Science, Simon Fraser University, British Columbia, Canada
- Liangliang Wang
  - Statistics and Actuarial Science, Simon Fraser University, British Columbia, Canada
- Farouk S Nathoo
  - Department of Mathematics and Statistics, University of Victoria, British Columbia, Canada
12. Ding X, Yu D, Zhang Z, Kong D. Multivariate functional response low-rank regression with an application to brain imaging data. Can J Stat 2021. [DOI: 10.1002/cjs.11604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Xiucai Ding
  - Department of Statistics, University of California, Davis, Davis, CA 95616, U.S.A.
- Dengdeng Yu
  - Department of Statistical Sciences, University of Toronto, Toronto, ON M5G 1X6, Canada
- Zhengwu Zhang
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, U.S.A.
- Dehan Kong
  - Department of Statistical Sciences, University of Toronto, Toronto, ON M5G 1X6, Canada
13. Wang X, Song X, Zhu H. Bayesian latent factor on image regression with nonignorable missing data. Stat Med 2020; 40:920-932. [PMID: 33169396 DOI: 10.1002/sim.8810] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/10/2020] [Revised: 09/16/2020] [Accepted: 10/29/2020] [Indexed: 11/06/2022]
Abstract
Medical imaging data have been widely used in modern health care, particularly in the prognosis, screening, diagnosis, and treatment of various diseases. In this study, we consider a latent factor-on-image (LoI) regression model that regresses a latent factor on ultrahigh dimensional imaging covariates. The latent factor is characterized by multiple manifest variables through a factor analysis model, while the manifest variables are subject to nonignorable missingness. We propose a two-stage approach for statistical inference. At the first stage, an efficient functional principal component analysis method is applied to reduce the dimension and extract useful features/eigenimages. At the second stage, a factor analysis model is proposed to characterize the latent response variable. Moreover, an LoI model is used to detect influential risk factors, and an exponential tilting model is applied to accommodate nonignorable nonresponses. A fully Bayesian method with an adjusted spike-and-slab least absolute shrinkage and selection operator (lasso) procedure is developed for the estimation and selection of influential features/eigenimages. Simulation studies show the proposed method exhibits satisfactory performance. The proposed methodology is applied to a study on the Alzheimer's Disease Neuroimaging Initiative data set.
Collapse
Affiliation(s)
- Xiaoqing Wang
- Department of Statistics, The Chinese University of Hong Kong, Sha Tin, Hong Kong
| | - Xinyuan Song
- Department of Statistics, The Chinese University of Hong Kong, Sha Tin, Hong Kong
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| |
|
14
|
Yang D, Goh G, Wang H. A fully Bayesian approach to sparse reduced-rank multivariate regression. STAT MODEL 2020. [DOI: 10.1177/1471082x20948697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In the context of high-dimensional multivariate linear regression, sparse reduced-rank regression (SRRR) provides a way to handle both variable selection and low-rank estimation problems. Although there has been extensive research on SRRR, statistical inference procedures that deal with the uncertainty due to variable selection and rank reduction are still limited. To fill this research gap, we develop a fully Bayesian approach to SRRR. A major difficulty that occurs in a fully Bayesian framework is that the dimension of parameter space varies with the selected variables and the reduced-rank. Due to the varying-dimensional problems, traditional Markov chain Monte Carlo (MCMC) methods such as Gibbs sampler and Metropolis-Hastings algorithm are inapplicable in our Bayesian framework. To address this issue, we propose a new posterior computation procedure based on the Laplace approximation within the collapsed Gibbs sampler. A key feature of our fully Bayesian method is that the model uncertainty is automatically integrated out by the proposed MCMC computation. The proposed method is examined via simulation study and real data analysis.
Affiliation(s)
- Dunfu Yang
- Department of Statistics, Kansas State University, Manhattan, KS, USA
| | - Gyuhyeong Goh
- Department of Statistics, Kansas State University, Manhattan, KS, USA
| | - Haiyan Wang
- Department of Statistics, Kansas State University, Manhattan, KS, USA
| |
|
15
|
Nie Y, Opoku E, Yasmin L, Song Y, Wang J, Wu S, Scarapicchia V, Gawryluk J, Wang L, Cao J, Nathoo FS. Spectral dynamic causal modelling of resting-state fMRI: an exploratory study relating effective brain connectivity in the default mode network to genetics. Stat Appl Genet Mol Biol 2020; 19:/j/sagmb.ahead-of-print/sagmb-2019-0058/sagmb-2019-0058.xml. [PMID: 32866136 DOI: 10.1515/sagmb-2019-0058] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/24/2019] [Accepted: 07/27/2020] [Indexed: 11/15/2022]
Abstract
We conduct an imaging genetics study to explore how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer's disease and mild cognitive impairment. We develop an analysis of longitudinal resting-state functional magnetic resonance imaging (rs-fMRI) and genetic data obtained from a sample of 111 subjects with a total of 319 rs-fMRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. A Dynamic Causal Model (DCM) is fit to the rs-fMRI scans to estimate effective brain connectivity within the DMN and related to a set of single nucleotide polymorphisms (SNPs) contained in an empirical disease-constrained set which is obtained out-of-sample from 663 ADNI subjects having only genome-wide data. We relate longitudinal effective brain connectivity estimated using spectral DCM to SNPs using both linear mixed effect (LME) models as well as function-on-scalar regression (FSR). In both cases we implement a parametric bootstrap for testing SNP coefficients and make comparisons with p-values obtained from asymptotic null distributions. In both networks at an initial q-value threshold of 0.1 no effects are found. We report on exploratory patterns of associations with relatively high ranks that exhibit stability to the differing assumptions made by both FSR and LME.
Affiliation(s)
- Yunlong Nie
- Department of Statistics and Actuarial Science, Simon Fraser University, Room SC K10545 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| | - Eugene Opoku
- Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
| | - Laila Yasmin
- Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
| | - Yin Song
- Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
| | - Jie Wang
- Department of Statistics and Actuarial Science, Simon Fraser University, Room SC K10545 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| | - Sidi Wu
- Department of Statistics and Actuarial Science, Simon Fraser University, Room SC K10545 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| | - Vanessa Scarapicchia
- Department of Psychology, University of Victoria, P. O. Box 1700 STN CSC, Victoria, British Columbia, V8W 2Y2, Canada
| | - Jodie Gawryluk
- Department of Psychology, University of Victoria, P. O. Box 1700 STN CSC, Victoria, British Columbia, V8W 2Y2, Canada
| | - Liangliang Wang
- Department of Statistics and Actuarial Science, Simon Fraser University, Room SC K10545 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| | - Jiguo Cao
- Department of Statistics and Actuarial Science, Simon Fraser University, Room SC K10545 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| | - Farouk S Nathoo
- Department of Mathematics and Statistics, University of Victoria, Victoria, Canada
| |
|
16
|
Kong D, An B, Zhang J, Zhu H. L2RM: Low-rank Linear Regression Models for High-dimensional Matrix Responses. J Am Stat Assoc 2020; 115:403-424. [PMID: 33408427 PMCID: PMC7781207 DOI: 10.1080/01621459.2018.1555092] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/25/2017] [Revised: 11/11/2018] [Accepted: 11/26/2018] [Indexed: 10/27/2022]
Abstract
The aim of this paper is to develop a low-rank linear regression model (L2RM) to correlate a high-dimensional response matrix with a high dimensional vector of covariates when coefficient matrices have low-rank structures. We propose a fast and efficient screening procedure based on the spectral norm of each coefficient matrix in order to deal with the case when the number of covariates is extremely large. We develop an efficient estimation procedure based on the trace norm regularization, which explicitly imposes the low rank structure of coefficient matrices. When both the dimension of response matrix and that of covariate vector diverge at the exponential order of the sample size, we investigate the sure independence screening property under some mild conditions. We also systematically investigate some theoretical properties of our estimation procedure including estimation consistency, rank consistency and non-asymptotic error bound under some mild conditions. We further establish a theoretical guarantee for the overall solution of our two-step screening and estimation procedure. We examine the finite-sample performance of our screening and estimation methods using simulations and a large-scale imaging genetic dataset collected by the Philadelphia Neurodevelopmental Cohort (PNC) study.
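The two ingredients named in this abstract, screening by the spectral norm of each coefficient matrix and trace-norm regularization, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of marginal least-squares coefficients for screening, and the array shapes are assumptions made for illustration. The second function is the standard proximal operator of the trace norm (soft-thresholding of singular values), which is one common building block for trace-norm penalized estimation.

```python
import numpy as np


def spectral_norm_screening(X, Y, keep):
    """Rank covariates by the spectral norm of a marginal coefficient matrix.

    X: (n, s) covariates; Y: (n, p, q) matrix-valued responses.
    Returns the indices of the `keep` highest-scoring covariates and all scores.
    Assumes no covariate column is constant.
    """
    n = X.shape[0]
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each covariate
    Yc = Y - Y.mean(axis=0)                    # center each response entry
    # Marginal least-squares coefficient matrix for every covariate: (s, p, q).
    B = np.einsum("ns,npq->spq", Xc, Yc) / n
    # ord=2 over the last two axes gives the largest singular value per matrix.
    scores = np.linalg.norm(B, ord=2, axis=(1, 2))
    order = np.argsort(scores)[::-1]
    return order[:keep], scores


def singular_value_threshold(M, lam):
    """Proximal operator of lam * (trace norm): soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt
```

Thresholding singular values sets the small ones exactly to zero, which is how a trace-norm penalty produces low-rank coefficient estimates.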
Affiliation(s)
- Dehan Kong
- Department of Statistical Sciences, University of Toronto
| | - Baiguo An
- School of Statistics, Capital University of Economics and Business
| | - Jingwen Zhang
- Department of Biostatistics, University of North Carolina at Chapel Hill
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill
| |
|
17
|
Shen L, Thompson PM. Brain Imaging Genomics: Integrated Analysis and Machine Learning. PROCEEDINGS OF THE IEEE. INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS 2020; 108:125-162. [PMID: 31902950 PMCID: PMC6941751 DOI: 10.1109/jproc.2019.2947272] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Indexed: 06/01/2023]
Abstract
Brain imaging genomics is an emerging data science field, where integrated analysis of brain imaging and genomics data, often combined with other biomarker, clinical and environmental data, is performed to gain new insights into the phenotypic, genetic and molecular characteristics of the brain as well as their impact on normal and disordered brain function and behavior. It has enormous potential to contribute significantly to biomedical discoveries in brain science. Given the increasingly important role of statistical and machine learning in biomedicine and rapidly growing literature in brain imaging genomics, we provide an up-to-date and comprehensive review of statistical and machine learning methods for brain imaging genomics, as well as a practical discussion on method selection for various biomedical applications.
Affiliation(s)
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Paul M Thompson
- Imaging Genetics Center, Mark & Mary Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90232, USA
| |
|
18
|
Hao X, Yao X, Risacher SL, Saykin AJ, Yu J, Wang H, Tan L, Shen L, Zhang D. Identifying Candidate Genetic Associations with MRI-Derived AD-Related ROI via Tree-Guided Sparse Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1986-1996. [PMID: 29993890 PMCID: PMC7144227 DOI: 10.1109/tcbb.2018.2833487] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/02/2023]
Abstract
Imaging genetics has attracted significant interest in recent studies. Traditional work has focused on mass-univariate statistical approaches that identify important single nucleotide polymorphisms (SNPs) associated with quantitative traits (QTs) of brain structure or function. More recently, to address the problem of multiple comparison and weak detection, multivariate analysis methods such as the least absolute shrinkage and selection operator (Lasso) are often used to select the most relevant SNPs associated with QTs. However, one problem of Lasso, as well as many other feature selection methods for imaging genetics, is that some useful prior information, e.g., the hierarchical structure among SNPs, is rarely used for designing a more powerful model. In this paper, we propose to identify the associations between candidate genetic features (i.e., SNPs) and magnetic resonance imaging (MRI)-derived measures using a tree-guided sparse learning (TGSL) method. The advantage of our method is that it explicitly models the complex hierarchical structure among the SNPs in the objective function for feature selection. Specifically, motivated by biological knowledge, the hierarchical structures involving gene groups and linkage disequilibrium (LD) blocks as well as individual SNPs are imposed as a tree-guided regularization term in our TGSL model. Experimental studies on simulation data and the Alzheimer's Disease Neuroimaging Initiative (ADNI) data show that our method not only achieves better predictions than competing methods on the MRI-derived measures of AD-related regions of interest (ROIs) (i.e., hippocampus, parahippocampal gyrus, and precuneus), but also identifies sparse SNP patterns at the block level to better guide the biological interpretation.
|
19
|
Zhu X, Shen D. Robust and Discriminative Brain Genome Association Study. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION : MICCAI ... INTERNATIONAL CONFERENCE ON MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION 2019; 11767:456-464. [PMID: 34296224 PMCID: PMC8294458 DOI: 10.1007/978-3-030-32251-9_50] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Brain Genome Association (BGA) study, which investigates the associations between brain structure/function (characterized by neuroimaging phenotypes) and genetic variations (characterized by Single Nucleotide Polymorphisms (SNPs)), is important in pathological analysis of neurological disease. However, the current BGA studies are limited as they did not explicitly consider the disease labels, source importance, and sample importance in their formulations. We address these issues by proposing a robust and discriminative BGA formulation. Specifically, we learn two transformation matrices for mapping two heterogeneous data sources (i.e., neuroimaging data and genetic data) into a common space, so that the samples from the same subject (but different sources) are close to each other, and also the samples with different labels are separable. In addition, we add a sparsity constraint on the transformation matrices to enable feature selection on both data sources. Furthermore, both sample importance and source importance are also considered in the formulation via adaptive parameter-free sample and source weightings. We have conducted various experiments, using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, to test how well the neuroimaging phenotypes and SNPs can represent each other in the common space.
Affiliation(s)
- Xiaofeng Zhu
- University of Electronic Science and Technology of China, Chengdu, Sichuan, China
| | - Dinggang Shen
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| |
|
20
|
Uematsu Y, Fan Y, Chen K, Lv J, Lin W. SOFAR: Large-Scale Association Network Learning. IEEE TRANSACTIONS ON INFORMATION THEORY 2019; 65:4924-4939. [PMID: 33746241 PMCID: PMC7970712 DOI: 10.1109/tit.2019.2909889] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/12/2023]
Abstract
Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via the sparse singular value decomposition with orthogonality constrained optimization to learn the underlying association networks, with broad applications to both unsupervised and supervised learning tasks such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and sparse vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure characterizing the theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with convergence property. Both computational and theoretical advantages of our procedure are demonstrated with several simulations and real data examples.
Affiliation(s)
- Yoshimasa Uematsu
- Assistant Professor, Department of Economics and Management, Tohoku University, Sendai 980-8576, Japan
| | - Yingying Fan
- Dean's Associate Professor in Business Administration, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089
| | - Kun Chen
- Associate Professor, Department of Statistics, University of Connecticut, Storrs, CT 06269
| | - Jinchi Lv
- Kenneth King Stonier Chair in Business Administration and Professor, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089
| | - Wei Lin
- Assistant Professor, School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China 100871
| |
|
21
|
Nathoo FS, Kong L, Zhu H. A Review of Statistical Methods in Imaging Genetics. CAN J STAT 2019; 47:108-131. [PMID: 31274952 PMCID: PMC6605768 DOI: 10.1002/cjs.11487] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/24/2017] [Accepted: 10/08/2018] [Indexed: 12/24/2022]
Abstract
With the rapid growth of modern technology, many biomedical studies are being conducted to collect massive datasets with volumes of multi-modality imaging, genetic, neurocognitive, and clinical information from increasingly large cohorts. Simultaneously extracting and integrating rich and diverse heterogeneous information in neuroimaging and/or genomics from these big datasets could transform our understanding of how genetic variants impact brain structure and function, cognitive function, and brain-related disease risk across the lifespan. Such understanding is critical for diagnosis, prevention, and treatment of numerous complex brain-related disorders (e.g., schizophrenia and Alzheimer's disease). However, the development of analytical methods for the joint analysis of both high-dimensional imaging phenotypes and high-dimensional genetic data, a big data squared (BD2) problem, presents major computational and theoretical challenges for existing analytical methods. Besides the high-dimensional nature of BD2, various neuroimaging measures often exhibit strong spatial smoothness and dependence and genetic markers may have a natural dependence structure arising from linkage disequilibrium. We review some recent developments of various statistical techniques for imaging genetics, including massive univariate and voxel-wise approaches, reduced rank regression, mixture models, and group sparse multi-task regression. By doing so, we hope that this review may encourage others in the statistical community to enter into this new and exciting field of research.
Affiliation(s)
- Farouk S Nathoo
- Department of Mathematics and Statistics, University of Victoria
| | - Linglong Kong
- Department of Mathematical and Statistical Sciences, University of Alberta
| | - Hongtu Zhu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
| |
|
22
|
Ma H, Li T, Zhu H, Zhu Z. Quantile regression for functional partially linear model in ultra-high dimensions. Comput Stat Data Anal 2019. [DOI: 10.1016/j.csda.2018.06.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/23/2022]
|
23
|
Huang M, Deng C, Yu Y, Lian T, Yang W, Feng Q. Spatial correlations exploitation based on nonlocal voxel-wise GWAS for biomarker detection of AD. Neuroimage Clin 2018; 21:101642. [PMID: 30584014 PMCID: PMC6413305 DOI: 10.1016/j.nicl.2018.101642] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/06/2018] [Revised: 11/19/2018] [Accepted: 12/10/2018] [Indexed: 02/05/2023]
Abstract
Potential biomarker detection is a crucial area of study for the prediction, diagnosis, and monitoring of Alzheimer's disease (AD). The voxelwise genome-wide association study (vGWAS) is widely used in imaging genomics and is usually applied to the detection of AD biomarkers in both imaging and genetic data. However, performing vGWAS remains a challenge because of the computational complexity of the technique and because spatial correlations within the imaging data are typically ignored. In this paper, we propose a novel method based on the exploitation of spatial correlations that may help to detect potential AD biomarkers using a fast vGWAS. To incorporate spatial correlations, we applied a nonlocal method that supposed that a given voxel could be represented by a weighted sum of the other voxels. Three commonly used weighting methods were adopted to calculate the weights among different voxels in this study. Then, a fast vGWAS approach was used to assess the association between the image and the genetic data. The proposed method was evaluated using both simulated and real data. In the simulation studies, we designed a set of experiments to evaluate the effectiveness of the nonlocal method for incorporating spatial correlations in vGWAS. The experiments showed that incorporating spatial correlations by the nonlocal method could improve the detection accuracy of AD biomarkers. For real data, we successfully identified three genes, namely, ANK3, MEIS2, and TLR4, which have significant associations with mental retardation, learning disabilities and age according to previous research. These genes have profound impacts on AD or other neurodegenerative diseases. Our results indicated that our method might be an effective and valuable tool for detecting potential biomarkers of AD.
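The nonlocal idea described in this abstract, representing each voxel as a weighted sum of the other voxels, can be sketched as follows. The abstract does not specify the three weighting schemes, so the Gaussian kernel on the distance between standardized voxel profiles, the bandwidth `h`, and the function name are all assumptions chosen for illustration rather than the paper's method.

```python
import numpy as np


def nonlocal_representation(Y, h=1.0):
    """Re-express every voxel as a weighted sum of the other voxels.

    Y: (n_subjects, n_voxels) imaging matrix with no constant columns.
    h: kernel bandwidth (hypothetical tuning parameter).
    Returns the nonlocal image matrix and the (n_voxels, n_voxels) weights.
    """
    n = Y.shape[0]
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0)  # standardize each voxel
    corr = Z.T @ Z / n                        # voxel-by-voxel correlation
    d2 = 2.0 * (1.0 - corr)                   # mean squared profile distance
    W = np.exp(-d2 / h**2)                    # Gaussian similarity (assumed)
    np.fill_diagonal(W, 0.0)                  # a voxel does not weight itself
    W /= W.sum(axis=1, keepdims=True)         # each row of weights sums to 1
    return Y @ W.T, W
```

The returned matrix would then replace the raw voxel values as the phenotype passed to the association step, so that each voxel borrows information from voxels with similar profiles.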
Affiliation(s)
- Meiyan Huang
- Guangdong Provincial Key Laboratory of Medical Image Processing, School of Biomedical Engineering, Southern Medical University, Guangzhou, China
| | - Chunyan Deng
- Guangdong Provincial Key Laboratory of Medical Image Processing, School of Biomedical Engineering, Southern Medical University, Guangzhou, China; Department of Radiation Oncology, Cancer Hospital of Shantou University Medical College, Shantou, China
| | - Yuwei Yu
- Guangdong Provincial Key Laboratory of Medical Image Processing, School of Biomedical Engineering, Southern Medical University, Guangzhou, China
| | - Tao Lian
- Guangdong Provincial Key Laboratory of Medical Image Processing, School of Biomedical Engineering, Southern Medical University, Guangzhou, China
| | - Wei Yang
- Guangdong Provincial Key Laboratory of Medical Image Processing, School of Biomedical Engineering, Southern Medical University, Guangzhou, China
| | - Qianjin Feng
- Guangdong Provincial Key Laboratory of Medical Image Processing, School of Biomedical Engineering, Southern Medical University, Guangzhou, China.
| |
|
24
|
Bai R, Ghosh M. High-dimensional multivariate posterior consistency under global–local shrinkage priors. J MULTIVARIATE ANAL 2018. [DOI: 10.1016/j.jmva.2018.04.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]
|
25
|
Park Y, Su Z, Zhu H. Groupwise envelope models for imaging genetic analysis. Biometrics 2017; 73:1243-1253. [PMID: 28323341 PMCID: PMC5608647 DOI: 10.1111/biom.12689] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/01/2016] [Revised: 02/01/2017] [Accepted: 02/01/2017] [Indexed: 11/28/2022]
Abstract
Motivated by searching for associations between genetic variants and brain imaging phenotypes, the aim of this article is to develop a groupwise envelope model for multivariate linear regression in order to establish the association between both multivariate responses and covariates. The groupwise envelope model allows for both distinct regression coefficients and distinct error structures for different groups. Statistically, the proposed envelope model can dramatically improve efficiency of tests and of estimation. Theoretical properties of the proposed model are established. Numerical experiments as well as the analysis of an imaging genetic data set obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study show the effectiveness of the model in efficient estimation. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
Affiliation(s)
- Yeonhee Park
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, U.S.A
| | - Zhihua Su
- Department of Statistics, University of Florida, Gainesville, FL 32611, U.S.A
| | - Hongtu Zhu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, U.S.A
| |
|
26
|
Xu Z, Wu C, Pan W. Imaging-wide association study: Integrating imaging endophenotypes in GWAS. Neuroimage 2017; 159:159-169. [PMID: 28736311 PMCID: PMC5671364 DOI: 10.1016/j.neuroimage.2017.07.036] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 05/02/2017] [Revised: 06/22/2017] [Accepted: 07/18/2017] [Indexed: 10/19/2022] Open
Abstract
A new and powerful approach, called imaging-wide association study (IWAS), is proposed to integrate imaging endophenotypes with GWAS to boost statistical power and enhance biological interpretation for GWAS discoveries. IWAS extends the promising transcriptome-wide association study (TWAS) from using gene expression endophenotypes to using imaging and other endophenotypes with a much wider range of possible applications. As illustration, we use gray-matter volumes of several brain regions of interest (ROIs) drawn from the ADNI-1 structural MRI data as imaging endophenotypes, which are then applied to the individual-level GWAS data of ADNI-GO/2 and a large meta-analyzed GWAS summary statistics dataset (based on about 74,000 individuals), uncovering some novel genes significantly associated with Alzheimer's disease (AD). We also compare the performance of IWAS with TWAS, showing much larger numbers of significant AD-associated genes discovered by IWAS, presumably due to the stronger link between brain atrophy and AD than that between gene expression of normal individuals and the risk for AD. The proposed IWAS is general and can be applied to other imaging endophenotypes, and GWAS individual-level or summary association data.
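The two-sample logic described in this abstract (learn SNP weights for an imaging endophenotype in a reference sample, then test the genetically imputed endophenotype in a GWAS sample) can be sketched as below. This is a simplified illustration under stated assumptions, not the IWAS software: ridge regression stands in for whatever weight-estimation model the paper uses, the trait is treated as quantitative, and all function and variable names are hypothetical.

```python
import numpy as np


def fit_imaging_weights(G_ref, img_ref, lam=1.0):
    """Stage 1: ridge regression of an imaging endophenotype on SNPs.

    G_ref: (n_ref, m) genotypes; img_ref: (n_ref,) imaging measure.
    Returns SNP weights and the reference genotype means used for centering.
    """
    mu = G_ref.mean(axis=0)
    Gc = G_ref - mu
    A = Gc.T @ Gc + lam * np.eye(G_ref.shape[1])
    w = np.linalg.solve(A, Gc.T @ (img_ref - img_ref.mean()))
    return w, mu


def iwas_association(G_gwas, trait, w, mu):
    """Stage 2: test the imputed endophenotype against a trait.

    Returns the Pearson correlation and the corresponding t-statistic.
    """
    imputed = (G_gwas - mu) @ w               # genetically predicted imaging
    r = np.corrcoef(imputed, trait)[0, 1]
    t = r * np.sqrt(len(trait) - 2) / np.sqrt(1.0 - r**2)
    return r, t
```

Because only the weights carry over from the reference sample, the second stage needs genotypes and the trait but no imaging data, which is what lets the imaging reference and the GWAS cohort be different samples.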
Affiliation(s)
- Zhiyuan Xu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| | - Chong Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.
| |
|
27
|
Huang C, Thompson P, Wang Y, Yu Y, Zhang J, Kong D, Colen RR, Knickmeyer RC, Zhu H. FGWAS: Functional genome wide association analysis. Neuroimage 2017; 159:107-121. [PMID: 28735012 PMCID: PMC5984052 DOI: 10.1016/j.neuroimage.2017.07.030] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 01/30/2017] [Revised: 07/12/2017] [Accepted: 07/14/2017] [Indexed: 12/11/2022] Open
Abstract
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs.
Affiliation(s)
- Chao Huang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Paul Thompson
- Imaging Genetics Center, Stevens Institute for Neuroimaging and Informatics, University of Southern California, Marina del Rey, CA, USA
- Yalin Wang
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA
- Yang Yu
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jingwen Zhang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Dehan Kong
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Rivka R Colen
- Department of Diagnostic Radiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rebecca C Knickmeyer
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Hongtu Zhu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
28
|
Greenlaw K, Szefer E, Graham J, Lesperance M, Nathoo FS. A Bayesian group sparse multi-task regression model for imaging genetics. Bioinformatics 2017; 33:2513-2522. [PMID: 28419235 PMCID: PMC5870710 DOI: 10.1093/bioinformatics/btx215] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/28/2016] [Revised: 02/20/2017] [Accepted: 04/12/2017] [Indexed: 11/12/2022]
Abstract
MOTIVATION Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. Wang et al. have developed an approach for the analysis of imaging genomic studies using penalized multi-task regression with regularization based on a novel group l2,1-norm penalty which encourages structured sparsity at both the gene level and SNP level. While incorporating a number of useful features, the proposed method only furnishes a point estimate of the regression coefficients; techniques for conducting statistical inference are not provided. A new Bayesian method is proposed here to overcome this limitation. RESULTS We develop a Bayesian hierarchical modeling formulation where the posterior mode corresponds to the estimator proposed by Wang et al. and an approach that allows for full posterior inference including the construction of interval estimates for the regression parameters. We show that the proposed hierarchical model can be expressed as a three-level Gaussian scale mixture and this representation facilitates the use of a Gibbs sampling algorithm for posterior simulation. Simulation studies demonstrate that the interval estimates obtained using our approach achieve adequate coverage probabilities that outperform those obtained from the nonparametric bootstrap. Our proposed methodology is applied to the analysis of neuroimaging and genetic data collected as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI), and this analysis of the ADNI cohort demonstrates clearly the value added of incorporating interval estimation beyond only point estimation when relating SNPs to brain imaging endophenotypes. AVAILABILITY AND IMPLEMENTATION Software and sample data is available as an R package 'bgsmtr' that can be downloaded from The Comprehensive R Archive Network (CRAN). CONTACT nathoo@uvic.ca. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Keelin Greenlaw
- Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Elena Szefer
- Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
- Jinko Graham
- Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
- Mary Lesperance
- Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Farouk S Nathoo
- Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
|
29
|
Abstract
The use of imaging markers to predict clinical outcomes can have a great impact on public health. The aim of this paper is to develop a class of generalized scalar-on-image regression models via total variation (GSIRM-TV), in the sense of generalized linear models, for scalar response and imaging predictor with the presence of scalar covariates. A key novelty of GSIRM-TV is that it is assumed that the slope function (or image) of GSIRM-TV belongs to the space of bounded total variation in order to explicitly account for the piecewise smooth nature of most imaging data. We develop an efficient penalized total variation optimization to estimate the unknown slope function and other parameters. We also establish nonasymptotic error bounds on the excess risk. These bounds are explicitly specified in terms of sample size, image size, and image smoothness. Our simulations demonstrate a superior performance of GSIRM-TV against many existing approaches. We apply GSIRM-TV to the analysis of hippocampus data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset.
Affiliation(s)
- Xiao Wang
- Associate Professor of Statistics, Department of Statistics, Purdue University, West Lafayette, IN 47907
- Hongtu Zhu
- Professor of Biostatistics, Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77230, and University of North Carolina, Chapel Hill, NC 27599
|
30
|
Weiner MW, Veitch DP, Aisen PS, Beckett LA, Cairns NJ, Green RC, Harvey D, Jack CR, Jagust W, Morris JC, Petersen RC, Saykin AJ, Shaw LM, Toga AW, Trojanowski JQ. Recent publications from the Alzheimer's Disease Neuroimaging Initiative: Reviewing progress toward improved AD clinical trials. Alzheimers Dement 2017; 13:e1-e85. [PMID: 28342697 DOI: 10.1016/j.jalz.2016.11.007] [Citation(s) in RCA: 170] [Impact Index Per Article: 24.3] [Received: 09/07/2016] [Revised: 11/21/2016] [Accepted: 11/28/2016] [Indexed: 01/31/2023]
Abstract
INTRODUCTION The Alzheimer's Disease Neuroimaging Initiative (ADNI) has continued development and standardization of methodologies for biomarkers and has provided an increased depth and breadth of data available to qualified researchers. This review summarizes the over 400 publications using ADNI data during 2014 and 2015. METHODS We used standard searches to find publications using ADNI data. RESULTS (1) Structural and functional changes, including subtle changes to hippocampal shape and texture, atrophy in areas outside of hippocampus, and disruption to functional networks, are detectable in presymptomatic subjects before hippocampal atrophy; (2) In subjects with abnormal β-amyloid deposition (Aβ+), biomarkers become abnormal in the order predicted by the amyloid cascade hypothesis; (3) Cognitive decline is more closely linked to tau than Aβ deposition; (4) Cerebrovascular risk factors may interact with Aβ to increase white-matter (WM) abnormalities which may accelerate Alzheimer's disease (AD) progression in conjunction with tau abnormalities; (5) Different patterns of atrophy are associated with impairment of memory and executive function and may underlie psychiatric symptoms; (6) Structural, functional, and metabolic network connectivities are disrupted as AD progresses. Models of prion-like spreading of Aβ pathology along WM tracts predict known patterns of cortical Aβ deposition and declines in glucose metabolism; (7) New AD risk and protective gene loci have been identified using biologically informed approaches; (8) Cognitively normal and mild cognitive impairment (MCI) subjects are heterogeneous and include groups typified not only by "classic" AD pathology but also by normal biomarkers, accelerated decline, and suspected non-Alzheimer's pathology; (9) Selection of subjects at risk of imminent decline on the basis of one or more pathologies improves the power of clinical trials; (10) Sensitivity of cognitive outcome measures to early changes in cognition has been improved and surrogate outcome measures using longitudinal structural magnetic resonance imaging may further reduce clinical trial cost and duration; (11) Advances in machine learning techniques such as neural networks have improved diagnostic and prognostic accuracy especially in challenges involving MCI subjects; and (12) Network connectivity measures and genetic variants show promise in multimodal classification and some classifiers using single modalities are rivaling multimodal classifiers. DISCUSSION Taken together, these studies fundamentally deepen our understanding of AD progression and its underlying genetic basis, which in turn informs and improves clinical trial design.
Affiliation(s)
- Michael W Weiner
- Department of Veterans Affairs Medical Center, Center for Imaging of Neurodegenerative Diseases, San Francisco, CA, USA; Department of Radiology, University of California, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco, CA, USA; Department of Psychiatry, University of California, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, CA, USA
- Dallas P Veitch
- Department of Veterans Affairs Medical Center, Center for Imaging of Neurodegenerative Diseases, San Francisco, CA, USA
- Paul S Aisen
- Alzheimer's Therapeutic Research Institute, University of Southern California, San Diego, CA, USA
- Laurel A Beckett
- Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, CA, USA
- Nigel J Cairns
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, Saint Louis, MO, USA; Department of Neurology, Washington University School of Medicine, Saint Louis, MO, USA
- Robert C Green
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Danielle Harvey
- Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, CA, USA
- William Jagust
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
- John C Morris
- Alzheimer's Therapeutic Research Institute, University of Southern California, San Diego, CA, USA
- Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Leslie M Shaw
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Arthur W Toga
- Laboratory of Neuroimaging, Institute of Neuroimaging and Informatics, Keck School of Medicine of University of Southern California, Los Angeles, CA, USA
- John Q Trojanowski
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Institute on Aging, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Alzheimer's Disease Core Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Udall Parkinson's Research Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
31
|
Abstract
Many modern statistical problems can be cast in the framework of multivariate regression, where the main task is to make statistical inference for a possibly sparse and low-rank coefficient matrix. The low-rank structure in the coefficient matrix is of intrinsic multivariate nature, which, when combined with sparsity, can further lift dimension reduction, conduct variable selection, and facilitate model interpretation. Using a Bayesian approach, we develop a unified sparse and low-rank multivariate regression method to both estimate the coefficient matrix and obtain its credible region for making inference. The newly developed sparse and low-rank prior for the coefficient matrix enables rank reduction, predictor selection and response selection simultaneously. We utilize the marginal likelihood to determine the regularization hyperparameter, so our method maximizes its posterior probability given the data. For theoretical aspect, the posterior consistency is established to discuss an asymptotic behavior of the proposed method. The efficacy of the proposed approach is demonstrated via simulation studies and a real application on yeast cell cycle data.
Affiliation(s)
- Gyuhyeong Goh
- Department of Statistics, Kansas State University, Manhattan, KS 66506, United States
- Dipak K Dey
- Department of Statistics, University of Connecticut, Storrs, CT 06269, United States
- Kun Chen
- Department of Statistics, University of Connecticut, Storrs, CT 06269, United States
|
32
|
|
33
|
Lu ZH, Khondker Z, Ibrahim JG, Wang Y, Zhu H. Bayesian longitudinal low-rank regression models for imaging genetic data from longitudinal studies. Neuroimage 2017; 149:305-322. [PMID: 28143775 DOI: 10.1016/j.neuroimage.2017.01.052] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/18/2016] [Revised: 12/27/2016] [Accepted: 01/22/2017] [Indexed: 12/29/2022]
Abstract
To perform a joint analysis of multivariate neuroimaging phenotypes and candidate genetic markers obtained from longitudinal studies, we develop a Bayesian longitudinal low-rank regression (L2R2) model. The L2R2 model integrates three key methodologies: a low-rank matrix for approximating the high-dimensional regression coefficient matrices corresponding to the genetic main effects and their interactions with time, penalized splines for characterizing the overall time effect, and a sparse factor analysis model coupled with random effects for capturing within-subject spatio-temporal correlations of longitudinal phenotypes. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. Simulations show that the L2R2 model outperforms several other competing methods. We apply the L2R2 model to investigate the effect of single nucleotide polymorphisms (SNPs) on the top 10 and top 40 previously reported Alzheimer disease-associated genes. We also identify associations between the interactions of these SNPs with patient age and the tissue volumes of 93 regions of interest from patients' brain images obtained from the Alzheimer's Disease Neuroimaging Initiative.
Affiliation(s)
- Zhao-Hua Lu
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Zakaria Khondker
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Joseph G Ibrahim
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Yue Wang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Hongtu Zhu
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
|
34
|
Depth-based nonparametric description of functional data, with emphasis on use of spatial depth. Comput Stat Data Anal 2017. [DOI: 10.1016/j.csda.2016.07.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/20/2022]
|
35
|
Kundu S, Kang J. Semiparametric Bayes conditional graphical models for imaging genetics applications. Stat (Int Stat Inst) 2016; 5:322-337. [PMID: 28616224 DOI: 10.1002/sta4.119] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
Abstract
Motivated by the need for understanding neurological disorders, large-scale imaging genetic studies are being increasingly conducted. A salient objective in such studies is to identify important neuroimaging biomarkers such as the brain functional connectivity, as well as genetic biomarkers, which are predictive of disorders. However, typical approaches for estimating the group level brain functional connectivity do not account for potential variation, resulting from demographic and genetic factors, while usual methods for discovering genetic biomarkers do not factor in the influence of the brain network on the imaging phenotype. We propose a novel semiparametric Bayesian conditional graphical model for joint variable selection and graph estimation, which simultaneously estimates the brain network after accounting for heterogeneity, and infers significant genetic biomarkers. The proposed approach specifies priors on the regression coefficients, which clusters brain regions having similar activation patterns depending on covariates, leading to dimension reduction. A novel graphical prior is proposed, which encourages modularity in brain organization by specifying denser and sparse connections within and across clusters, respectively. The posterior computation proceeds via a Markov chain Monte Carlo. We apply the approach to data obtained from the Alzheimer's disease neuroimaging initiative and demonstrate numerical advantages via simulation studies.
Affiliation(s)
- Suprateek Kundu
- Department of Biostatistics, Emory University, 1518 Clifton Road NE, Atlanta, GA 30322, USA
- Jian Kang
- Department of Biostatistics, University of Michigan, 3651 Tower, 1415 Washington Heights, Ann Arbor, MI 48019, USA
|
36
|
Zhu W, Yuan Y, Zhang J, Zhou F, Knickmeyer RC, Zhu H. Genome-wide association analysis of secondary imaging phenotypes from the Alzheimer's disease neuroimaging initiative study. Neuroimage 2016; 146:983-1002. [PMID: 27717770 DOI: 10.1016/j.neuroimage.2016.09.055] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/30/2016] [Revised: 08/13/2016] [Accepted: 09/21/2016] [Indexed: 11/17/2022]
Abstract
The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore such sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing it with several advanced statistical methods that appropriately adjust for the case-control sampling scheme.
Affiliation(s)
- Wensheng Zhu
- School of Mathematics & Statistics and KLAS, Northeast Normal University, Changchun 130024, China; Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ying Yuan
- Takeda Pharmaceuticals U.S.A., Inc., 300 Massachusetts Ave, Cambridge, MA 02139, USA
- Jingwen Zhang
- Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Fan Zhou
- Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Rebecca C Knickmeyer
- Departments of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hongtu Zhu
- Departments of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
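The biased-sampling issue described in the abstract of Zhu et al. above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' analysis: the population size, allele frequency, and logistic disease model (intercept -5, effects of 1.0 for both genotype and phenotype) are hypothetical values chosen for illustration. The secondary phenotype is simulated with no genetic effect, so any slope found in the case-control sample is an artifact of the sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: genotype G (0/1/2 minor-allele count) and a
# secondary phenotype Y that is simulated to be independent of G.
N = 200_000
G = rng.binomial(2, 0.3, size=N)
Y = rng.standard_normal(N)

# Assumed disease model: both G and Y raise disease risk (logistic link).
logit = -5.0 + 1.0 * G + 1.0 * Y
D = rng.random(N) < 1.0 / (1.0 + np.exp(-logit))

def slope(y, g):
    """Least-squares slope from regressing y on g (with an intercept)."""
    return np.polyfit(g, y, 1)[0]

# Retrospective case-control sample: equal numbers of cases and controls.
cases = rng.choice(np.flatnonzero(D), size=2000, replace=False)
controls = rng.choice(np.flatnonzero(~D), size=2000, replace=False)
idx = np.concatenate([cases, controls])

pop_slope = slope(Y, G)            # close to zero: no true G-Y association
cc_slope = slope(Y[idx], G[idx])   # naive pooled analysis of the sample

print(f"population slope:   {pop_slope:.3f}")
print(f"case-control slope: {cc_slope:.3f}")
```

In this setup the naive pooled regression on the case-control sample reports a clearly positive slope even though the population slope is essentially zero, because cases are enriched for both high G and high Y. Methods that adjust for the sampling scheme, as compared in the paper, are meant to remove this kind of artifact.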
|
37
|
Tao C, Nichols TE, Hua X, Ching CRK, Rolls ET, Thompson PM, Feng J. Generalized reduced rank latent factor regression for high dimensional tensor fields, and neuroimaging-genetic applications. Neuroimage 2016; 144:35-57. [PMID: 27666385 DOI: 10.1016/j.neuroimage.2016.08.027] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/26/2015] [Revised: 08/01/2016] [Accepted: 08/14/2016] [Indexed: 11/18/2022]
Abstract
We propose a generalized reduced rank latent factor regression model (GRRLF) for the analysis of tensor field responses and high dimensional covariates. The model is motivated by the need from imaging-genetic studies to identify genetic variants that are associated with brain imaging phenotypes, often in the form of high dimensional tensor fields. GRRLF identifies from the structure in the data the effective dimensionality of the data, and then jointly performs dimension reduction of the covariates, dynamic identification of latent factors, and nonparametric estimation of both covariate and latent response fields. After accounting for the latent and covariate effects, GRRLF performs a nonparametric test on the remaining factor of interest. GRRLF provides a better factorization of the signals compared with common solutions, and is less susceptible to overfitting because it exploits the effective dimensionality. The generality and the flexibility of GRRLF also allow various statistical models to be handled in a unified framework and solutions can be efficiently computed. Within the field of neuroimaging, it improves the sensitivity for weak signals and is a promising alternative to existing approaches. The operation of the framework is demonstrated with both synthetic datasets and a real-world neuroimaging example in which the effects of a set of genes on the structure of the brain at the voxel level were measured, and the results compared favorably with those from existing approaches.
Affiliation(s)
- Chenyang Tao
- Centre for Computational Systems Biology and School of Mathematical Sciences, Fudan University, Shanghai, PR China; Department of Computer Science, Warwick University, Coventry, UK
- Xue Hua
- Imaging Genetics Center, Institute for Neuroimaging & Informatics, University of Southern California, Los Angeles, CA, USA
- Christopher R K Ching
- Imaging Genetics Center, Institute for Neuroimaging & Informatics, University of Southern California, Los Angeles, CA, USA; Interdepartmental Neuroscience Graduate Program, UCLA School of Medicine, Los Angeles, CA, USA
- Edmund T Rolls
- Department of Computer Science, Warwick University, Coventry, UK; Oxford Centre for Computational Neuroscience, Oxford, UK
- Paul M Thompson
- Imaging Genetics Center, Institute for Neuroimaging & Informatics, University of Southern California, Los Angeles, CA, USA; Departments of Neurology, Psychiatry, Radiology, Engineering, Pediatrics, and Ophthalmology, USC, Los Angeles, CA, USA
- Jianfeng Feng
- Centre for Computational Systems Biology and School of Mathematical Sciences, Fudan University, Shanghai, PR China; Department of Computer Science, Warwick University, Coventry, UK; School of Life Science and the Collaborative Innovation Center for Brain Science, Fudan University, Shanghai 200433, PR China
|
38
|
Chekouo T, Stingo FC, Guindani M, Do KA. A Bayesian predictive model for imaging genetics with application to schizophrenia. Ann Appl Stat 2016. [DOI: 10.1214/16-aoas948] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/19/2022]
|
39
|
|
40
|
Abstract
Motivated from problems in canonical correlation analysis, reduced rank regression and sufficient dimension reduction, we introduce a double dimension reduction model where a single index of the multivariate response is linked to the multivariate covariate through a single index of these covariates, hence the name double single index model. Since nonlinear association between two sets of multivariate variables can be arbitrarily complex and even intractable in general, we aim at seeking a principal one-dimensional association structure where a response index is fully characterized by a single predictor index. The functional relation between the two single-indices is left unspecified, allowing flexible exploration of any potential nonlinear association. We argue that such double single index association is meaningful and easy to interpret, and the rest of the multi-dimensional dependence structure can be treated as nuisance in model estimation. We investigate the estimation and inference of both indices and the regression function, and derive the asymptotic properties of our procedure. We illustrate the numerical performance in finite samples and demonstrate the usefulness of the modeling and estimation procedure in a multi-covariate multi-response problem concerning concrete.
Affiliation(s)
- Kun Chen
- Department of Statistics, University of Connecticut, 215 Glenbrook Road U-4120, Storrs, Connecticut 06269, U.S.A
- Yanyuan Ma
- Department of Statistics, University of South Carolina, 1523 Greene Street Columbia, SC 29208, U.S.A
|
41
|
Chen K, Chan KS. A note on rank reduction in sparse multivariate regression. Journal of Statistical Theory and Practice 2016; 10:100-120. [PMID: 26997938 PMCID: PMC4797956 DOI: 10.1080/15598608.2015.1081573] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Abstract
A reduced-rank regression with sparse singular value decomposition (RSSVD) approach was proposed by Chen et al. for conducting variable selection in a reduced-rank model. To jointly model the multivariate response, the method efficiently constructs a prespecified number of latent variables as some sparse linear combinations of the predictors. Here, we generalize the method to also perform rank reduction, and enable its usage in reduced-rank vector autoregressive (VAR) modeling to perform automatic rank determination and order selection. We show that in the context of stationary time-series data, the generalized approach correctly identifies both the model rank and the sparse dependence structure between the multivariate response and the predictors, with probability one asymptotically. We demonstrate the efficacy of the proposed method by simulations and analyzing a macro-economical multivariate time series using a reduced-rank VAR model.
Affiliation(s)
- Kun Chen
- Department of Statistics, University of Connecticut, Storrs, Connecticut, USA
- Kung-Sik Chan
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, USA
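For readers unfamiliar with the rank-reduction idea that the Chen and Chan abstract above builds on, the following is a minimal sketch of classical reduced-rank regression with identity weighting. It is a simpler, swapped-in technique and not the sparse RSSVD procedure of the paper: there is no sparsity penalty and the rank is supplied by hand rather than selected automatically. The function name `reduced_rank_regression` and the simulated dimensions are hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression (identity weighting).

    Fits ordinary least squares, then projects the coefficient matrix onto
    the leading right singular vectors of the fitted values.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted = X @ B_ols
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:rank].T                 # q x rank basis in response space
    return B_ols @ V_r @ V_r.T        # p x q coefficient matrix of rank <= rank

# Simulated example with a true rank-2 coefficient matrix (assumed values).
rng = np.random.default_rng(0)
n, p, q, r = 200, 10, 8, 2
X = rng.standard_normal((n, p))
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

B_hat = reduced_rank_regression(X, Y, rank=r)
rel_err = np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true)
print("estimated rank:", np.linalg.matrix_rank(B_hat, tol=1e-8))
print("relative error:", rel_err)
```

The sparse variants discussed in the abstract additionally shrink entries of the singular vectors toward zero, so that only a subset of predictors enters each latent variable.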
|
42
|
Nonlinear association criterion, nonlinear Granger causality and related issues with applications to neuroimage studies. J Neurosci Methods 2016; 262:110-32. [PMID: 26791806 DOI: 10.1016/j.jneumeth.2016.01.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/03/2015] [Revised: 12/21/2015] [Accepted: 01/02/2016] [Indexed: 11/20/2022]
Abstract
BACKGROUND Quantifying associations in neuroscience (and many other scientific disciplines) is often challenged by high-dimensionality, nonlinearity and noisy observations. Many classic methods have either poor power or poor scalability on data sets of the same or different scales such as genetical, physiological and image data. NEW METHOD Based on the framework of reproducing kernel Hilbert spaces we proposed a new nonlinear association criterion (NAC) with an efficient numerical algorithm and p-value approximation scheme. We also presented mathematical justification that links the proposed method to related methods such as kernel generalized variance, kernel canonical correlation analysis and the Hilbert-Schmidt independence criterion. NAC allows the detection of association between arbitrary input domains as long as a characteristic kernel is defined. A MATLAB package was provided to facilitate applications. RESULTS Extensive simulation examples and four real world neuroscience examples including functional MRI causality, Calcium imaging and imaging genetic studies on autism [Brain, 138(5):1382-1393 (2015)] and alcohol addiction [PNAS, 112(30):E4085-E4093 (2015)] are used to benchmark NAC. It demonstrates superior performance over the existing procedures we tested and also yields biologically significant results for the real world examples. COMPARISON WITH EXISTING METHOD(S) NAC beats its linear counterparts when nonlinearity is present in the data. It also shows more robustness against different experimental setups compared with its nonlinear counterparts. CONCLUSIONS In this work we presented a new and robust statistical approach NAC for measuring associations. It could serve as an interesting alternative to the existing methods for datasets where nonlinearity and other confounding factors are present.
|
43
|
Multivariate Analysis of Genotype-Phenotype Association. Genetics 2016; 202:1345-63. [PMID: 26896328 DOI: 10.1534/genetics.115.181339] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/29/2015] [Accepted: 02/15/2016] [Indexed: 11/18/2022]
Abstract
With the advent of modern imaging and measurement technology, complex phenotypes are increasingly represented by large numbers of measurements, which may not bear biological meaning one by one. For such multivariate phenotypes, studying the pairwise associations between all measurements and all alleles is highly inefficient and prevents insight into the genetic pattern underlying the observed phenotypes. We present a new method for identifying patterns of allelic variation (genetic latent variables) that are maximally associated (in terms of effect size) with patterns of phenotypic variation (phenotypic latent variables). This multivariate genotype-phenotype mapping (MGP) separates phenotypic features under strong genetic control from less genetically determined features and thus permits an analysis of the multivariate structure of genotype-phenotype association, including its dimensionality and the clustering of genetic and phenotypic variables within this association. Different variants of MGP maximize different measures of genotype-phenotype association: genetic effect, genetic variance, or heritability. In an application to a mouse sample, scored for 353 SNPs and 11 phenotypic traits, the first dimension of genetic and phenotypic latent variables accounted for >70% of genetic variation present in all 11 measurements; 43% of variation in this phenotypic pattern was explained by the corresponding genetic latent variable. The first three dimensions together sufficed to account for almost 90% of genetic variation in the measurements and for all the interpretable genotype-phenotype association. Each dimension can be tested as a whole against the hypothesis of no association, thereby reducing the number of statistical tests from 7766 to 3, the maximal number of meaningful independent tests. Important alleles can be selected based on their effect size (additive or nonadditive effect on the phenotypic latent variable). This low dimensionality of the genotype-phenotype map has important consequences for gene identification and may shed light on the evolvability of organisms.
|
44
|
Kim J, Pan W. A cautionary note on using secondary phenotypes in neuroimaging genetic studies. Neuroimage 2015; 121:136-45. [PMID: 26220747 PMCID: PMC4604049 DOI: 10.1016/j.neuroimage.2015.07.058] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/10/2015] [Revised: 06/12/2015] [Accepted: 07/20/2015] [Indexed: 11/18/2022]
Abstract
Almost all genome-wide association studies (GWASs), including the Alzheimer's Disease Neuroimaging Initiative (ADNI), are based on the case-control study design, implying that the resulting case-control data are likely a biased, not random, sample of the target population. Although association analysis of the disease (e.g. Alzheimer's disease in the ADNI) can be conducted using a standard logistic regression by ignoring the biased case-control sampling, a standard linear regression analysis on a secondary phenotype (e.g. any neuroimaging phenotype in the ADNI) may in general lead to biased inference, including biased parameter estimates, inflated Type I errors and reduced power for association testing. Despite this well-known result in genetic epidemiology, to our surprise, all the published studies on secondary phenotypes with the ADNI data have ignored this potential problem. Here we aim to answer whether such a standard analysis of a secondary phenotype is valid or problematic with the ADNI data. Through both real data analyses and simulation studies, we found that, strikingly, such an analysis was generally valid (with only small biases or slightly inflated Type I errors) for the ADNI data, though caution must be taken when analyzing other data. We also illustrate applications and possible problems of two methods specifically developed for valid analysis of secondary phenotypes.
Affiliation(s)
- Junghi Kim, Division of Biostatistics, University of Minnesota, USA
- Wei Pan, Division of Biostatistics, University of Minnesota, USA
|
45
|
Huang M, Nichols T, Huang C, Yang Y, Lu Z, Feng Q, Knickmeyer RC, Zhu H. FVGWAS: Fast voxelwise genome wide association analysis of large-scale imaging genetic data. Neuroimage 2015; 118:613-27. [PMID: 26025292 PMCID: PMC4554832 DOI: 10.1016/j.neuroimage.2015.05.043] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 01/08/2015] [Revised: 04/09/2015] [Accepted: 05/16/2015] [Indexed: 01/17/2023]
Abstract
Large-scale imaging genetic studies are increasingly being conducted to collect a rich set of imaging, genetic, and clinical data to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide (N_C > 12 million known variants) associations with signals at millions of locations (N_V ~ 10^6) in the brain from thousands of subjects (n ~ 10^3). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components: a heteroscedastic linear model, a global sure independence screening (GSIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is O(n N_V N_C) for the voxelwise genome wide association analysis (VGWAS) method compared with O((N_C + N_V) n^2) for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 s for a single CPU. Our FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing.
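To give a feel for the two complexity orders quoted in the abstract above, the following sketch plugs in the ADNI dimensions it reports (708 subjects, 193,275 voxels, 501,584 SNPs). It is only a rough illustration: big-O expressions hide constant factors, so the ratio is not a runtime prediction, and the function names `vgwas_ops` and `fvgwas_ops` are hypothetical labels, not part of any FVGWAS software.

```python
def vgwas_ops(n: int, n_voxels: int, n_snps: int) -> int:
    """Leading-order operation count n * N_V * N_C for pairwise VGWAS."""
    return n * n_voxels * n_snps


def fvgwas_ops(n: int, n_voxels: int, n_snps: int) -> int:
    """Leading-order operation count (N_C + N_V) * n^2 quoted for FVGWAS."""
    return (n_snps + n_voxels) * n ** 2


# Dimensions of the ADNI application reported in the abstract.
n, n_voxels, n_snps = 708, 193_275, 501_584

# Ratio of the two leading-order counts (constants ignored, an assumption).
ratio = vgwas_ops(n, n_voxels, n_snps) / fvgwas_ops(n, n_voxels, n_snps)
print(f"VGWAS / FVGWAS leading-order ratio: {ratio:.1f}")
```

Under these assumptions the ratio comes out at roughly 197, which suggests why the gap between the two orders matters at this scale.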
Affiliation(s)
- Meiyan Huang, School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Thomas Nichols, Department of Statistics, University of Warwick, Coventry, UK
- Chao Huang, Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yu Yang, Department of Statistics and Operation Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Zhaohua Lu, Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Qianjing Feng, School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Rebecca C Knickmeyer, Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hongtu Zhu, Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|