1. Tian X, Wang Y, Wang S, Zhao Y, Zhao Y. Bayesian mixed model inference for genetic association under related samples with brain network phenotype. Biostatistics 2024; 25:1195-1209. [PMID: 38494649 DOI: 10.1093/biostatistics/kxae008] [Received: 05/15/2023] [Revised: 01/22/2024] [Accepted: 02/19/2024] [Indexed: 03/19/2024]
Abstract
Genetic association studies for brain connectivity phenotypes have gained prominence due to advances in noninvasive imaging techniques and quantitative genetics. Brain connectivity traits, characterized by network configurations and unique biological structures, present distinct challenges compared to other quantitative phenotypes. Furthermore, the sample relatedness present in most imaging genetics studies limits the feasibility of adopting existing network-response models. In this article, we fill this gap by proposing a Bayesian network-response mixed-effect model that considers a network-variate phenotype and incorporates population structures, including pedigrees and unknown sample relatedness. To accommodate the inherent topological architecture associated with the genetic contributions to the phenotype, we model the effect components via a set of effect network configurations and impose inter-network sparsity and intra-network shrinkage to dissect the phenotypic network configurations affected by the risk genetic variant. A Markov chain Monte Carlo (MCMC) algorithm is further developed to facilitate uncertainty quantification. We evaluate the performance of our model through extensive simulations. By further applying the method to study the genetic bases of brain structural connectivity, using data from the Human Connectome Project with extensive family structures, we obtain plausible and interpretable results. Beyond brain connectivity genetic studies, our proposed model also provides a general linear mixed-effect regression framework for network-variate outcomes.
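A minimal sketch of how related samples enter a mixed model, assuming (beyond what the abstract states) a single connectivity edge, a known genetic relationship matrix `K`, and known variance components. It is not the authors' Bayesian network-response model: it only shows the marginal covariance V = σ_g²K + σ_e²I and the generalized least squares (GLS) estimate of one SNP's effect that follows from it. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def edge_covariance(kinship: Matrix, sigma_g2: float, sigma_e2: float) -> Matrix:
    """Marginal covariance V = sigma_g2 * K + sigma_e2 * I of one network edge across subjects."""
    n = len(kinship)
    return [
        [sigma_g2 * kinship[i][j] + (sigma_e2 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        z[i] = (m[i][n] - sum(m[i][c] * z[c] for c in range(i + 1, n))) / m[i][i]
    return z


def gls_snp_effect(genotype: List[float], edge: List[float], v: Matrix) -> float:
    """GLS estimate (x' V^-1 x)^-1 x' V^-1 y for one centered SNP x and one centered edge y."""
    vinv_x = solve(v, genotype)  # V is symmetric, so x' V^-1 = (V^-1 x)'
    num = sum(a * b for a, b in zip(vinv_x, edge))
    den = sum(a * b for a, b in zip(vinv_x, genotype))
    return num / den


# Two full siblings (genetic relationship 0.5, i.e. twice the kinship coefficient)
# and one unrelated subject.
K = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
V = edge_covariance(K, sigma_g2=0.6, sigma_e2=0.4)
print(V[0][1])  # 0.3: the siblings' edge values covary through shared genetics
```

Setting the off-diagonal entries of `K` to zero makes `V` a scaled identity, and the GLS estimate then reduces to ordinary least squares, which is what ignoring family structure amounts to.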
Affiliation(s)
- Xinyuan Tian: Department of Biostatistics, Yale University, 60 College St, New Haven, CT 06520, United States
- Yiting Wang: Department of Biostatistics, Yale University, 60 College St, New Haven, CT 06520, United States
- Selena Wang: Department of Biostatistics, Yale University, 60 College St, New Haven, CT 06520, United States
- Yi Zhao: Department of Biostatistics and Health Data Science, Indiana University, 410 W. 10th St, Indianapolis, IN 46202, United States
- Yize Zhao: Department of Biostatistics, Yale University, 60 College St, New Haven, CT 06520, United States
2. Reeder HT, Haneuse S, Lee KH. Group lasso priors for Bayesian accelerated failure time models with left-truncated and interval-censored data. Stat Methods Med Res 2024; 33:1412-1423. [PMID: 39053572 DOI: 10.1177/09622802241262523] [Indexed: 07/27/2024]
Abstract
An important task in health research is to characterize time-to-event outcomes such as disease onset or mortality in terms of a potentially high-dimensional set of risk factors. For example, prospective cohort studies of Alzheimer's disease (AD) typically enroll older adults for observation over several decades to assess the long-term impact of genetic and other factors on cognitive decline and mortality. The accelerated failure time model is particularly well-suited to such studies, structuring covariate effects as "horizontal" changes to the survival quantiles that conceptually reflect shifts in the outcome distribution due to lifelong exposures. However, this modeling task is complicated by the enrollment of adults at differing ages and by intermittent follow-up visits that yield interval-censored outcome information. Moreover, genetic and clinical risk factors are not only high-dimensional but also characterized by underlying grouping structures, such as function or gene location. Such grouped high-dimensional covariates require shrinkage methods that directly acknowledge this structure to facilitate variable selection and estimation. In this paper, we address these considerations directly by proposing a Bayesian accelerated failure time model with a group-structured lasso penalty, designed for left-truncated and interval-censored time-to-event data. We develop an R package with a Markov chain Monte Carlo sampler for estimation. We present a simulation study examining the performance of this method relative to an ordinary lasso penalty and apply the proposed method to identify groups of predictive genetic and clinical risk factors for AD in the Religious Orders Study and Memory and Aging Project prospective cohort studies of AD and dementia.
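The abstract's modeling ingredients can be sketched concretely. Below, `aft_quantile` shows the "horizontal" effect: under log T = x'β + error, every survival quantile is multiplied by exp(x'β). `ltic_loglik` writes one subject's left-truncated, interval-censored likelihood contribution, [S(L) - S(R)] / S(A) for an event in (L, R] and enrollment age A, assuming (this is not from the abstract) a Weibull AFT baseline. `group_lasso_penalty` is the usual group lasso penalty; a common Bayesian group lasso prior has a log-density equal to its negative up to constants, but the authors' exact prior may differ. All names are hypothetical.

```python
import math
from typing import List, Sequence


def linear_predictor(x: Sequence[float], beta: Sequence[float]) -> float:
    return sum(xi * bi for xi, bi in zip(x, beta))


def aft_quantile(baseline_quantile: float, x: Sequence[float], beta: Sequence[float]) -> float:
    """Under log T = x'beta + error, each survival quantile is multiplied by exp(x'beta)."""
    return baseline_quantile * math.exp(linear_predictor(x, beta))


def weibull_aft_survival(t: float, x: Sequence[float], beta: Sequence[float], sigma: float) -> float:
    """Assumed Weibull AFT: S(t | x) = exp(-(t * exp(-x'beta)) ** (1 / sigma))."""
    return math.exp(-((t * math.exp(-linear_predictor(x, beta))) ** (1.0 / sigma)))


def ltic_loglik(entry: float, left: float, right: float,
                x: Sequence[float], beta: Sequence[float], sigma: float) -> float:
    """log{[S(left) - S(right)] / S(entry)}: event in (left, right], enrolled alive at `entry`."""
    def surv(t: float) -> float:
        return weibull_aft_survival(t, x, beta, sigma)

    s_right = surv(right) if math.isfinite(right) else 0.0  # right = inf: right-censored at `left`
    return math.log(surv(left) - s_right) - math.log(surv(entry))


def group_lasso_penalty(beta: Sequence[float], groups: List[List[int]], lam: float) -> float:
    """lam * sum_g sqrt(|g|) * ||beta_g||_2; the unsquared group norm favours zeroing whole groups."""
    return lam * sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g)) for g in groups
    )


# A covariate effect of x'beta = log(2) doubles a 10-year baseline quantile.
print(aft_quantile(10.0, [1.0], [math.log(2.0)]))
```

Dividing by S(entry) is what accounts for left truncation: only people who survived to their enrollment age could be observed at all.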
Affiliation(s)
- Harrison T Reeder: Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Kyu Ha Lee: Harvard T.H. Chan School of Public Health, Boston, MA, USA
3. Gong Y, Xu J, Wu M, Gao R, Sun J, Yu Z, Zhang Y. Single-cell biclustering for cell-specific transcriptomic perturbation detection in AD progression. Cell Reports Methods 2024; 4:100742. [PMID: 38554701 PMCID: PMC11045878 DOI: 10.1016/j.crmeth.2024.100742] [Received: 06/23/2023] [Revised: 10/30/2023] [Accepted: 03/07/2024] [Indexed: 04/02/2024]
Abstract
The pathogenesis of Alzheimer disease (AD) involves complex gene regulatory changes across different cell types. To help decipher this complexity, we introduce single-cell Bayesian biclustering (scBC), a framework for identifying cell-specific gene network biomarkers in scRNA-seq and snRNA-seq data. Through biclustering, scBC enables the analysis of perturbations in functional gene modules at the single-cell level. Applying scBC to AD snRNA-seq data reveals perturbations within gene modules across distinct cell groups and sheds light on gene-cell correlations during AD progression. Notably, our method helps to overcome common challenges in single-cell data analysis, including batch effects and dropout events. Incorporating prior knowledge further enables the framework to yield more biologically interpretable results. Comparative analyses on simulated and real-world datasets demonstrate the precision and robustness of our approach relative to other state-of-the-art biclustering methods. scBC holds potential for unraveling the mechanisms underlying polygenic diseases characterized by intricate gene coexpression patterns.
Affiliation(s)
- Yuqiao Gong: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Jingsi Xu: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Maoying Wu: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Ruitian Gao: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Jianle Sun: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Zhangsheng Yu: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China; SJTU-Yale Joint Center for Biostatistics and Data Science Organization, Shanghai Jiao Tong University, Shanghai, China; Clinical Research Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Center for Biomedical Data Science, Translational Science Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yue Zhang: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China; SJTU-Yale Joint Center for Biostatistics and Data Science Organization, Shanghai Jiao Tong University, Shanghai, China; Center for Biomedical Data Science, Translational Science Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China
4. Li W, Chang C, Kundu S, Long Q. Accounting for network noise in graph-guided Bayesian modeling of structured high-dimensional data. Biometrics 2024; 80:ujae012. [PMID: 38483282 PMCID: PMC10938547 DOI: 10.1093/biomtc/ujae012] [Received: 11/18/2022] [Revised: 12/31/2023] [Accepted: 02/14/2024] [Indexed: 03/17/2024]
Abstract
There is a growing body of literature on knowledge-guided statistical learning methods for the analysis of structured high-dimensional data (such as genomic and transcriptomic data) that can incorporate knowledge of underlying networks derived from functional genomics and functional proteomics. These methods have been shown to improve variable selection and prediction accuracy and to yield more interpretable results. However, they typically use graphs extracted from existing databases or based on subject matter expertise, which are known to be incomplete and may contain false edges. To address this gap, we propose a graph-guided Bayesian modeling framework that accounts for network noise in regression models involving structured high-dimensional predictors. Specifically, we use two sources of network information, the noisy graph extracted from existing databases and the graph estimated from the observed predictors in the dataset at hand, to inform the model for the true underlying network via a latent scale modeling framework. This model is coupled with a Bayesian regression model with structured high-dimensional predictors and an adaptive structured shrinkage prior. We develop an efficient Markov chain Monte Carlo algorithm for posterior sampling. We demonstrate the advantages of our method over existing methods in simulations and through analyses of a genomics dataset and a proteomics dataset for Alzheimer's disease.
Affiliation(s)
- Wenrui Li: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, PA 19104, United States
- Changgee Chang: Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, United States
- Suprateek Kundu: Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Qi Long: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, PA 19104, United States
5. Zhang Q, Chang C, Shen L, Long Q. Incorporating graph information in Bayesian factor analysis with robust and adaptive shrinkage priors. Biometrics 2024; 80:ujad014. [PMID: 38281768 PMCID: PMC10826885 DOI: 10.1093/biomtc/ujad014] [Received: 01/21/2022] [Revised: 10/20/2023] [Accepted: 11/16/2023] [Indexed: 01/30/2024]
Abstract
There has been increasing interest in decomposing high-dimensional multi-omics data into a product of low-rank and sparse matrices for the purposes of dimension reduction and feature engineering. Bayesian factor models achieve such a low-dimensional representation of the original data through different sparsity-inducing priors. However, few of these models can efficiently incorporate the information encoded by biological graphs, which has already proven useful in many analysis tasks. In this work, we propose a Bayesian factor model with novel hierarchical priors that incorporate biological graph knowledge as a tool for identifying a group of genes that function collaboratively. The proposed model enables sparsity within networks by allowing each factor loading to be shrunk adaptively and by adding layers that relate individual shrinkage parameters to the underlying graph information, both of which yield more accurate structure recovery of factor loadings. Further, unlike existing graph-incorporated approaches, the new priors overcome the phase transition phenomenon, so the model is robust to noisy edges that are inconsistent with the actual sparsity structure of the factor loadings. Finally, our model can handle both continuous and discrete data types. The proposed method is shown to outperform several existing factor analysis methods in simulation experiments and real data analyses.
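To make the "product of low-rank and sparse matrices" concrete: with a p × K loading matrix W and a K × n factor matrix Z, the mean of a p × n data matrix is M = WZ, and the nonzero pattern of each column of W says which features a factor uses. The sketch below is illustrative only; the graph-informed shrinkage priors that decide which loadings are zero are the paper's contribution and are not implemented here. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def low_rank_mean(loadings: Matrix, factors: Matrix) -> Matrix:
    """Mean matrix M = W Z, with W (p x K) loadings and Z (K x n) factors."""
    p, k, n = len(loadings), len(factors), len(factors[0])
    return [
        [sum(loadings[i][r] * factors[r][j] for r in range(k)) for j in range(n)]
        for i in range(p)
    ]


def loading_support(loadings: Matrix, tol: float = 0.0) -> List[List[int]]:
    """For each factor (column of W), the features (rows) whose loading exceeds tol in magnitude."""
    k = len(loadings[0])
    return [[i for i, row in enumerate(loadings) if abs(row[r]) > tol] for r in range(k)]


# Three features, two factors: factor 0 uses features 0 and 1, factor 1 uses feature 2 only.
W = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
Z = [[1.0, -1.0], [0.5, 2.0]]
print(low_rank_mean(W, Z))
print(loading_support(W))
```

A graph-aware prior would, for example, make features 0 and 1 more likely to share a factor if they are connected in the biological graph; that behaviour lives in the prior, not in this deterministic product.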
Affiliation(s)
- Qiyiwen Zhang: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Changgee Chang: Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 47405, United States
- Li Shen: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Qi Long: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
6. Liu Y, Chakraborty N, Qin ZS, Kundu S. Integrative Bayesian tensor regression for imaging genetics applications. Front Neurosci 2023; 17:1212218. [PMID: 37680967 PMCID: PMC10481528 DOI: 10.3389/fnins.2023.1212218] [Received: 04/25/2023] [Accepted: 07/17/2023] [Indexed: 09/09/2023]
Abstract
Identifying biomarkers for Alzheimer's disease with a goal of early detection is a fundamental problem in clinical research. Both medical imaging and genetics have contributed informative biomarkers in the literature. To further improve performance, there has recently been increasing interest in developing analytic approaches that combine data across modalities such as imaging and genetics. However, few methods in the literature can systematically combine high-dimensional voxel-level imaging and genetic data for accurate prediction of clinical outcomes of interest. Existing prediction models that integrate imaging and genetic features often use region-level imaging summaries, and they typically neither consider the spatial configuration of the voxels in the image nor incorporate the dependence between genes, omissions that may compromise prediction ability. We propose a novel integrative Bayesian scalar-on-image regression model for predicting cognitive outcomes based on high-dimensional, spatially distributed voxel-level imaging data, along with correlated transcriptomic features. We account for the spatial dependencies among the imaging voxels via a tensor approach that also enables massive dimension reduction to address the curse of dimensionality, and we model the dependencies between the transcriptomic features via a Graph-Laplacian prior. We implement this approach via an efficient Markov chain Monte Carlo (MCMC) computation strategy. We apply the proposed method to the analysis of longitudinal ADNI data for predicting cognitive scores at different visits by integrating voxel-level cortical thickness measurements derived from T1w-MRI scans with transcriptomics data.
We illustrate that the proposed imaging transcriptomics approach yields significant improvements in prediction compared to prediction using a subset of features from only one modality (imaging or genetics), as well as to using imaging and transcriptomics features while ignoring the inherent dependencies between the features. Our analysis is one of the first to conclusively demonstrate the advantages of prediction based on combining voxel-level cortical thickness measurements with transcriptomics features, while accounting for inherent structural information.
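The abstract credits a tensor approach with massive dimension reduction. One standard way to obtain this, assumed here because the abstract does not name the decomposition, is a rank-R CP (PARAFAC) decomposition of the voxel-wise coefficient tensor: B = Σ_r u_r ∘ v_r ∘ w_r, so the scalar outcome depends on the image through ⟨X, B⟩. The sketch below shows the construction and the parameter count; it omits priors, the transcriptomic part, and MCMC, and all names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Tensor3 = List[List[List[float]]]


def cp_tensor(factors: Sequence[Tuple[List[float], List[float], List[float]]]) -> Tensor3:
    """Coefficient tensor B = sum_r u_r (outer) v_r (outer) w_r from rank-R CP factors."""
    p1, p2, p3 = (len(vec) for vec in factors[0])
    coef = [[[0.0] * p3 for _ in range(p2)] for _ in range(p1)]
    for u, v, w in factors:
        for i, j, k in product(range(p1), range(p2), range(p3)):
            coef[i][j][k] += u[i] * v[j] * w[k]
    return coef


def image_predictor(image: Tensor3, coef: Tensor3) -> float:
    """Scalar-on-image linear predictor <X, B>: voxel values times voxel coefficients, summed."""
    return sum(
        image[i][j][k] * coef[i][j][k]
        for i in range(len(coef))
        for j in range(len(coef[0]))
        for k in range(len(coef[0][0]))
    )


def n_params(p1: int, p2: int, p3: int, rank: int) -> Tuple[int, int]:
    """(coefficients without structure, coefficients under rank-R CP, ignoring scale indeterminacy)."""
    return p1 * p2 * p3, rank * (p1 + p2 + p3)


print(n_params(64, 64, 64, rank=3))  # (262144, 576)
```

The 64 × 64 × 64 grid is only an illustrative size, not the dimension of the ADNI images; the point is that the CP count grows with the sum of the side lengths rather than their product.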
Affiliation(s)
- Yajie Liu: Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Nilanjana Chakraborty: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Zhaohui S. Qin: Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Suprateek Kundu: Department of Biostatistics, Division of Basic Science Research, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
7. Chang C, Dai Z, Oh J, Long Q. Integrative Learning of Structured High-Dimensional Data from Multiple Datasets. Stat Anal Data Min 2023; 16:120-134. [PMID: 37213790 PMCID: PMC10195070 DOI: 10.1002/sam.11601] [Received: 07/04/2021] [Accepted: 10/14/2022] [Indexed: 11/11/2022]
Abstract
Integrative learning of multiple datasets has the potential to mitigate the small-n, large-p challenge often encountered in the analysis of big biomedical data such as genomics data. Detection of weak yet important signals can be enhanced by jointly selecting features for all datasets. However, the set of important features may not always be the same across all datasets. Although some existing integrative learning methods allow a heterogeneous sparsity structure, in which a subset of datasets can have zero coefficients for some selected features, they tend to have reduced efficiency, reinstating the problem of losing weak but important signals. We propose a new integrative learning approach that not only aggregates important signals well under a homogeneous sparsity structure but also substantially alleviates the problem of losing weak important signals under a heterogeneous sparsity structure. Our approach exploits a priori known graphical structure of features and encourages joint selection of features that are connected in the graph. Integrating such prior information over multiple datasets enhances power while also accounting for heterogeneity across datasets. Theoretical properties of the proposed method are investigated. We also demonstrate the limitations of existing approaches and the superiority of our method using a simulation study and an analysis of gene expression data from ADNI.
Affiliation(s)
- Changgee Chang: Perelman School of Medicine, University of Pennsylvania, Pennsylvania, U.S.A
- Zongyu Dai: School of Arts and Science, University of Pennsylvania, Pennsylvania, U.S.A
- Jihwan Oh: Perelman School of Medicine, University of Pennsylvania, Pennsylvania, U.S.A
- Qi Long: Perelman School of Medicine, University of Pennsylvania, Pennsylvania, U.S.A
8. Bao J, Chang C, Zhang Q, Saykin AJ, Shen L, Long Q. Integrative analysis of multi-omics and imaging data with incorporation of biological information via structural Bayesian factor analysis. Brief Bioinform 2023; 24:bbad073. [PMID: 36882008 PMCID: PMC10387302 DOI: 10.1093/bib/bbad073] [Received: 10/31/2022] [Revised: 01/14/2023] [Accepted: 02/10/2023] [Indexed: 03/09/2023]
Abstract
MOTIVATION: With the rapid development of modern technologies, massive data are available for the systematic study of Alzheimer's disease (AD). Although many existing AD studies focus mainly on single-modality omics data, multi-omics datasets can provide a more comprehensive understanding of AD. To bridge this gap, we propose a novel structural Bayesian factor analysis framework (SBFA) that extracts the information shared by multi-omics data through the aggregation of genotyping data, gene expression data, neuroimaging phenotypes and prior biological network knowledge. Our approach extracts common information shared by different modalities and encourages biologically related features to be selected, guiding future AD research in a biologically meaningful way.
METHOD: Our SBFA model decomposes the mean parameters of the data into a sparse factor loading matrix and a factor matrix, where the factor matrix represents the common information extracted from multi-omics and imaging data. Our framework is designed to incorporate prior biological network information. Our simulation study demonstrated that SBFA achieved the best performance compared with other state-of-the-art factor-analysis-based integrative analysis methods.
RESULTS: We apply SBFA, together with several state-of-the-art factor analysis models, to extract latent common information from genotyping, gene expression and brain imaging data simultaneously from the ADNI biobank database. The latent information is then used to predict the Functional Activities Questionnaire score, an important measurement for the diagnosis of AD that quantifies subjects' abilities in daily life. SBFA shows the best prediction performance among the factor analysis models compared.
AVAILABILITY: Code is publicly available at https://github.com/JingxuanBao/SBFA.
CONTACT: qlong@upenn.edu.
Affiliation(s)
- Jingxuan Bao: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
- Changgee Chang: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
- Qiyiwen Zhang: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
- Andrew J Saykin: Department of Radiology and Imaging Sciences, Indiana University, Indianapolis, 46202, IN, USA
- Li Shen: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
- Qi Long: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA
9. Zhao Y, Chang C, Zhang J, Zhang Z. Genetic underpinnings of brain structural connectome for young adults. J Am Stat Assoc 2023; 118:1473-1487. [PMID: 37982009 PMCID: PMC10655950 DOI: 10.1080/01621459.2022.2156349] [Received: 03/08/2021] [Accepted: 11/29/2022] [Indexed: 12/13/2022]
Abstract
With distinct advantages in power over behavioral phenotypes, brain imaging traits have emerged as endophenotypes for dissecting molecular contributions to behaviors and neuropsychiatric illnesses. Among different imaging features, brain structural connectivity (i.e., the structural connectome), which summarizes the anatomical connections between different brain regions, is one of the most cutting-edge yet under-investigated traits, and the genetic influence on structural connectome variation remains highly elusive. Relying on a landmark imaging genetics study of young adults, we develop a biologically plausible brain network response shrinkage model to comprehensively characterize the relationship between high-dimensional genetic variants and the structural connectome phenotype. Under a unified Bayesian framework, we accommodate the topology of the brain network and the biological architecture within the genome, and eventually establish a mechanistic mapping between genetic biomarkers and the associated brain sub-network units. An efficient expectation-maximization algorithm is developed to estimate the model and ensure computational feasibility. In the application to the Human Connectome Project Young Adult (HCP-YA) data, we establish genetic underpinnings, highly interpretable under functional annotation and brain tissue eQTL analysis, for the brain white matter tracts connecting the hippocampus and the two cerebral hemispheres. We also show the superiority of our method in extensive simulations.
Affiliation(s)
- Yize Zhao: Department of Biostatistics, Yale University
- Changgee Chang: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania
- Jingwen Zhang: Department of Biostatistics, Boston University, Boston, MA
- Zhengwu Zhang: Department of Statistics and Operations Research, University of North Carolina at Chapel Hill
10. Semi-parametric Bayes regression with network-valued covariates. Mach Learn 2022. [DOI: 10.1007/s10994-022-06174-z] [Indexed: 11/25/2022]
11. Sun W, Chang C, Long Q. Graph-guided Bayesian SVM with Adaptive Structured Shrinkage Prior for High-dimensional Data. Proceedings: ... IEEE International Conference on Big Data. IEEE International Conference on Big Data 2021; 2021:4472-4479. [PMID: 35187547 PMCID: PMC8855458 DOI: 10.1109/bigdata52589.2021.9671712] [Indexed: 06/14/2023]
Abstract
Support vector machine (SVM) is a popular classification method for the analysis of a wide range of data, including big biomedical data. Many SVM methods with feature selection have been developed under the frequentist regularization or Bayesian shrinkage frameworks. On the other hand, the value of incorporating a priori known biological knowledge, such as that from functional genomics and functional proteomics, into the statistical analysis of -omic data has been recognized in recent years. Such biological information is often represented by graphs. We propose a novel method that assigns Laplace priors to the regression coefficients and incorporates the underlying graph information via a hyper-prior for the shrinkage parameters in the Laplace priors. This enables smoothing of shrinkage parameters for connected variables in the graph and conditional independence between shrinkage parameters for disconnected variables. Extensive simulations demonstrate that our proposed method achieves the best prediction accuracy among the existing SVM methods compared. The proposed method is also illustrated in the analysis of genomic data from cancer studies, demonstrating its advantage in generating biologically meaningful results and identifying potentially important features.
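The abstract describes Laplace priors whose shrinkage parameters λ_j are tied together through a graph-based hyper-prior. The sketch below assumes (the abstract does not specify this form) a Gaussian Markov random field on log λ_j with precision Ω = τ(D - A) + ridge·I, built from the graph Laplacian. A zero off-diagonal entry of Ω is exactly conditional independence for disconnected variables, while the Laplacian term τ Σ_edges (log λ_i - log λ_j)² smooths shrinkage across connected variables. `tau`, `ridge`, and `mu` are hypothetical hyperparameters, and the SVM likelihood itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def graph_precision(n: int, edges: Sequence[Tuple[int, int]], tau: float, ridge: float) -> Matrix:
    """Omega = tau * (D - A) + ridge * I; off-diagonal entries are nonzero only for graph neighbours."""
    omega = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        omega[i][i] += tau
        omega[j][j] += tau
        omega[i][j] -= tau
        omega[j][i] -= tau
    return omega


def log_prior(beta: Sequence[float], log_lam: Sequence[float], omega: Matrix, mu: float) -> float:
    """Unnormalized log prior: Laplace(beta_j | lam_j) terms plus a Gaussian MRF on log(lam_j)."""
    # log of (lam / 2) * exp(-lam * |beta|), written in terms of t = log(lam)
    laplace = sum(t - math.log(2.0) - math.exp(t) * abs(b) for b, t in zip(beta, log_lam))
    d = [t - mu for t in log_lam]
    n = len(d)
    # Gaussian MRF term -0.5 * d' Omega d; its Laplacian part is tau * sum over edges of (d_i - d_j)^2
    quad = sum(d[i] * omega[i][j] * d[j] for i in range(n) for j in range(n))
    return laplace - 0.5 * quad


# Path graph 0 - 1 - 2: variables 0 and 2 are not connected.
omega = graph_precision(3, [(0, 1), (1, 2)], tau=1.0, ridge=0.1)
print(omega[0][2])  # 0.0: log(lam_0) and log(lam_2) are conditionally independent given log(lam_1)
```

With all coefficients at zero, a configuration in which neighbouring shrinkage parameters agree receives a higher prior density than one in which they alternate, which is the smoothing behaviour the abstract describes.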
Affiliation(s)
- Wenli Sun: Dept of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
- Changgee Chang: Dept of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
- Qi Long: Dept of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
12. Zhao Y, Li T, Zhu H. Bayesian sparse heritability analysis with high-dimensional neuroimaging phenotypes. Biostatistics 2020; 23:467-484. [PMID: 32948880 PMCID: PMC9308456 DOI: 10.1093/biostatistics/kxaa035] [Received: 03/23/2020] [Revised: 07/15/2020] [Accepted: 08/11/2020] [Indexed: 12/24/2022]
Abstract
Heritability analysis plays a central role in quantitative genetics in describing the genetic contribution to human complex traits and in prioritizing downstream analyses of large-scale phenotypes. Existing works largely focus on modeling a single phenotype, and currently available multivariate phenotypic methods often suffer from problems of scaling and interpretation. In this article, motivated by understanding how genetic underpinnings impact human brain variation, we develop an integrative Bayesian heritability analysis to jointly estimate heritabilities for high-dimensional neuroimaging traits. To induce sparsity and incorporate brain anatomical configuration, we impose hierarchical selection among both regional and local measurements based on the brain structural network and voxel dependence. We also use a nonparametric Dirichlet process mixture model to realize grouping among single nucleotide polymorphism-associated phenotypic variations, providing biological plausibility. Through extensive simulations, we show that the proposed method outperforms existing ones in heritability estimation and heritable trait selection under various scenarios. We finally apply the method to two large-scale imaging genetics datasets, the Alzheimer's Disease Neuroimaging Initiative and the United Kingdom Biobank, and show biologically meaningful results.
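For reference, the standard variance-components definition of (narrow-sense, SNP-based) heritability for a single imaging trait v is shown below. This textbook form is an assumption for illustration; the abstract does not give the authors' exact parametrization, which additionally groups SNP-associated variation through a Dirichlet process mixture and selects traits hierarchically.

```latex
y_v = X\beta_v + g_v + e_v,\qquad
g_v \sim \mathcal{N}\!\left(0,\ \sigma^2_{g,v} K\right),\qquad
e_v \sim \mathcal{N}\!\left(0,\ \sigma^2_{e,v} I_n\right),\qquad
h^2_v = \frac{\sigma^2_{g,v}}{\sigma^2_{g,v} + \sigma^2_{e,v}}
```

Here K is a genetic relationship matrix estimated from SNPs. A trait is non-heritable exactly when σ²_{g,v} = 0, i.e. h²_v = 0, so a sparsity-inducing prior on the genetic variance components is one way to select heritable traits.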
Affiliation(s)
- Yize Zhao: Department of Biostatistics, Yale University, 300 George Street, New Haven, CT 06511, USA
- Tengfei Li: Department of Radiology, University of North Carolina at Chapel Hill, 101 Manning Dr, Chapel Hill, NC 27514, USA
- Hongtu Zhu: Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27514, USA
13. Li Z, Chang C, Kundu S, Long Q. Bayesian generalized biclustering analysis via adaptive structured shrinkage. Biostatistics 2020; 21:610-624. [PMID: 30596887 PMCID: PMC7307984 DOI: 10.1093/biostatistics/kxy081] [Received: 06/14/2018] [Revised: 09/18/2018] [Accepted: 11/21/2018] [Indexed: 12/13/2022]
Abstract
Biclustering techniques can identify local patterns in a data matrix by clustering the feature space and sample space at the same time. Various biclustering methods have been proposed and successfully applied to the analysis of gene expression data. While existing biclustering methods have many desirable features, most of them are developed for continuous data, and few can efficiently handle -omics data of various types, for example, binomial data such as single nucleotide polymorphism data or negative binomial data such as RNA-seq data. In addition, none of the existing methods can utilize biological information such as that from functional genomics or proteomics. Recent work has shown that incorporating biological information can improve variable selection and prediction performance in analyses such as linear regression and multivariate analysis. In this article, we propose a novel Bayesian biclustering method that can handle multiple data types, including Gaussian, Binomial, and Negative Binomial. In addition, our method uses a Bayesian adaptive structured shrinkage prior that enables feature selection guided by existing biological information. Our simulation studies and application to multi-omics datasets demonstrate robust and superior performance of the proposed method compared to other existing biclustering methods.
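The abstract lists Gaussian, Binomial, and Negative Binomial data types. Below are the standard per-observation log-likelihoods for each, the kind of building block a generalized biclustering model evaluates for every entry of the data matrix. The negative binomial mean/dispersion parametrization (variance = mean + mean²/size) is one common choice and an assumption here, not necessarily the one the authors use; function names are hypothetical.

```python
import math


def gaussian_loglik(y: float, mean: float, sd: float) -> float:
    """log N(y | mean, sd^2), e.g. for normalized expression values."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def binomial_loglik(y: int, n: int, p: float) -> float:
    """log Binomial(y | n, p); e.g. a SNP genotype as 0, 1 or 2 minor alleles (n = 2). Needs 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log1p(-p)


def neg_binomial_loglik(y: int, mean: float, size: float) -> float:
    """log NB(y | mean, size) with variance mean + mean**2 / size, e.g. RNA-seq read counts."""
    log_coef = math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
    return log_coef + size * math.log(size / (size + mean)) + y * math.log(mean / (size + mean))


# A heterozygous genotype (y = 1 of n = 2) at allele frequency 0.5 has probability 0.5.
print(math.exp(binomial_loglik(1, 2, 0.5)))
```

Working on the log scale with `lgamma` avoids overflow in the binomial and negative binomial coefficients for large counts.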
Affiliation(s)
- Ziyi Li
- Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road, NE, Atlanta, GA, USA
- Changgee Chang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, USA
- Suprateek Kundu
- Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road, NE, Atlanta, GA, USA
- Qi Long
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, USA
14
Cai Q, Kang J, Yu T. Bayesian Network Marker Selection via the Thresholded Graph Laplacian Gaussian Prior. BAYESIAN ANALYSIS 2020; 15:79-102. [PMID: 32802246 PMCID: PMC7428197 DOI: 10.1214/18-ba1142] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 05/06/2023]
Abstract
Selecting informative nodes over large-scale networks is increasingly important in many research areas. Most existing methods focus on local network structure and incur heavy computational costs for large-scale problems. In this work, we propose a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework: the Thresholded Graph Laplacian Gaussian (TGLG) prior, which adopts the graph Laplacian matrix to characterize the conditional dependence between neighboring markers while accounting for the global network structure. Under mild conditions, we show that the proposed model enjoys posterior consistency with a diverging number of edges and nodes in the network. We also develop a Metropolis-adjusted Langevin algorithm (MALA) for efficient posterior computation, which scales to large networks. We illustrate the advantages of the proposed method over existing alternatives via extensive simulation studies and an analysis of breast cancer gene expression data from The Cancer Genome Atlas (TCGA).
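The Laplacian identity that priors of this kind build on can be checked numerically. The sketch below is illustrative only and is not the TGLG prior: it assumes an unweighted, undirected network, uses the unnormalized Laplacian L = D − A, and uses hypothetical function names (`laplacian`, `quad_form`, `edge_smoothness`). It verifies that βᵀLβ equals the sum of squared effect differences across edges, which is why a Laplacian-based prior favors similar effects on neighboring markers.

```python
"""Graph Laplacian quadratic form: a hedged illustration (not the TGLG prior).

Assumption: an unweighted, undirected marker network given as an edge list.
"""


def laplacian(n, edges):
    """Return the unnormalized Laplacian L = D - A as a list of lists."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def quad_form(L, beta):
    """Compute beta^T L beta."""
    n = len(beta)
    return sum(beta[i] * L[i][j] * beta[j] for i in range(n) for j in range(n))


def edge_smoothness(edges, beta):
    """Sum of squared differences between the effects of connected markers."""
    return sum((beta[i] - beta[j]) ** 2 for i, j in edges)


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (3, 4)]   # hypothetical 5-node network
    L = laplacian(5, edges)
    smooth = [1.0, 1.0, 1.0, 0.0, 0.0]  # effects agree along every edge
    rough = [1.0, 0.0, 1.0, 0.0, 1.0]   # effects disagree along every edge
    print(quad_form(L, smooth), edge_smoothness(edges, smooth))
    print(quad_form(L, rough), edge_smoothness(edges, rough))
```

A Gaussian prior whose log density is proportional to −βᵀLβ therefore penalizes exactly the disagreement along edges, leaving effects that are constant on a connected component unpenalized.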
Affiliation(s)
- Qingpo Cai
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Jian Kang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
15
Chang C, Oh J, Long Q. GRIA: Graphical Regularization for Integrative Analysis. PROCEEDINGS OF THE ... SIAM INTERNATIONAL CONFERENCE ON DATA MINING. SIAM INTERNATIONAL CONFERENCE ON DATA MINING 2020; 2020:604-612. [PMID: 32440369 PMCID: PMC7241091 DOI: 10.1137/1.9781611976236.68] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Integrative analysis jointly analyzes multiple data sets to overcome the curse of dimensionality. It can detect important but weak signals by jointly selecting features for all data sets, but unfortunately the sets of important features are not always the same across data sets. Variants that allow a heterogeneous sparsity structure, in which a subset of data sets can have a zero coefficient for a selected feature, have been proposed, but they compromise the benefit of integrative analysis and bring back the problem of losing weak but important signals. We propose a new integrative analysis approach that not only aggregates weak important signals well in the homogeneity setting but also substantially alleviates the problem of losing weak important signals in the heterogeneity setting. Our approach exploits a priori known graphical structure of features by forcing joint selection of adjacent features, and integrating such information over multiple data sets can increase power while taking into account heterogeneity across data sets. We confirm the problem with existing approaches and demonstrate the superiority of our method through a simulation study and an application to gene expression data from ADNI.
Affiliation(s)
- Changgee Chang
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania
- Jihwan Oh
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania
- Qi Long
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania
16
Gao B, Liu X, Li H, Cui Y. Integrative analysis of genetical genomics data incorporating network structures. Biometrics 2019; 75:1063-1075. [PMID: 31009063 PMCID: PMC6810723 DOI: 10.1111/biom.13072] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/03/2016] [Revised: 03/15/2019] [Accepted: 03/28/2019] [Indexed: 12/18/2022]
Abstract
In a living organism, tens of thousands of genes are expressed and interact with each other to achieve necessary cellular functions. Gene regulatory networks contain information on regulatory mechanisms and the functions of gene expression. Thus, incorporating network structures, discerned either through biological experiments or statistical estimation, could potentially increase the selection and estimation accuracy of genes associated with a phenotype of interest. Here, we considered a gene selection problem using gene expression data and the graphical structures found in gene networks. Because gene expression measurements are intermediate phenotypes between a trait and its associated genes, we adopted an instrumental variable regression approach. We treated genetic variants as instrumental variables to address the endogeneity issue. We proposed a two-step estimation procedure. In the first step, we applied the LASSO algorithm to estimate the effects of genetic variants on gene expression measurements. In the second step, the projected expression measurements obtained from the first step were treated as input variables. A graph-constrained regularization method was adopted to improve the efficiency of gene selection and estimation. We theoretically showed the selection consistency of the estimation method and derived the L∞ bound of the estimates. Simulation and real data analyses were conducted to demonstrate the effectiveness of our method and to compare it with its counterparts.
Affiliation(s)
- Bin Gao
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan
- Quantitative Sciences, Janssen Research & Development, LLC, Spring House, Pennsylvania
- Xu Liu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Hongzhe Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan
17
Sun W, Chang C, Long Q. Bayesian Non-linear Support Vector Machine for High-Dimensional Data with Incorporation of Graph Information on Features. PROCEEDINGS : ... IEEE INTERNATIONAL CONFERENCE ON BIG DATA. IEEE INTERNATIONAL CONFERENCE ON BIG DATA 2019; 2019:4874-4882. [PMID: 32455423 PMCID: PMC7243270 DOI: 10.1109/bigdata47090.2019.9006473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/11/2023]
Abstract
The support vector machine (SVM) is a popular classification method for the analysis of high-dimensional data such as genomics data. Recently, a number of linear SVM methods have been developed to achieve feature selection through either frequentist regularization or Bayesian shrinkage, but the linearity assumption may not be plausible for many real applications. In addition, recent work has demonstrated that incorporating known biological knowledge, such as that from functional genomics, into the statistical analysis of genomic data offers great promise of improved predictive accuracy and feature selection. Such biological knowledge can often be represented by graphs. In this article, we propose a novel knowledge-guided nonlinear Bayesian SVM approach for the analysis of high-dimensional data. Our model uses graph information that represents the relationships among the features to guide feature selection. To achieve knowledge-guided feature selection, we assign an Ising prior to the indicators representing inclusion or exclusion of the features in the model. An efficient MCMC algorithm is developed for posterior inference. The performance of our method is evaluated and compared with several penalized linear SVMs and the standard kernel SVM in terms of prediction and feature selection in extensive simulation studies. Also, analyses of genomic data from a cancer study show that our method yields a more accurate prediction model for patient survival and reveals more biologically meaningful results than the existing methods.
Affiliation(s)
- Wenli Sun
- Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104
- Changgee Chang
- Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104
- Qi Long
- Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104
18
Chang C, Oh J, Min EJ, Long Q. Knowledge-Guided Biclustering via Sparse Variational EM Algorithm. 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE : PROCEEDINGS : 10-11 NOVEMBER 2019, BEIJING, CHINA. IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (10TH : 2019 : BEIJING, CHINA) 2019; 2019:25-32. [PMID: 34290493 PMCID: PMC8291726 DOI: 10.1109/icbk.2019.00012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/13/2023]
Abstract
A biclustering in the analysis of a gene expression data matrix, for example, is defined as a set of biclusters, where each bicluster is a group of genes and a group of samples for which the genes are differentially expressed. Although many data mining approaches for biclustering exist in the literature, only a few can incorporate prior knowledge into the analysis, which can lead to great improvements in accuracy and interpretability, and all are limited in handling discrete data types. We propose a generalized biclustering approach that can be used for integrative analysis of multi-omics data with different data types. Our method can utilize biological information that can be represented by graphs, such as functional genomics and functional proteomics, and can accommodate a combination of continuous and discrete data types. The proposed method builds on a generalized Bayesian factor analysis framework, and a variational EM approach is used to obtain parameter estimates, where the latent quantities in the log-likelihood are iteratively imputed by their conditional expectations. The biclusters are retrieved via the sparse estimates of the factor loadings and the conditional expectations of the latent factors. To obtain sparse conditional expectations of the latent factors, a novel sparse variational EM algorithm is used. We demonstrate the superiority of our method over several existing biclustering methods in extensive simulation experiments and in an integrative analysis of multi-omics data.
Affiliation(s)
- Changgee Chang
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
- Jihwan Oh
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
- Eun Jeong Min
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
- Qi Long
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
19
Zhao Y, Chang C, Long Q. Knowledge-Guided Statistical Learning Methods for Analysis of High-Dimensional -Omics Data in Precision Oncology. JCO Precis Oncol 2019; 3:PO.19.00018. [PMID: 35100722 PMCID: PMC9797232 DOI: 10.1200/po.19.00018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Accepted: 03/25/2019] [Indexed: 12/31/2022]
Abstract
High-dimensional -omics data such as genomic, transcriptomic, and metabolomic data offer great promise in advancing precision medicine. In particular, such data have enabled the investigation of complex diseases such as cancer at an unprecedented scale and in multiple dimensions. However, a number of analytical challenges complicate the analysis of high-dimensional -omics data. One is the growing recognition that complex diseases such as cancer are multifactorial and may be attributed to harmful changes at multiple -omics levels and at the pathway level. When individual genes in an important pathway have relatively weak signals, it can be challenging to detect them on their own, but the aggregated signal in the pathway can be considerably stronger and hence easier to detect with the same sample size. To address these challenges, there is a growing body of literature on knowledge-guided statistical learning methods for the analysis of high-dimensional -omics data that can incorporate biological knowledge such as functional genomics and functional proteomics. These methods have been shown to improve prediction and classification accuracy and to yield more biologically interpretable results compared with statistical learning methods that do not use biological knowledge. In this review, we survey current knowledge-guided statistical learning methods, including both supervised and unsupervised learning, and their applications to precision oncology, and we discuss future research directions.
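A back-of-the-envelope calculation makes the pathway-aggregation argument concrete. Assume, for illustration only, that the k genes in a pathway yield independent z-statistics, each with the same mean shift μ and unit variance (an idealization; genes in a real pathway are typically correlated, which shrinks the gain). The standardized sum then has mean √k·μ:

```latex
z_j \sim \mathcal{N}(\mu, 1),\quad j = 1,\dots,k
\quad\Longrightarrow\quad
Z = \frac{1}{\sqrt{k}} \sum_{j=1}^{k} z_j \sim \mathcal{N}\!\left(\sqrt{k}\,\mu,\; 1\right)
```

For example, with μ = 1 and k = 16, each gene alone has an expected z-score of 1, while the pathway-level statistic has an expected z-score of 4.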
Affiliation(s)
- Yize Zhao
- Weill Cornell Medicine, New York, NY
- Changgee Chang
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Qi Long
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
20
Chakraborty S, Lozano AC. A graph Laplacian prior for Bayesian variable selection and grouping. Comput Stat Data Anal 2019. [DOI: 10.1016/j.csda.2019.01.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/14/2023]
21
Sun W, Chang C, Zhao Y, Long Q. Knowledge-Guided Bayesian Support Vector Machine for High-Dimensional Data with Application to Analysis of Genomics Data. PROCEEDINGS : ... IEEE INTERNATIONAL CONFERENCE ON BIG DATA. IEEE INTERNATIONAL CONFERENCE ON BIG DATA 2018; 2018:1484-1493. [PMID: 31041431 PMCID: PMC6486656 DOI: 10.1109/bigdata.2018.8622484] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
Abstract
The support vector machine (SVM) is a popular classification method for the analysis of a wide range of data, including big data. Many SVM methods with feature selection have been developed under frequentist regularization or Bayesian shrinkage frameworks. On the other hand, the importance of incorporating a priori known biological knowledge, such as gene pathway information, which stems from the gene regulatory network, into the statistical analysis of genomic data has been recognized in recent years. In this article, we propose a new Bayesian SVM approach that enables feature selection to be guided by knowledge of the graphical structure among predictors. The proposed method uses the spike-and-slab prior for feature selection, combined with the Ising prior that encourages group-wise selection of predictors adjacent to each other on the known graph. A Gibbs sampling algorithm is used for Bayesian inference. The performance of our method is evaluated and compared with existing SVM methods in terms of prediction and feature selection in extensive simulation settings. In addition, our method is illustrated in the analysis of genomic data from a cancer study, demonstrating its advantage in generating biologically meaningful results and identifying potentially important features.
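The abstract does not give the exact form of its Ising prior, so the following is a hedged sketch of one common parameterization on inclusion indicators γ_j ∈ {0, 1}: p(γ) ∝ exp(a·Σ_j γ_j + b·Σ_{(i,j)∈E} γ_i γ_j), where a < 0 controls overall sparsity and b > 0 rewards selecting graph neighbors together. The function names and hyperparameter values are hypothetical. The code computes the unnormalized log prior and the conditional inclusion probability a Gibbs update would use (prior part only, without the likelihood).

```python
import math


def ising_log_prior(gamma, edges, a, b):
    """Unnormalized log p(gamma) = a * sum(gamma) + b * sum over edges of gamma_i * gamma_j."""
    return a * sum(gamma) + b * sum(gamma[i] * gamma[j] for i, j in edges)


def conditional_inclusion_prob(j, gamma, neighbors, a, b):
    """Prior P(gamma_j = 1 | all other indicators) under the Ising prior.

    The log-odds is a + b * (number of currently selected neighbors of j),
    so only the graph neighbors of feature j matter.
    """
    log_odds = a + b * sum(gamma[k] for k in neighbors[j])
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    edges = [(0, 1), (1, 2)]                  # hypothetical chain of three features
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    a, b = -2.0, 1.5                          # hypothetical hyperparameters
    # Two adjacent features score higher than two non-adjacent ones.
    print(ising_log_prior([1, 1, 0], edges, a, b))  # -2.5
    print(ising_log_prior([1, 0, 1], edges, a, b))  # -4.0
    # Feature 1 with both neighbors selected: log-odds = -2.0 + 1.5 * 2 = 1.0
    print(conditional_inclusion_prob(1, [1, 0, 1], neighbors, a, b))
```

Because the conditional log-odds grows with the number of selected neighbors, a sampler using this prior tends to switch on features next to ones already in the model, which is the group-wise selection behavior the abstract describes.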
Affiliation(s)
- Wenli Sun
- Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104
- Changgee Chang
- Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104
- Yize Zhao
- Department of Healthcare Policy and Research, Weill Cornell Medicine, Cornell University, New York, NY, 10065
- Qi Long
- Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104
22
Min EJ, Chang C, Long Q. Generalized Bayesian Factor Analysis for Integrative Clustering with Applications to Multi-Omics Data. PROCEEDINGS OF THE ... INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS. IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS 2018; 2018:109-119. [PMID: 31106307 PMCID: PMC6521881 DOI: 10.1109/dsaa.2018.00021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
Abstract
Integrative clustering is a clustering approach for multiple datasets that provide different views of a common group of subjects. It enables multi-omics data to be analyzed jointly to, for example, identify subtypes of diseases, cells, and so on, capturing the complex underlying biological processes more precisely. On the other hand, there has been a great deal of interest over the past decade in incorporating prior structural knowledge of the features into statistical analyses. Knowledge of the gene regulatory network (pathways) can potentially be incorporated into many genomic studies. In this paper, we propose a novel integrative clustering method that can incorporate prior graph knowledge. We first develop a generalized Bayesian factor analysis (GBFA) framework, a sparse Bayesian factor analysis that can take graph information into account. Our GBFA framework employs the spike-and-slab lasso (SSL) prior to impose sparsity on the factor loadings and the Markov random field (MRF) prior to encourage smoothing over adjacent factor loadings, which establishes a unified shrinkage adaptive to the loading size and the graph structure. We then use the framework to extend iCluster+, a factor analysis-based integrative clustering approach. A novel variational EM algorithm is proposed to efficiently compute the MAP estimator of the factor loadings. Extensive simulation studies and an application to the NCI60 cell line dataset demonstrate that the proposed method is superior and delivers more biologically meaningful outcomes.
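The spike-and-slab lasso prior mentioned above is commonly written as a two-component mixture of Laplace densities, θ·ψ(β | λ1) + (1 − θ)·ψ(β | λ0) with ψ(β | λ) = (λ/2)·exp(−λ|β|) and λ0 much larger than λ1. The sketch below assumes that form and hypothetical hyperparameter values; it omits the MRF component that couples adjacent loadings, so it is not the GBFA model itself. It shows how the mixture produces shrinkage that adapts to the size of a coefficient.

```python
import math


def laplace_density(beta, lam):
    """Laplace density (lam / 2) * exp(-lam * |beta|)."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def ssl_density(beta, theta, lam0, lam1):
    """Mixture prior: weight theta on the slab (rate lam1), 1 - theta on the spike (rate lam0)."""
    return theta * laplace_density(beta, lam1) + (1 - theta) * laplace_density(beta, lam0)


def slab_weight(beta, theta, lam0, lam1):
    """Conditional probability that beta was drawn from the slab component."""
    slab = theta * laplace_density(beta, lam1)
    spike = (1 - theta) * laplace_density(beta, lam0)
    return slab / (slab + spike)


def effective_penalty(beta, theta, lam0, lam1):
    """Adaptive lasso-type penalty rate, -d/d|beta| of log ssl_density.

    Close to lam0 (strong shrinkage) for small coefficients and close to
    lam1 (weak shrinkage) for large ones.
    """
    w = slab_weight(beta, theta, lam0, lam1)
    return w * lam1 + (1 - w) * lam0


if __name__ == "__main__":
    theta, lam0, lam1 = 0.5, 20.0, 1.0  # hypothetical hyperparameters
    for b in (0.0, 0.1, 0.5, 1.0):
        print(b, round(slab_weight(b, theta, lam0, lam1), 4),
              round(effective_penalty(b, theta, lam0, lam1), 4))
```

Differentiating −log π(β) with respect to |β| gives the slab-weighted average of λ1 and λ0 computed in `effective_penalty`, which is why the SSL prior behaves like a lasso whose penalty relaxes as |β| grows.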
Affiliation(s)
- Eun Jeong Min
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, USA
- Changgee Chang
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, USA
- Qi Long
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, USA
23
Higgins IA, Kundu S, Guo Y. Integrative Bayesian analysis of brain functional networks incorporating anatomical knowledge. Neuroimage 2018; 181:263-278. [PMID: 30017786 DOI: 10.1016/j.neuroimage.2018.07.015] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/10/2018] [Revised: 07/04/2018] [Accepted: 07/05/2018] [Indexed: 12/31/2022]
Abstract
Recently, there has been increased interest in fusing multimodal imaging to better understand brain organization by integrating information on both brain structure and function. In particular, incorporating anatomical knowledge leads to desirable outcomes such as increased accuracy in brain network estimates and greater reproducibility of topological features across scanning sessions. Despite the clear advantages, major challenges persist in integrative analyses, including an incomplete understanding of the structure-function relationship and inaccuracies in mapping anatomical structures due to inherent deficiencies in existing imaging technology. This calls for the development of advanced network modeling tools that appropriately incorporate anatomical structure in constructing brain functional networks. We propose a hierarchical Bayesian Gaussian graphical modeling approach that models brain functional networks via sparse precision matrices whose degree of edge-specific shrinkage is a random variable modeled using both anatomical structure and an independent baseline component. The proposed approach adaptively shrinks functional connections and flexibly identifies functional connections supported by structural connectivity knowledge. This enables robust brain network estimation even in the presence of misspecified anatomical knowledge, while accommodating heterogeneity in the structure-function relationship. We implement the approach via an efficient optimization algorithm that yields maximum a posteriori estimates. Extensive numerical studies involving multiple functional network structures reveal clear advantages of the proposed approach over competing methods in accurately estimating brain functional connectivity, even when the anatomical knowledge is misspecified to a certain degree.
An application of the approach to data from the Philadelphia Neurodevelopmental Cohort (PNC) study reveals gender-based connectivity differences across multiple age groups, and higher reproducibility in the estimation of network metrics compared with alternative methods.
Affiliation(s)
- Ixavier A Higgins
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Suprateek Kundu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Ying Guo
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA