1. Dey D, Datta A, Banerjee S. Modeling Multivariate Spatial Dependencies Using Graphical Models. The New England Journal of Statistics in Data Science 2023; 1:283-295. [PMID: 37817840; PMCID: PMC10563032; DOI: 10.51387/23-nejsds47] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2023]
Abstract
Graphical models have witnessed significant growth and usage in spatial data science for modeling data referenced over a massive number of spatial-temporal coordinates. Much of this literature has focused on a single spatially dependent outcome or on relatively few such outcomes. Recent attention has turned to modeling and inference for a substantially large number of outcomes. While spatial factor models and multivariate basis expansions occupy a prominent place in this domain, this article elucidates a recent approach, graphical Gaussian Processes, that exploits the notion of conditional independence among a very large number of spatial processes to build scalable graphical models for fully model-based Bayesian analysis of multivariate spatial data.
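The conditional-independence idea behind this approach can be seen in its simplest, non-spatial form: for a multivariate Gaussian vector, a zero off-diagonal entry in the precision matrix means the two components are conditionally independent given all the others. The Python sketch below is a generic illustration of that fact, not the graphical Gaussian Process construction itself; the three-variable chain graph and its numbers are hypothetical.

```python
import numpy as np


def partial_correlations(precision: np.ndarray) -> np.ndarray:
    """Partial correlations from a precision matrix Q:
    rho_ij = -Q_ij / sqrt(Q_ii * Q_jj) for i != j, and 1 on the diagonal."""
    d = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


# Hypothetical chain graph 0 - 1 - 2: there is no edge between 0 and 2,
# so Q[0, 2] = 0 encodes "variable 0 is independent of variable 2 given variable 1".
Q = np.array([[2.0, -0.8, 0.0],
              [-0.8, 2.0, -0.8],
              [0.0, -0.8, 2.0]])

Sigma = np.linalg.inv(Q)       # implied covariance matrix
rho = partial_correlations(Q)

print(Sigma[0, 2])  # nonzero: 0 and 2 are marginally dependent (through 1)
print(rho[0, 2])    # zero: 0 and 2 are conditionally independent given 1
```

The marginal covariance between variables 0 and 2 is nonzero even though their partial correlation is zero; dependence is encoded through the missing edges of the graph, which is the kind of sparsity graphical approaches exploit.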
Affiliation(s)
- Debangan Dey: Department of Biostatistics, Johns Hopkins University, USA
- Abhirup Datta: Department of Biostatistics, Johns Hopkins University, USA
- Sudipto Banerjee: Department of Biostatistics, University of California, Los Angeles, USA
2.
Affiliation(s)
- Wanfang Chen: Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, China
- Marc G. Genton: Statistics Program, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
3. Zhang L, Banerjee S. Spatial factor modeling: A Bayesian matrix-normal approach for misaligned data. Biometrics 2022; 78:560-573. [PMID: 33704776; PMCID: PMC10257482; DOI: 10.1111/biom.13452] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/29/2020] [Revised: 12/18/2020] [Accepted: 02/24/2021] [Indexed: 11/30/2022]
Abstract
Multivariate spatially oriented data sets are prevalent in the environmental and physical sciences. Scientists seek to jointly model multiple variables, each indexed by a spatial location, to capture any underlying spatial association for each variable and associations among the different dependent variables. Multivariate latent spatial process models have proved effective in driving statistical inference and rendering better predictive inference at arbitrary locations for the spatial process. High-dimensional multivariate spatial data, which are the theme of this article, refer to data sets where the number of spatial locations and the number of spatially dependent variables are very large. The field has witnessed substantial developments in scalable models for univariate spatial processes, but such methods for multivariate spatial processes, especially when the number of outcomes is moderately large, are limited in comparison. Here, we extend scalable modeling strategies for a single process to multivariate processes. We pursue Bayesian inference, which is attractive for full uncertainty quantification of the latent spatial process. Our approach exploits distribution theory for the matrix-normal distribution, which we use to construct scalable versions of a hierarchical linear model of coregionalization (LMC) and spatial factor models that deliver inference over a high-dimensional parameter space including the latent spatial process. We illustrate the computational and inferential benefits of our algorithms over competing methods using simulation studies and an analysis of a massive vegetation index data set.
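The computational appeal of the matrix-normal distribution can be checked numerically: if an n x q matrix X follows MN(M, U, V), then vec(X) is multivariate normal with covariance V ⊗ U, yet its log density can be evaluated from the n x n and q x q factors alone, without forming the nq x nq Kronecker product. The Python sketch below compares both routes. It is a generic illustration under assumed toy covariances (an exponential spatial covariance on hypothetical 1-D locations and a random cross-outcome covariance), not the authors' hierarchical LMC or factor-model algorithm.

```python
import numpy as np


def matnorm_logpdf(X, M, U, V):
    """Log density of the matrix-normal MN(M, U, V). X and M are n x q,
    U (n x n) is the row (e.g. spatial) covariance, V (q x q) the column
    (e.g. between-outcome) covariance."""
    n, q = X.shape
    R = X - M
    _, logdet_U = np.linalg.slogdet(U)
    _, logdet_V = np.linalg.slogdet(V)
    # tr(V^{-1} R^T U^{-1} R), computed with linear solves instead of explicit inverses
    quad = np.trace(np.linalg.solve(V, R.T @ np.linalg.solve(U, R)))
    return -0.5 * (n * q * np.log(2 * np.pi) + q * logdet_U + n * logdet_V + quad)


def mvn_logpdf(x, mean, cov):
    """Ordinary multivariate normal log density, used here as a brute-force check."""
    k = x.size
    r = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(cov, r))


rng = np.random.default_rng(0)
n, q = 6, 3
s = np.linspace(0.0, 1.0, n)                          # hypothetical 1-D locations
U = np.exp(-np.abs(s[:, None] - s[None, :]) / 0.3)    # exponential spatial covariance
A = rng.normal(size=(q, q))
V = A @ A.T + q * np.eye(q)                           # positive definite cross-outcome covariance
M = np.zeros((n, q))
X = rng.normal(size=(n, q))

lp_matrix = matnorm_logpdf(X, M, U, V)
# vec() stacks columns (Fortran order), so Cov(vec X) = kron(V, U)
lp_vec = mvn_logpdf(X.ravel(order="F"), M.ravel(order="F"), np.kron(V, U))
print(np.isclose(lp_matrix, lp_vec))   # True
```

The matrix-normal route only ever solves with U and V, which is what makes Kronecker-structured models attractive when both n and q grow.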
Affiliation(s)
- Lu Zhang: Department of Statistics, Columbia University, New York
- Sudipto Banerjee: Department of Biostatistics, University of California, Los Angeles, California
4. Functional marked point processes: a natural structure to unify spatio-temporal frameworks and to analyse dependent functional data. TEST-SPAIN 2021. [DOI: 10.1007/s11749-020-00730-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
This paper treats functional marked point processes (FMPPs), which are defined as marked point processes where the marks are random elements in some (Polish) function space. Such marks may represent, for example, spatial paths or functions of time. To be able to consider, for example, multivariate FMPPs, we also attach an additional, Euclidean, mark to each point. We indicate how the FMPP framework quite naturally connects the point process framework with both the functional data analysis framework and the geostatistical framework. We further show that various existing stochastic models fit well into the FMPP framework. To carry out nonparametric statistical analyses for FMPPs, we study characteristics such as product densities and Palm distributions, which are the building blocks for many summary statistics. We then define a new family of summary statistics, so-called weighted marked reduced moment measures, together with their nonparametric estimators, in order to study features of the functional marks. We further show how other summary statistics may be obtained as special cases of these summary statistics. We finally apply these tools to analyse population structures, such as demographic evolution and sex ratio over time, in Spanish provinces.
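Summary statistics of this kind are built from second-order characteristics of the point pattern. The simplest unmarked member of that family is Ripley's K function, a reduced second moment measure that counts neighboring pairs within distance r. The Python sketch below implements its naive estimator without edge correction; it only illustrates the general idea and is not the weighted marked reduced moment measure estimator proposed in the paper. The four-point configuration is a hypothetical toy example.

```python
import numpy as np


def ripley_k(points, r_values, area):
    """Naive Ripley's K estimate with no edge correction:
    K(r) = area / (n (n - 1)) * #{ordered pairs (i, j), i != j, with ||x_i - x_j|| <= r}."""
    n = points.shape[0]
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                       # drop the i == j pairs
    counts = np.array([(d <= r).sum() for r in r_values])
    return area * counts / (n * (n - 1))


# Hypothetical toy pattern: the four corners of the unit square (window area 1).
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(ripley_k(square, [0.5, 1.0, 1.5], area=1.0))   # 0, 2/3, 1
```

Under complete spatial randomness K(r) = πr², so departures of an estimated curve from πr² point to clustering or regularity.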
5. Song Y, Ge S, Cao J, Wang L, Nathoo FS. A Bayesian spatial model for imaging genetics. Biometrics 2021; 78:742-753. [PMID: 33765325; DOI: 10.1111/biom.13460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2019] [Revised: 02/08/2021] [Accepted: 02/24/2021] [Indexed: 11/29/2022]
Abstract
We develop a Bayesian bivariate spatial model for multivariate regression analysis applicable to studies examining the influence of genetic variation on brain structure. Our model is motivated by an imaging genetics study of the Alzheimer's Disease Neuroimaging Initiative (ADNI), where the objective is to examine the association between images of volumetric and cortical thickness values, summarizing the structure of the brain as measured by magnetic resonance imaging (MRI), and a set of 486 single nucleotide polymorphisms (SNPs) from 33 Alzheimer's disease (AD) candidate genes obtained from 632 subjects. A bivariate spatial process model is developed to accommodate the correlation structures typically seen in structural brain imaging data. First, we allow for spatial correlation among imaging phenotypes measured on the same hemisphere of the brain, using a graph structure derived from a neighborhood matrix. Second, we allow for correlation between the same measures obtained from different hemispheres (left/right) of the brain. We develop a mean-field variational Bayes algorithm and a Gibbs sampling algorithm to fit the model. We also incorporate Bayesian false discovery rate (FDR) procedures to select SNPs. We implement the methodology in a new release of the R package bgsmtr. We show that the new spatial model demonstrates superior performance over a standard model in our application. Data used in the preparation of this article were obtained from the ADNI database (https://adni.loni.usc.edu).
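A common form of Bayesian FDR control ranks features by their posterior probability of being a true signal and selects the largest top-ranked set whose average posterior probability of being null stays at or below a level alpha. The Python sketch below implements that generic rule. The exact procedure used in bgsmtr may differ, and the function name and the posterior probabilities are hypothetical.

```python
import numpy as np


def bayesian_fdr_select(post_prob, alpha=0.05):
    """Select features whose estimated Bayesian FDR stays at or below alpha.

    post_prob[j] is the posterior probability that feature j (e.g. a SNP) is a true
    signal. Features are ranked by post_prob; the selected set is the largest
    top-k set whose average posterior null probability, mean(1 - p), is <= alpha.
    """
    p = np.asarray(post_prob, dtype=float)
    order = np.argsort(-p)                         # most probable signals first
    null_prob = 1.0 - p[order]
    # estimated Bayesian FDR of each top-k set, k = 1..len(p); nondecreasing in k
    fdr = np.cumsum(null_prob) / np.arange(1, p.size + 1)
    ok = np.nonzero(fdr <= alpha)[0]
    if ok.size == 0:
        return np.array([], dtype=int)
    k = ok[-1] + 1
    return np.sort(order[:k])


selected = bayesian_fdr_select([0.99, 0.2, 0.97, 0.9, 0.5], alpha=0.05)
print(selected.tolist())   # [0, 2, 3]
```

Here the top three features have an average null probability of about 0.047, while adding the fourth would raise it to 0.16, so the rule stops at three.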
Affiliation(s)
- Yin Song: Department of Mathematics and Statistics, University of Victoria, British Columbia, Canada
- Shufei Ge: Institute of Mathematical Sciences, ShanghaiTech University, Shanghai, China
- Jiguo Cao: Statistics and Actuarial Science, Simon Fraser University, British Columbia, Canada
- Liangliang Wang: Statistics and Actuarial Science, Simon Fraser University, British Columbia, Canada
- Farouk S Nathoo: Department of Mathematics and Statistics, University of Victoria, British Columbia, Canada
6. Messick RM, Heaton MJ, Hansen N. Multivariate spatial mapping of soil water holding capacity with spatially varying cross-correlations. Ann Appl Stat 2017. [DOI: 10.1214/16-aoas991] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
7. Datta A, Banerjee S, Finley AO, Gelfand AE. Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets. J Am Stat Assoc 2016; 111:800-812. [PMID: 29720777; PMCID: PMC5927603; DOI: 10.1080/01621459.2015.1044091] [Citation(s) in RCA: 156] [Impact Index Per Article: 19.5] [Received: 10/01/2014] [Revised: 04/01/2015] [Indexed: 10/23/2022]
Abstract
Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, which yields substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online.
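The NNGP density can be written as a product of univariate Gaussian conditionals, p(w) = ∏_i N(w_i | B_i w_{N(i)}, F_i), where N(i) holds at most m nearest neighbors of location i among the locations ordered before it. The Python sketch below evaluates that log density directly; each term needs only an m x m solve, so the cost grows linearly in n for fixed m. It assumes a zero-mean latent process with an exponential covariance and hypothetical parameter values, and it omits the hierarchical model and MCMC layers. When every earlier location is a neighbor (m = n - 1), the product is an exact factorization of the full Gaussian density, which the usage lines check.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * ||a_i - b_j||) between rows of a and b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def nngp_logpdf(w, coords, m):
    """Log density of a zero-mean NNGP:
    sum over i of log N(w_i | B_i w_N(i), F_i), where N(i) holds the (at most) m
    nearest neighbors of location i among locations 0..i-1 in the given ordering."""
    logp = 0.0
    for i in range(w.size):
        c_ii = exp_cov(coords[i:i + 1], coords[i:i + 1])[0, 0]
        if i == 0:
            mean, var = 0.0, c_ii
        else:
            dist = np.linalg.norm(coords[:i] - coords[i], axis=1)
            nb = np.argsort(dist)[:m]                     # nearest earlier locations
            C_nn = exp_cov(coords[nb], coords[nb])        # neighbor covariance (at most m x m)
            c_in = exp_cov(coords[i:i + 1], coords[nb])[0]
            b = np.linalg.solve(C_nn, c_in)               # kriging weights B_i
            mean = b @ w[nb]
            var = c_ii - b @ c_in                         # conditional variance F_i
        logp += -0.5 * (np.log(2.0 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return logp


rng = np.random.default_rng(1)
n = 40
coords = rng.uniform(size=(n, 2))                  # hypothetical locations in the unit square
C = exp_cov(coords, coords)
w = np.linalg.cholesky(C) @ rng.normal(size=n)     # one draw of the latent process

_, logdet = np.linalg.slogdet(C)
full = -0.5 * (n * np.log(2.0 * np.pi) + logdet + w @ np.linalg.solve(C, w))

exact = nngp_logpdf(w, coords, m=n - 1)   # all earlier locations used: exact factorization
approx = nngp_logpdf(w, coords, m=10)     # sparse neighbor sets: the NNGP approximation
print(np.isclose(exact, full))            # True
```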
8. Datta A, Banerjee S, Finley AO, Gelfand AE. On nearest-neighbor Gaussian process models for massive spatial data. Wiley Interdiscip Rev Comput Stat 2016; 8:162-171. [PMID: 29657666; DOI: 10.1002/wics.1383] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Abstract
Gaussian Process (GP) models provide a very flexible nonparametric approach to modeling location-and-time indexed datasets. However, the storage and computational requirements for GP models are infeasible for large spatial datasets. Nearest Neighbor Gaussian Processes (NNGPs; Datta A, Banerjee S, Finley AO, Gelfand AE. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 2016) provide a scalable alternative by using local information from a few nearest neighbors. Scalability is achieved by using the neighbor sets in a conditional specification of the model. We show how this is equivalent to sparse modeling of Cholesky factors of large covariance matrices. We also discuss a general approach to construct scalable Gaussian Processes using sparse local kriging. We present a multivariate data analysis which demonstrates how the nearest neighbor approach yields inference indistinguishable from the full rank GP despite being several times faster. Finally, we also propose a variant of the NNGP model for automating the selection of the neighbor set size.
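The link to sparse Cholesky-type factors can be made concrete: stacking the kriging weights into a strictly lower-triangular matrix B and the conditional variances into F gives the precision matrix (I - B)^T F^{-1} (I - B), where each row of B has at most m nonzero entries. The Python sketch below builds B and F for hypothetical, evenly spaced 1-D locations with an exponential covariance, confirms that the construction reproduces the exact inverse covariance when all preceding locations are neighbors, and checks the row sparsity for m = 5. It is a dense-matrix illustration only; a practical implementation would store B in a sparse format.

```python
import numpy as np


def nngp_factors(coords, m, sigma2=1.0, phi=3.0):
    """Return (B, F) such that the NNGP precision is (I - B)^T diag(1 / F) (I - B).

    coords are 1-D locations in their chosen ordering. Row i of the strictly
    lower-triangular B holds the kriging weights of location i on its (at most)
    m nearest earlier locations; F[i] is the matching conditional variance."""
    def cov(a, b):
        # exponential covariance between two sets of 1-D locations
        return sigma2 * np.exp(-phi * np.abs(a[:, None] - b[None, :]))

    n = coords.size
    B = np.zeros((n, n))
    F = np.full(n, sigma2)             # the first location has no neighbors: F[0] = sigma2
    for i in range(1, n):
        nb = np.argsort(np.abs(coords[:i] - coords[i]))[:m]
        c_in = cov(coords[i:i + 1], coords[nb])[0]
        b = np.linalg.solve(cov(coords[nb], coords[nb]), c_in)
        B[i, nb] = b
        F[i] = sigma2 - b @ c_in
    return B, F


coords = np.linspace(0.0, 1.0, 30)      # hypothetical, evenly spaced 1-D locations
n = coords.size
eye = np.eye(n)
C = np.exp(-3.0 * np.abs(coords[:, None] - coords[None, :]))   # same covariance, sigma2 = 1

B, F = nngp_factors(coords, m=n - 1)
Q_exact = (eye - B).T @ np.diag(1.0 / F) @ (eye - B)
print(np.allclose(Q_exact, np.linalg.inv(C)))   # True: all earlier locations are neighbors

B5, F5 = nngp_factors(coords, m=5)
print(np.count_nonzero(B5, axis=1).max())       # at most 5 nonzeros per row of B
```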
Affiliation(s)
- Abhirup Datta: Department of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- Sudipto Banerjee: Department of Biostatistics, University of California, Los Angeles, CA, USA
- Andrew O Finley: Department of Forestry, Michigan State University, East Lansing, MI, USA; Department of Geography, Michigan State University, East Lansing, MI, USA
- Alan E Gelfand: Department of Statistical Science, Duke University, Durham, NC, USA
9. Karabatsos G, Talbott E, Walker SG. A Bayesian nonparametric meta-analysis model. Res Synth Methods 2014; 6:28-44. [PMID: 26035468; DOI: 10.1002/jrsm.1117] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 05/24/2012] [Revised: 02/04/2014] [Accepted: 03/10/2014] [Indexed: 11/09/2022]
Abstract
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models.
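One standard way to give a prior the flexibility to produce skewed or multimodal effect-size distributions is a Dirichlet-process-style mixture built by stick-breaking. The Python sketch below draws one random normal mixture from a truncated stick-breaking construction and samples effect sizes from it. It is a generic illustration with hypothetical hyperparameters (concentration alpha, truncation level K, base and kernel standard deviations), not the specific model proposed by Karabatsos, Talbott and Walker.

```python
import numpy as np


def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), w_k = v_k * prod_{l<k} (1 - v_l).
    Setting the last v_k to 1 assigns the leftover mass, so the K weights sum to 1."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_effect_sizes(n, alpha=1.0, K=50, base_sd=0.5, kernel_sd=0.1, seed=0):
    """Draw n effect sizes from a single random mixture of normals (illustration only)."""
    rng = np.random.default_rng(seed)
    w = stick_breaking_weights(alpha, K, rng)
    mu = rng.normal(0.0, base_sd, size=K)     # component means from a N(0, base_sd^2) base measure
    z = rng.choice(K, size=n, p=w)            # mixture component assigned to each study
    return rng.normal(mu[z], kernel_sd), w


effects, weights = sample_effect_sizes(200)
print(weights.sum())               # 1 up to rounding
print(np.round(weights[:5], 3))    # a few dominant components can create several modes
```

Unlike a single normal random-effects distribution, a draw from such a mixture can place its mass in several separated clusters, which is the behavior the abstract argues conventional models cannot capture.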
Affiliation(s)
- George Karabatsos: Department of Educational Psychology, Program in Measurement, Evaluation Statistics, and Assessments, College of Education, University of Illinois - Chicago, 1040 W. Harrison St. (MC 147), Chicago, IL, 60607, USA
- Elizabeth Talbott: Department of Special Education, College of Education, University of Illinois - Chicago, Chicago, IL, USA
- Stephen G Walker: Division of Statistics and Scientific Computation, The University of Texas at Austin, Austin, TX, USA
10. Guhaniyogi R, Finley AO, Banerjee S, Kobe RK. Modeling Complex Spatial Dependencies: Low-Rank Spatially Varying Cross-Covariances With Application to Soil Nutrient Data. Journal of Agricultural, Biological, and Environmental Statistics 2013. [DOI: 10.1007/s13253-013-0140-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 10/26/2022]
11. Quick H, Banerjee S, Carlin BP. Modeling temporal gradients in regionally aggregated California asthma hospitalization data. Ann Appl Stat 2013; 7:154-176. [PMID: 29606992; DOI: 10.1214/12-aoas600] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
Abstract
Advances in Geographical Information Systems (GIS) have led to the enormous recent burgeoning of spatial-temporal databases and associated statistical modeling. Here we depart from the rather rich literature in space-time modeling by considering the setting where space is discrete (e.g., aggregated data over regions), but time is continuous. Our major objective in this application is to carry out inference on gradients of a temporal process in our data set of monthly county level asthma hospitalization rates in the state of California, while at the same time accounting for spatial similarities of the temporal process across neighboring counties. Use of continuous time models here allows inference at a finer resolution than that at which the data are sampled. Rather than use parametric forms to model time, we opt for a more flexible stochastic process embedded within a dynamic Markov random field framework. Through the matrix-valued covariance function we can ensure that the temporal process realizations are mean square differentiable, and may thus carry out inference on temporal gradients in a posterior predictive fashion. We use this approach to evaluate temporal gradients where we are concerned with temporal changes in the residual and fitted rate curves after accounting for seasonality, spatiotemporal ozone levels and several important spatially resolved sociodemographic covariates.
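Mean-square differentiability is what makes gradient inference possible: for a suitably differentiable covariance k, the derivative process Y'(t) is jointly Gaussian with Y, with Cov(Y'(t), Y(t')) = ∂k(t, t')/∂t, so the posterior mean of the gradient follows from the usual kriging formula with this cross-covariance. The Python sketch below applies that idea to a single, nearly noise-free time series with a squared-exponential covariance. The covariance, length scale, and synthetic sin(t) data are assumptions, and the spatial Markov random field component of the authors' model is not included.

```python
import numpy as np


def se_kernel(t1, t2, sigma2=1.0, ell=1.0):
    """Squared-exponential covariance; processes with this covariance are mean-square differentiable."""
    d = t1[:, None] - t2[None, :]
    return sigma2 * np.exp(-0.5 * d ** 2 / ell ** 2)


def d_se_kernel(t1, t2, sigma2=1.0, ell=1.0):
    """Cross-covariance Cov(Y'(t1), Y(t2)) = d/dt1 k(t1, t2) = -(t1 - t2) / ell^2 * k(t1, t2)."""
    d = t1[:, None] - t2[None, :]
    return -d / ell ** 2 * se_kernel(t1, t2, sigma2, ell)


def posterior_gradient_mean(t_obs, y_obs, t_new, noise_var=1e-6, **kw):
    """Posterior mean of Y'(t_new) given observations y_obs at times t_obs (zero prior mean)."""
    K = se_kernel(t_obs, t_obs, **kw) + noise_var * np.eye(t_obs.size)
    weights = np.linalg.solve(K, y_obs)
    return d_se_kernel(t_new, t_obs, **kw) @ weights


t = np.linspace(0.0, 2.0 * np.pi, 40)       # synthetic observation times
y = np.sin(t)                                # synthetic, nearly noise-free data
t_new = np.array([1.0, 2.5])
grad = posterior_gradient_mean(t, y, t_new)
print(grad, np.cos(t_new))                   # the estimated gradient should track cos(t)
```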
12.
13. Lindgren F, Rue H, Lindström J. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J R Stat Soc Series B Stat Methodol 2011. [DOI: 10.1111/j.1467-9868.2011.00777.x] [Citation(s) in RCA: 1311] [Impact Index Per Article: 100.8] [Indexed: 11/28/2022]