1
Wang G, Datta A, Lindquist MA. Improved fMRI-based pain prediction using Bayesian group-wise functional registration. Biostatistics 2024; 25:885-903. [PMID: 37805937 DOI: 10.1093/biostatistics/kxad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 08/22/2023] [Accepted: 08/27/2023] [Indexed: 10/10/2023]
Abstract
In recent years, the field of neuroimaging has undergone a paradigm shift, moving away from the traditional brain mapping approach towards the development of integrated, multivariate brain models that can predict categories of mental events. However, large interindividual differences in both brain anatomy and functional localization after standard anatomical alignment remain a major limitation in performing this type of analysis, as it leads to feature misalignment across subjects in subsequent predictive models. This article addresses this problem by developing and validating a new computational technique for reducing misalignment across individuals in functional brain systems by spatially transforming each subject's functional data to a common latent template map. Our proposed Bayesian functional group-wise registration approach allows us to assess differences in brain function across subjects and individual differences in activation topology. We achieve the probabilistic registration with inverse-consistency by utilizing the generalized Bayes framework with a loss function for the symmetric group-wise registration. It models the latent template with a Gaussian process, which helps capture spatial features in the template, producing a more precise estimation. We evaluate the method in simulation studies and apply it to data from an fMRI study of thermal pain, with the goal of using functional brain activity to predict physical pain. We find that the proposed approach allows for improved prediction of reported pain scores over conventional approaches. Received on 2 January 2017. Editorial decision on 8 June 2021.
Affiliation(s)
- Guoqing Wang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, USA
- Abhirup Datta
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, USA
- Martin A Lindquist
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, USA
2
Wu Y, Bi J, Gassett AJ, Young MT, Szpiro AA, Kaufman JD. Integrating traffic pollution dispersion into spatiotemporal NO2 prediction. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 925:171652. [PMID: 38485010 PMCID: PMC11027090 DOI: 10.1016/j.scitotenv.2024.171652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 02/18/2024] [Accepted: 03/09/2024] [Indexed: 03/25/2024]
Abstract
Accurately predicting ambient NO2 concentrations has great public health importance, as traffic-related air pollution is of major concern in urban areas. In this study, we present a novel approach incorporating traffic contribution to NO2 prediction in a fine-scale spatiotemporal model. We used a nationally available traffic estimate dataset in a scalable dispersion model, the Research LINE source dispersion model (RLINE). RLINE estimates then served as an additional input for a validated spatiotemporal pollution modeling approach. Our analysis uses measurement data collected by the Multi-Ethnic Study of Atherosclerosis and Air Pollution in the greater Los Angeles area between 2006 and 2009. We predicted road-type-specific annual average daily traffic (AADT) on road segments via national-level spatial regression models with nearest-neighbor Gaussian processes (spNNGP); the spNNGP models were trained on over half a million point-level traffic volume measurements nationwide. AADT estimates on all highways were combined with meteorological data in RLINE models. We evaluated two strategies to integrate RLINE estimates into spatiotemporal NO2 models: (1) incorporating RLINE estimates as a space-only covariate and (2) incorporating them as a spatiotemporal covariate. The results showed that integrating the RLINE estimates as a space-only covariate improved overall cross-validation R2 from 0.83 to 0.84, and root mean squared error (RMSE) from 3.58 to 3.48 ppb. Incorporating the estimates as a spatiotemporal covariate resulted in similar model improvement. The improvement of our spatiotemporal model was more pronounced at roadside monitors alongside highways, with R2 increasing from 0.56 to 0.66 and RMSE decreasing from 3.52 to 3.11 ppb. The observed improvement indicates that the RLINE estimates enhanced the model's predictive capabilities for roadside NO2 concentration gradients even after considering a comprehensive list of geographic covariates including the distance to roads. Our proposed modeling framework can be generalized to improve high-resolution prediction of NO2 exposure, especially near major roads in the U.S.
Affiliation(s)
- Yunhan Wu
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Jianzhao Bi
- Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA
- Amanda J Gassett
- Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA
- Michael T Young
- Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA
- Adam A Szpiro
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Joel D Kaufman
- Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA
3
Andreyeva T, Moore TE, Godoy LDC, Kenney EL. Federal Nutrition Assistance for Young Children: Underutilized and Unequally Accessed. Am J Prev Med 2024; 66:18-26. [PMID: 37709155 PMCID: PMC11000260 DOI: 10.1016/j.amepre.2023.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 09/07/2023] [Accepted: 09/07/2023] [Indexed: 09/16/2023]
Abstract
INTRODUCTION The federal Child and Adult Care Food Program (CACFP) improves nutrition and reduces food insecurity among young children by helping cover the food costs for child care providers and families. This nationwide study evaluated the extent and predictors of the CACFP's utilization among licensed child care centers to identify opportunities for expanding CACFP nutrition support. METHODS Administrative data from the CACFP and child care licensing agencies in 47 states and the District of Columbia were compiled and geocoded for 93,227 licensed child care centers. CACFP participation was predicted using a multivariable Bayesian spatial logistic regression model in the sample of low-income areas to target CACFP-eligible child care centers. Data were collected in 2020-2021 and analyzed in 2022. RESULTS Of all licensed child care centers, 36.5% participated in the CACFP, ranging from 15.2% to 65.3% across states; when restricted to low-income areas, 57.5% participated (range, 15.7%-85.7%). Income differences did not explain the large variation in CACFP participation rates across states. Having at least three CACFP sponsoring agencies per state predicted a 38% higher probability of CACFP participation (OR=1.38; 95% Credible Interval=1.08-1.78). CONCLUSIONS Current CACFP participation rates among licensed child care centers point to program underutilization and unequal access, particularly in some states and regions. Work at the federal and state levels is warranted to expand participation in the program, above all in low-income areas, so that more young children can eat healthfully with the CACFP.
Affiliation(s)
- Tatiana Andreyeva
- Department of Agricultural and Resource Economics, Rudd Center for Food Policy and Health, University of Connecticut, Storrs, Connecticut.
- Timothy E Moore
- Statistical Consulting Services, Center for Open Research Resources and Equipment, University of Connecticut, Storrs, Connecticut
- Erica L Kenney
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
4
Di Loro PA, Mingione M, Lipsitt J, Batteate CM, Jerrett M, Banerjee S. Bayesian hierarchical modeling and analysis for actigraph data from wearable devices. Ann Appl Stat 2023; 17:2865-2886. [PMID: 38283128 PMCID: PMC10815935 DOI: 10.1214/23-aoas1742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]
Abstract
The majority of Americans fail to achieve recommended levels of physical activity, which leads to numerous preventable health problems such as diabetes, hypertension, and heart diseases. This has generated substantial interest in monitoring human activity to gear interventions toward environmental features that may relate to higher physical activity. Wearable devices, such as wrist-worn sensors that monitor gross motor activity (actigraph units) continuously record the activity levels of a subject, producing massive amounts of high-resolution measurements. Analyzing actigraph data needs to account for spatial and temporal information on trajectories or paths traversed by subjects wearing such devices. Inferential objectives include estimating a subject's physical activity levels along a given trajectory; identifying trajectories that are more likely to produce higher levels of physical activity for a given subject; and predicting expected levels of physical activity in any proposed new trajectory for a given set of health attributes. Here, we devise a Bayesian hierarchical modeling framework for spatial-temporal actigraphy data to deliver fully model-based inference on trajectories while accounting for subject-level health attributes and spatial-temporal dependencies. We undertake a comprehensive analysis of an original dataset from the Physical Activity through Sustainable Transport Approaches in Los Angeles (PASTA-LA) study to ascertain spatial zones and trajectories exhibiting significantly higher levels of physical activity while accounting for various sources of heterogeneity.
Affiliation(s)
- Jonah Lipsitt
- Department of Environmental Health Sciences, University of California, Los Angeles
- Christina M. Batteate
- Center of Occupational and Environmental Health, University of California, Los Angeles
- Michael Jerrett
- Department of Environmental Health Sciences, University of California, Los Angeles
- Sudipto Banerjee
- Department of Biostatistics, University of California, Los Angeles
5
Heffernan C, Peng R, Gentner DR, Koehler K, Datta A. A dynamic spatial filtering approach to mitigate underestimation bias in field calibrated low-cost sensor air pollution data. Ann Appl Stat 2023; 17:3056-3087. [PMID: 38646662 PMCID: PMC11031266 DOI: 10.1214/23-aoas1751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Low-cost air pollution sensors, offering hyper-local characterization of pollutant concentrations, are becoming increasingly prevalent in environmental and public health research. However, low-cost air pollution data can be noisy, biased by environmental conditions, and usually need to be field-calibrated by collocating low-cost sensors with reference-grade instruments. We show, theoretically and empirically, that the common procedure of regression-based calibration using collocated data systematically underestimates high air pollution concentrations, which are critical to diagnose from a health perspective. Current calibration practices also often fail to utilize the spatial correlation in pollutant concentrations. We propose a novel spatial filtering approach to collocation-based calibration of low-cost networks that mitigates the underestimation issue by using an inverse regression. The inverse-regression also allows for incorporating spatial correlations by a second-stage model for the true pollutant concentrations using a conditional Gaussian Process. Our approach works with one or more collocated sites in the network and is dynamic, leveraging spatial correlation with the latest available reference data. Through extensive simulations, we demonstrate how the spatial filtering substantially improves estimation of pollutant concentrations, and measures peak concentrations with greater accuracy. We apply the methodology for calibration of a low-cost PM2.5 network in Baltimore, Maryland, and diagnose air pollution peaks that are missed by the regression-calibration.
Affiliation(s)
- Roger Peng
- Department of Statistics and Data Sciences, University of Texas, Austin
- Drew R. Gentner
- Department of Chemical & Environmental Engineering, Yale University
- Kirsten Koehler
- Department of Environmental Health and Engineering, Johns Hopkins University
- Abhirup Datta
- Department of Biostatistics, Johns Hopkins University
6
Egbon OA, Nascimento D, Louzada F. Prior elicitation for Gaussian spatial process: An application to TMS brain mapping. Stat Med 2023; 42:3956-3980. [PMID: 37665049 DOI: 10.1002/sim.9842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 06/09/2023] [Accepted: 06/20/2023] [Indexed: 09/05/2023]
Abstract
The power and commensurate prior distributions are informative prior distributions that incorporate historical data as prior knowledge in Bayesian analysis to improve inference about a phenomenon under study. Although these distributions have been developed for analyzing non-spatial data, little or no attention has been given to spatial geostatistical data. In this study, we extend these informative prior distributions to a Gaussian spatial process, which enables the elicitation of prior knowledge from historical geostatistical data for Bayesian analysis. Three informative prior distributions were developed for spatial modeling, and an efficient Markov Chain Monte Carlo algorithm was developed for performing Bayesian analysis. Simulation studies were used to assess the adequacy of the informative prior distributions. Hierarchical models combined with the developed informative prior distributions were applied to analyze transcranial magnetic stimulation (TMS) brain mapping data to gain insights into the spatial pattern of a patient's response to motor cortex stimulation. The study quantified the uncertainty in motor response and found that the primary motor cortex of the hand is responsible for most of the movement of the right first dorsal interosseous muscle. The findings provide a deeper understanding of the neural mechanisms underlying motor function and ultimately aid the improvement of treatment options for individuals with health issues.
Affiliation(s)
- Osafu Augustine Egbon
- Institute of Mathematical and Computer Sciences, Universidade de São Paulo, São Carlos, Brazil
- Department of Statistics, Universidade Federal de São Carlos, São Carlos, Brazil
- Institute of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
- Diego Nascimento
- Departamento de Matemáticas, Universidad de Atacama, Copiapó, Chile
- Francisco Louzada
- Institute of Mathematical and Computer Sciences, Universidade de São Paulo, São Carlos, Brazil
7
Dey D, Datta A, Banerjee S. Modeling Multivariate Spatial Dependencies Using Graphical Models. THE NEW ENGLAND JOURNAL OF STATISTICS IN DATA SCIENCE 2023; 1:283-295. [PMID: 37817840 PMCID: PMC10563032 DOI: 10.51387/23-nejsds47] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2023]
Abstract
Graphical models have witnessed significant growth and usage in spatial data science for modeling data referenced over a massive number of spatial-temporal coordinates. Much of this literature has focused on a single or relatively few spatially dependent outcomes. Recent attention has focused upon addressing modeling and inference for a substantially large number of outcomes. While spatial factor models and multivariate basis expansions occupy a prominent place in this domain, this article elucidates a recent approach, graphical Gaussian Processes, that exploits the notion of conditional independence among a very large number of spatial processes to build scalable graphical models for fully model-based Bayesian analysis of multivariate spatial data.
Affiliation(s)
- Debangan Dey
- Department of Biostatistics, Johns Hopkins University, USA
- Abhirup Datta
- Department of Biostatistics, Johns Hopkins University, USA
- Sudipto Banerjee
- Department of Biostatistics, University of California Los Angeles, USA
8
Diana A, Dennis EB, Matechou E, Morgan BJT. Fast Bayesian inference for large occupancy datasets. Biometrics 2023; 79:2503-2515. [PMID: 36579700 DOI: 10.1111/biom.13816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2021] [Accepted: 12/06/2022] [Indexed: 12/30/2022]
Abstract
In recent years, the study of species' occurrence has benefited from the increased availability of large-scale citizen-science data. While abundance data from standardized monitoring schemes are biased toward well-studied taxa and locations, opportunistic data are available for many taxonomic groups, from a large number of locations and across long timescales. Hence, these data provide opportunities to measure species' changes in occurrence, particularly through the use of occupancy models, which account for imperfect detection. These opportunistic datasets can be substantially large, numbering hundreds of thousands of sites, and hence present a challenge from a computational perspective, especially within a Bayesian framework. In this paper, we develop a unifying framework for Bayesian inference in occupancy models that account for both spatial and temporal autocorrelation. We make use of the Pólya-Gamma scheme, which allows for fast inference, and incorporate spatio-temporal random effects using Gaussian processes (GPs), for which we consider two efficient approximations: subset of regressors and nearest neighbor GPs. We apply our model to data on two UK butterfly species, one common and widespread and one rare, using records from the Butterflies for the New Millennium database, producing occupancy indices spanning 45 years. Our framework can be applied to a wide range of taxa, providing measures of variation in species' occurrence, which are used to assess biodiversity change.
Affiliation(s)
- Alex Diana
- School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, UK
- Emily Beth Dennis
- School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, UK
- Butterfly Conservation, Manor Yard, East Lulworth, Wareham, Dorset, UK
- Eleni Matechou
- School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, UK
9
Weber LM, Saha A, Datta A, Hansen KD, Hicks SC. nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes. Nat Commun 2023; 14:4059. [PMID: 37429865 DOI: 10.1038/s41467-023-39748-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 06/15/2022] [Accepted: 06/23/2023] [Indexed: 07/12/2023]
Abstract
Feature selection to identify spatially variable genes or other biologically informative genes is a key step during analyses of spatially-resolved transcriptomics data. Here, we propose nnSVG, a scalable approach to identify spatially variable genes based on nearest-neighbor Gaussian processes. Our method (i) identifies genes that vary in expression continuously across the entire tissue or within a priori defined spatial domains, (ii) uses gene-specific estimates of length scale parameters within the Gaussian process models, and (iii) scales linearly with the number of spatial locations. We demonstrate the performance of our method using experimental data from several technological platforms and simulations. A software implementation is available at https://bioconductor.org/packages/nnSVG.
Affiliation(s)
- Lukas M Weber
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Arkajyoti Saha
- Department of Statistics, University of Washington, Seattle, WA, USA
- Abhirup Datta
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Kasper D Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
10
Orozco-Acosta E, Adin A, Ugarte MD. Big problems in spatio-temporal disease mapping: Methods and software. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 231:107403. [PMID: 36773590 DOI: 10.1016/j.cmpb.2023.107403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 01/12/2023] [Accepted: 02/01/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND AND OBJECTIVE Fitting spatio-temporal models for areal data is crucial in many fields such as cancer epidemiology. However, when data sets are very large, many issues arise. The main objective of this paper is to propose a general procedure to analyze high-dimensional spatio-temporal areal data, with special emphasis on mortality/incidence relative risk estimation. METHODS We present a pragmatic and simple idea that permits hierarchical spatio-temporal models to be fitted when the number of small areas is very large. Model fitting is carried out using integrated nested Laplace approximations over a partition of the spatial domain. We also use parallel and distributed strategies to speed up computations in a setting where Bayesian model fitting is generally prohibitively time-consuming or even unfeasible. RESULTS Using simulated and real data, we show that our method outperforms classical global models. We implement the methods and algorithms that we develop in the open-source R package bigDM where specific vignettes have been included to facilitate the use of the methodology for non-expert users. CONCLUSIONS Our scalable methodology proposal provides reliable risk estimates when fitting Bayesian hierarchical spatio-temporal models for high-dimensional data.
Affiliation(s)
- Erick Orozco-Acosta
- Department of Statistics, Computer Science and Mathematics, Public University of Navarre, Campus de Arrosadia, 31006 Pamplona, Spain; Institute for Advanced Materials and Mathematics (InaMat2), Public University of Navarre, Campus de Arrosadia, 31006 Pamplona, Spain.
- Aritz Adin
- Department of Statistics, Computer Science and Mathematics, Public University of Navarre, Campus de Arrosadia, 31006 Pamplona, Spain; Institute for Advanced Materials and Mathematics (InaMat2), Public University of Navarre, Campus de Arrosadia, 31006 Pamplona, Spain.
- María Dolores Ugarte
- Department of Statistics, Computer Science and Mathematics, Public University of Navarre, Campus de Arrosadia, 31006 Pamplona, Spain; Institute for Advanced Materials and Mathematics (InaMat2), Public University of Navarre, Campus de Arrosadia, 31006 Pamplona, Spain.
11
Townes FW, Engelhardt BE. Nonnegative spatial factorization applied to spatial genomics. Nat Methods 2023; 20:229-238. [PMID: 36587187 PMCID: PMC9911348 DOI: 10.1038/s41592-022-01687-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 10/11/2021] [Accepted: 10/17/2022] [Indexed: 01/01/2023]
Abstract
Nonnegative matrix factorization (NMF) is widely used to analyze high-dimensional count data because, in contrast to real-valued alternatives such as factor analysis, it produces an interpretable parts-based representation. However, in applications such as spatial transcriptomics, NMF fails to incorporate known structure between observations. Here, we present nonnegative spatial factorization (NSF), a spatially-aware probabilistic dimension reduction model based on transformed Gaussian processes that naturally encourages sparsity and scales to tens of thousands of observations. NSF recovers ground truth factors more accurately than real-valued alternatives such as MEFISTO in simulations, and has lower out-of-sample prediction error than probabilistic NMF on three spatial transcriptomics datasets from mouse brain and liver. Since not all patterns of gene expression have spatial correlations, we also propose a hybrid extension of NSF that combines spatial and nonspatial components, enabling quantification of spatial importance for both observations and features. A TensorFlow implementation of NSF is available from https://github.com/willtownes/nsf-paper.
Affiliation(s)
- F. William Townes
- Present address: Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA (GRID grid.147455.6; ISNI 0000 0001 2097 0344)
- Barbara E. Engelhardt
- Present address: Data Science and Biotechnology Institute, Gladstone Institutes, San Francisco, CA, USA (GRID grid.249878.8; ISNI 0000 0004 0572 7110)
- Present address: Department of Biomedical Data Science, Stanford University, Stanford, CA, USA (GRID grid.168010.e; ISNI 0000 0004 1936 8956)
12
Whiteman AS, Bartsch AJ, Kang J, Johnson TD. Bayesian inference for brain activity from functional magnetic resonance imaging collected at two spatial resolutions. Ann Appl Stat 2022; 16:2626-2647. [DOI: 10.1214/22-aoas1606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Andrew S. Whiteman
- Department of Biostatistics, University of Michigan School of Public Health
- Andreas J. Bartsch
- Radiologie Bamberg and Department of Neuroradiology, University of Heidelberg
- Jian Kang
- Department of Biostatistics, University of Michigan School of Public Health
- Timothy D. Johnson
- Department of Biostatistics, University of Michigan School of Public Health
13
Weinstein SM, Vandekar SN, Baller EB, Tu D, Adebimpe A, Tapera TM, Gur RC, Gur RE, Detre JA, Raznahan A, Alexander-Bloch AF, Satterthwaite TD, Shinohara RT, Park JY. Spatially-enhanced clusterwise inference for testing and localizing intermodal correspondence. Neuroimage 2022; 264:119712. [PMID: 36309332 PMCID: PMC10062374 DOI: 10.1016/j.neuroimage.2022.119712] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/09/2022] [Revised: 10/16/2022] [Accepted: 10/25/2022] [Indexed: 11/05/2022]
Abstract
With the increasing availability of neuroimaging data from multiple modalities, each providing a different lens through which to study brain structure or function, new techniques for comparing, integrating, and interpreting information within and across modalities have emerged. Recent developments include hypothesis tests of associations between neuroimaging modalities, which can be used to determine the statistical significance of intermodal associations either throughout the entire brain or within anatomical subregions or functional networks. While these methods provide a crucial foundation for inference on intermodal relationships, they cannot be used to answer questions about where in the brain these associations are most pronounced. In this paper, we introduce a new method, called CLEAN-R, that can be used both to test intermodal correspondence throughout the brain and to localize this correspondence. Our method involves first adjusting for the underlying spatial autocorrelation structure within each modality before aggregating information within small clusters to construct a map of enhanced test statistics. Using structural and functional magnetic resonance imaging data from a subsample of children and adolescents from the Philadelphia Neurodevelopmental Cohort, we conduct simulations and data analyses where we illustrate the high statistical power and nominal type I error levels of our method. By constructing an interpretable map of group-level correspondence using spatially-enhanced test statistics, our method offers insights beyond those provided by earlier methods.
Affiliation(s)
- Sarah M Weinstein
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Simon N Vandekar
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Erica B Baller
- Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Danni Tu
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Azeez Adebimpe
- Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA; Strategy Innovation & Deployment Section, Johnson and Johnson, Raritan, NJ, 08869, USA
- Tinashe M Tapera
- Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Ruben C Gur
- Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Raquel E Gur
- Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- John A Detre
- Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Armin Raznahan
- Section on Developmental Neurogenomics, National Institute of Mental Health Intramural Research Program, Bethesda, MD 20892, USA
- Aaron F Alexander-Bloch
- Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA; Department of Child and Adolescent Psychiatry and Behavioral Science, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Theodore D Satterthwaite
- Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Russell T Shinohara
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Jun Young Park
- Department of Statistical Sciences and Department of Psychology, University of Toronto, Toronto, ON, M5G 1Z5, Canada.
|
14
|
Saha A, Datta A, Banerjee S. Scalable Predictions for Spatial Probit Linear Mixed Models Using Nearest Neighbor Gaussian Processes. Journal of Data Science: JDS 2022; 20:533-544. [PMID: 37786782 PMCID: PMC10544813 DOI: 10.6339/22-jds1073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2023]
Abstract
Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian Process prior, are widely used for analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov Chain Monte Carlo sampling. Alternative approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cumulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the cdf characterizing the marginal cdf of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This facilitates a scalable prediction algorithm for spGLMM using NNGP that only involves sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.
Affiliation(s)
- Arkajyoti Saha
- Department of Statistics, University of Washington, Seattle, WA, USA
| | - Abhirup Datta
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | - Sudipto Banerjee
- UCLA Department of Biostatistics, 650 Charles E. Young Drive South, University of California Los Angeles, CA 90095-1772, USA
|
15
|
Sauer A, Cooper A, Gramacy RB. Vecchia-approximated Deep Gaussian Processes for Computer Experiments. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2129662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2022]
|
16
|
Wang G, Datta A, Lindquist MA. Bayesian Functional Registration of fMRI Activation Maps. Ann Appl Stat 2022; 16:1676-1699. [PMID: 37396344 PMCID: PMC10312483 DOI: 10.1214/21-aoas1562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/03/2023]
Abstract
Functional magnetic resonance imaging (fMRI) has provided invaluable insight into our understanding of human behavior. However, large inter-individual differences in both brain anatomy and functional localization after anatomical alignment remain a major limitation in conducting group analyses and performing population-level inference. This paper addresses this problem by developing and validating a new computational technique for reducing misalignment across individuals in functional brain systems by spatially transforming each subject's functional data to a common reference map. Our proposed Bayesian functional registration approach allows us to assess differences in brain function across subjects and individual differences in activation topology. It combines intensity-based and feature-based information into an integrated framework, and allows inference to be performed on the transformation via the posterior samples. We evaluate the method in a simulation study and apply it to data from a study of thermal pain. We find that the proposed approach provides increased sensitivity for group-level inference.
Affiliation(s)
- Guoqing Wang
- Department of Biostatistics, Johns Hopkins University
| | - Abhirup Datta
- Department of Biostatistics, Johns Hopkins University
|
17
|
Park JY, Fiecas M. CLEAN: Leveraging spatial autocorrelation in neuroimaging data in clusterwise inference. Neuroimage 2022; 255:119192. [PMID: 35398279 DOI: 10.1016/j.neuroimage.2022.119192] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/07/2022] [Revised: 04/02/2022] [Accepted: 04/05/2022] [Indexed: 01/01/2023] Open
Abstract
While clusterwise inference is a popular approach in neuroimaging that improves sensitivity, current methods do not account for explicit spatial autocorrelations because most use univariate test statistics to construct cluster-extent statistics. Failure to account for such dependencies could result in decreased reproducibility. To address methodological and computational challenges, we propose a new, powerful, and fast statistical method called CLEAN (Clusterwise inference Leveraging spatial Autocorrelations in Neuroimaging). CLEAN computes multivariate test statistics by modelling brain-wise spatial autocorrelations, constructs cluster-extent test statistics, and applies a refitting-free resampling approach to control false positives. We validate CLEAN using simulations and applications to the Human Connectome Project. This novel method provides a new direction in neuroimaging that keeps pace with advances in high-resolution MRI data, which contain a substantial amount of spatial autocorrelation.
Affiliation(s)
- Jun Young Park
- Department of Statistical Sciences and Department of Psychology, University of Toronto, Toronto, ON M5S, Canada.
| | - Mark Fiecas
- Division of Biostatistics, University of Minnesota School of Public Health, Minneapolis, MN 55455, USA
|
18
|
Doser JW, Finley AO, Kéry M, Zipkin EF. spOccupancy: An R package for single-species, multi-species, and integrated spatial occupancy models. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Jeffrey W. Doser
- Department of Forestry, Michigan State University, East Lansing, MI, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, USA
| | - Andrew O. Finley
- Department of Forestry, Michigan State University, East Lansing, MI, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, USA
| | - Marc Kéry
- Swiss Ornithological Institute, Sempach, Switzerland
| | - Elise F. Zipkin
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, USA
- Department of Integrative Biology, Michigan State University, East Lansing, MI, USA
|
19
|
Improving performances of MCMC for Nearest Neighbor Gaussian Process models with full data augmentation. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2021.107368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
20
|
Peruzzi M, Dunson DB. Spatial Multivariate Trees for Big Data Bayesian Regression. Journal of Machine Learning Research: JMLR 2022; 23:17. [PMID: 35891979 PMCID: PMC9311452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
High resolution geospatial data are challenging because standard geostatistical models based on Gaussian processes are known to not scale to large data sizes. While progress has been made towards methods that can be computed more efficiently, considerably less attention has been devoted to methods for large scale data that allow the description of complex relationships between several outcomes recorded at high resolutions by different sensors. Our Bayesian multivariate regression models based on spatial multivariate trees (SpamTrees) achieve scalability via conditional independence assumptions on latent random effects following a treed directed acyclic graph. Information-theoretic arguments and considerations on computational efficiency guide the construction of the tree and the related efficient sampling algorithms in imbalanced multivariate settings. In addition to simulated data examples, we illustrate SpamTrees using a large climate data set which combines satellite data with land-based station data. Software and source code are available on CRAN at https://CRAN.R-project.org/package=spamtree.
Affiliation(s)
- Michele Peruzzi
- Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA
| | - David B Dunson
- Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA
|
21
|
Bailey MD, Bandyopadhyay S, Nychka DW. Adapting conditional simulation using circulant embedding for irregularly spaced spatial data. Stat (Int Stat Inst) 2021. [DOI: 10.1002/sta4.446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Maggie D. Bailey
- Department of Applied Mathematics and Statistics, Colorado School of Mines, Colorado, USA
| | - Soutir Bandyopadhyay
- Department of Applied Mathematics and Statistics, Colorado School of Mines, Colorado, USA
- CO USA
| | - Douglas W. Nychka
- Department of Applied Mathematics and Statistics, Colorado School of Mines, Colorado, USA
|
22
|
Liu J, Chu T, Zhu J, Wang H. Large spatial data modeling and analysis: A Krylov subspace approach. Scand Stat Theory Appl 2021. [DOI: 10.1111/sjos.12555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Jialuo Liu
- Department of Statistics, Colorado State University, Fort Collins, Colorado, USA
| | - Tingjin Chu
- School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria, Australia
| | - Jun Zhu
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, USA
| | - Haonan Wang
- Department of Statistics, Colorado State University, Fort Collins, Colorado, USA
|
23
|
Chen X, Tokdar ST. Joint quantile regression for spatial data. J R Stat Soc Series B Stat Methodol 2021. [DOI: 10.1111/rssb.12467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Affiliation(s)
- Xu Chen
- Department of Statistical Science, Duke University, Durham, North Carolina, USA
| | - Surya T. Tokdar
- Department of Statistical Science, Duke University, Durham, North Carolina, USA
|
24
|
Affiliation(s)
- Arkajyoti Saha
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD
| | - Sumanta Basu
- Department of Statistics and Data Science, Cornell University, Ithaca, NY
| | - Abhirup Datta
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD
|
25
|
Grenier I, Sansó B. Distributed nearest-neighbor Gaussian processes. Commun Stat-Simul C 2021. [DOI: 10.1080/03610918.2021.1921798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Isabelle Grenier
- Department of Statistics, University of California, Santa Cruz, California, USA
| | - Bruno Sansó
- Department of Statistics, University of California, Santa Cruz, California, USA
|
26
|
Ma P, Mondal A, Konomi BA, Hobbs J, Song JJ, Kang EL. Computer Model Emulation with High-Dimensional Functional Output in Large-Scale Observing System Uncertainty Experiments. Technometrics 2021. [DOI: 10.1080/00401706.2021.1895890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Pulong Ma
- Statistical and Applied Mathematical Sciences Institute, Durham, NC
- Department of Statistical Sciences, Duke University, Durham, NC
| | - Anirban Mondal
- Department of Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University, Cleveland, OH
| | - Bledar A. Konomi
- Division of Statistics and Data Science, Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH
| | - Jonathan Hobbs
- Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA
| | - Joon Jin Song
- Department of Statistical Science, Baylor University, Waco, TX
| | - Emily L. Kang
- Division of Statistics and Data Science, Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH
|
27
|
Gerber F, Nychka DW. Parallel cross-validation: A scalable fitting method for Gaussian process models. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2020.107113] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
|
28
|
Katzfuss M, Guinness J. A General Framework for Vecchia Approximations of Gaussian Processes. Stat Sci 2021. [DOI: 10.1214/19-sts755] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Indexed: 11/19/2022]
|
29
|
Peruzzi M, Banerjee S, Finley AO. Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains. J Am Stat Assoc 2020; 117:969-982. [DOI: 10.1080/01621459.2020.1833889] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/23/2022]
Affiliation(s)
- Michele Peruzzi
- Department of Forestry, Michigan State University, East Lansing, MI
- Department of Statistical Science, Duke University, Durham, NC
| | - Sudipto Banerjee
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, CA
| | - Andrew O. Finley
- Department of Forestry, Michigan State University, East Lansing, MI
|
30
|
Katzfuss M, Guinness J, Gong W, Zilber D. Vecchia Approximations of Gaussian-Process Predictions. Journal of Agricultural, Biological and Environmental Statistics 2020. [DOI: 10.1007/s13253-020-00401-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
|
31
|
Evaluating Methodology for the Service Extent of Refugee Parks in Changchun, China. Sustainability 2020. [DOI: 10.3390/su12145715] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
Refugee parks are general parks that can serve as emergency shelters in cities. The core issue of refugee parks lies in the service extent they provide. Globally, the service extent of refugee parks is determined by the Euclidean or actual road network distance methods. The former lacks measurement accuracy, whereas the latter lacks consideration of the human dimension and proximity. Hence, we propose the nearest neighbor method, which considers not only the locations of refugee parks and sub-districts, but also road networks and census data. Using this method, we evaluated the service extent of refugee parks in Changchun, northern China. We compared our results with the Euclidean distance method. Results showed that the nearest neighbor method effectively accounted for the effect of road network resistance, and the results aligned with the refuge needs of residents. Differences between the two methods were mainly affected by the size of the parks and by local road network and population densities. The Euclidean approach determines the service extent based on a unified service radius, thus producing greater errors. The nearest neighbor method can reveal the spatial imbalance of refugee parks, as well as the mismatch between the park size and population distribution. Furthermore, the nearest neighbor method implements policies of spatial optimization of urban refugee parks. As a general method, it should be suited to different types of disasters.
|
32
|
Risser MD, Turek D. Bayesian inference for high-dimensional nonstationary Gaussian processes. J Stat Comput Sim 2020. [DOI: 10.1080/00949655.2020.1792472] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Affiliation(s)
- Mark D. Risser
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
|
33
|
Banerjee S. Modeling Massive Spatial Datasets Using a Conjugate Bayesian Linear Modeling Framework. Spatial Statistics 2020; 37:100417. [PMID: 35265456 PMCID: PMC8903183 DOI: 10.1016/j.spasta.2020.100417] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
Geographic Information Systems (GIS) and related technologies have generated substantial interest among statisticians with regard to scalable methodologies for analyzing large spatial datasets. A variety of scalable spatial process models have been proposed that can be easily embedded within a hierarchical modeling framework to carry out Bayesian inference. While the focus of statistical research has mostly been directed toward innovative and more complex model development, relatively limited attention has been accorded to approaches for easily implementable scalable hierarchical models for the practicing scientist or spatial analyst. This article discusses how point-referenced spatial process models can be cast as a conjugate Bayesian linear regression that can rapidly deliver inference on spatial processes. The approach allows exact sampling directly (avoids iterative algorithms such as Markov chain Monte Carlo) from the joint posterior distribution of regression parameters, the latent process and the predictive random variables, and can be easily implemented on statistical programming environments such as R.
Affiliation(s)
- Sudipto Banerjee
- Sudipto Banerjee is Professor and Chair of the Department of Biostatistics in the University of California, Los Angeles, USA
|
34
|
Tikhonov G, Duan L, Abrego N, Newell G, White M, Dunson D, Ovaskainen O. Computationally efficient joint species distribution modeling of big spatial data. Ecology 2020; 101:e02929. [PMID: 31725922 PMCID: PMC7027487 DOI: 10.1002/ecy.2929] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/21/2018] [Revised: 07/24/2019] [Accepted: 08/23/2019] [Indexed: 11/19/2022]
Abstract
The ongoing global change and the increased interest in macroecological processes call for the analysis of spatially extensive data on species communities to understand and forecast distributional changes of biodiversity. Recently developed joint species distribution models can deal with numerous species efficiently, while explicitly accounting for spatial structure in the data. However, their applicability is generally limited to relatively small spatial data sets because of their severe computational scaling as the number of spatial locations increases. In this work, we propose a practical alleviation of this scalability constraint for joint species modeling by exploiting two spatial-statistics techniques that facilitate the analysis of large spatial data sets: Gaussian predictive process and nearest-neighbor Gaussian process. We devised an efficient Gibbs posterior sampling algorithm for Bayesian model fitting that allows us to analyze community data sets consisting of hundreds of species sampled from up to hundreds of thousands of spatial units. The performance of these methods is demonstrated using an extensive plant data set of 30,955 spatial units as a case study. We provide an implementation of the presented methods as an extension to the hierarchical modeling of species communities framework.
Affiliation(s)
- Gleb Tikhonov
- Organismal and Evolutionary Biology Research Programme, University of Helsinki, P.O. Box 65, FI-00014 Helsinki, Finland
- Computational Systems Biology Group, Department of Computer Science, Aalto University, P.O. Box 11000, FI-00076 Espoo, Finland
| | - Li Duan
- Department of Statistics, University of Florida, P.O. Box 118545, Gainesville, Florida 32611, USA
| | - Nerea Abrego
- Faculty of Biological and Environmental Sciences, University of Helsinki, P.O. Box 65, FI-00014 Helsinki, Finland
| | - Graeme Newell
- Biodiversity Division, Department of Environment, Land, Water & Planning, Arthur Rylah Institute for Environmental Research, 123 Brown Street, Heidelberg, Victoria 3084, Australia
| | - Matt White
- Biodiversity Division, Department of Environment, Land, Water & Planning, Arthur Rylah Institute for Environmental Research, 123 Brown Street, Heidelberg, Victoria 3084, Australia
| | - David Dunson
- Department of Statistical Science, Duke University, P.O. Box 90251, Durham, North Carolina, USA
| | - Otso Ovaskainen
- Organismal and Evolutionary Biology Research Programme, University of Helsinki, P.O. Box 65, FI-00014 Helsinki, Finland
- Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, N-7491 Trondheim, Norway
|
35
|
Taylor-Rodriguez D, Finley AO, Datta A, Babcock C, Andersen HE, Cook BD, Morton DC, Banerjee S. Spatial Factor Models for High-Dimensional and Large Spatial Data: An Application in Forest Variable Mapping. Stat Sin 2019; 29:1155-1180. [PMID: 33311955 PMCID: PMC7731981 DOI: 10.5705/ss.202018.0005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
Gathering information about forest variables is an expensive and arduous activity. As such, directly collecting the data required to produce high-resolution maps over large spatial domains is infeasible. Next generation collection initiatives of remotely sensed Light Detection and Ranging (LiDAR) data are specifically aimed at producing complete-coverage maps over large spatial domains. Given that LiDAR data and forest characteristics are often strongly correlated, it is possible to make use of the former to model, predict, and map forest variables over regions of interest. This entails dealing with the high-dimensional (~10^2) spatially dependent LiDAR outcomes over a large number of locations (~10^5-10^6). With this in mind, we develop the Spatial Factor Nearest Neighbor Gaussian Process (SF-NNGP) model, and embed it in a two-stage approach that connects the spatial structure found in LiDAR signals with forest variables. We provide a simulation experiment that demonstrates inferential and predictive performance of the SF-NNGP, and use the two-stage modeling strategy to generate complete-coverage maps of forest variables with associated uncertainty over a large region of boreal forests in interior Alaska.
Affiliation(s)
| | - Andrew O. Finley
- Department of Forestry, Michigan State University, East Lansing, MI
| | - Abhirup Datta
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD
| | - Chad Babcock
- School of Environmental and Forest Sciences, University of Washington, Seattle, WA
| | - Bruce D. Cook
- Biospheric Sciences Laboratory, NASA Goddard Space Flight Center, Greenbelt, MD
| | - Douglas C. Morton
- Biospheric Sciences Laboratory, NASA Goddard Space Flight Center, Greenbelt, MD
| | - Sudipto Banerjee
- Department of Biostatistics, University of California Los Angeles, Los Angeles, CA
|
36
|
Heaton MJ, Datta A, Finley AO, Furrer R, Guinness J, Guhaniyogi R, Gerber F, Gramacy RB, Hammerling D, Katzfuss M, Lindgren F, Nychka DW, Sun F, Zammit-Mangion A. A Case Study Competition Among Methods for Analyzing Large Spatial Data. Journal of Agricultural, Biological, and Environmental Statistics 2018; 24:398-425. [PMID: 31496633 PMCID: PMC6709111 DOI: 10.1007/s13253-018-00348-w] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Received: 11/30/2018] [Accepted: 12/05/2018] [Indexed: 10/27/2022]
Abstract
The Gaussian process is an indispensable tool for spatial data analysts. The onset of the "big data" era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, and each was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online. ELECTRONIC SUPPLEMENTARY MATERIAL Supplementary materials for this article are available at 10.1007/s13253-018-00348-w.
Affiliation(s)
- Furong Sun
- Brigham Young University, Provo, UT USA
|