1
|
Watson GL, Reid CE, Jerrett M, Telesca D. Prediction and model evaluation for space-time data. J Appl Stat 2023; 51:2007-2024. [PMID: 39071250 PMCID: PMC11271132 DOI: 10.1080/02664763.2023.2252208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 08/21/2023] [Indexed: 07/30/2024]
Abstract
Evaluation metrics for prediction error, model selection and model averaging on space-time data are understudied and poorly understood. The absence of independent replication makes prediction ambiguous as a concept and renders evaluation procedures developed for independent data inappropriate for most space-time prediction problems. Motivated by air pollution data collected during California wildfires in 2008, this manuscript attempts a formalization of the true prediction error associated with spatial interpolation. We investigate a variety of cross-validation (CV) procedures employing both simulations and case studies to provide insight into the nature of the estimand targeted by alternative data partition strategies. Consistent with recent best practice, we find that location-based cross-validation is appropriate for estimating spatial interpolation error as in our analysis of the California wildfire data. Interestingly, commonly held notions of bias-variance trade-off of CV fold size do not trivially apply to dependent data, and we recommend leave-one-location-out (LOLO) CV as the preferred prediction error metric for spatial interpolation.
Affiliation(s)
- G. L. Watson
- Department of Biostatistics, University of California, Los Angeles, CA, USA
- C. E. Reid
- Department of Geography, University of Colorado, Boulder, CO, USA
- M. Jerrett
- Department of Environmental Health Sciences, University of California, Los Angeles, CA, USA
- D. Telesca
- Department of Biostatistics, University of California, Los Angeles, CA, USA
|
2
|
Sharrock L, Kantas N. Two-timescale stochastic gradient descent in continuous time with applications to joint online parameter estimation and optimal sensor placement. Bernoulli 2023. [DOI: 10.3150/22-bej1493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Affiliation(s)
- Louis Sharrock
- Department of Mathematics, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom
- Nikolas Kantas
- Department of Mathematics, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom
|
3
|
Liu X, Yeo K. Inverse Models for Estimating the Initial Condition of Spatio-Temporal Advection-Diffusion Processes. Technometrics 2023. [DOI: 10.1080/00401706.2023.2181222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Affiliation(s)
- Xiao Liu
- Department of Industrial Engineering, University of Arkansas
|
4
|
Terdik G. Spatiotemporal covariance functions for Laplacian ARMA fields in higher dimensions. Theory of Probability and Mathematical Statistics 2022. [DOI: 10.1090/tpms/1173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
This paper presents clear formulae of the covariance functions of Laplacian ARMA fields in terms of coefficients and Bessel functions in higher spatial dimensions. Spectral methods are used for the study of spatiotemporal Laplacian ARMA fields in Euclidean spaces and spheres therein with dimension $d \geq 2$.
|
5
|
Liu X, Yeo K, Lu S. Statistical Modeling for Spatio-Temporal Data From Stochastic Convection-Diffusion Processes. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2020.1863223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Xiao Liu
- Department of Industrial Engineering, University of Arkansas, Fayetteville, AR
- Kyongmin Yeo
- IBM T. J. Watson Research Center, Yorktown Heights, NY
- Siyuan Lu
- IBM T. J. Watson Research Center, Yorktown Heights, NY
|
6
|
Carrizo Vergara R, Allard D, Desassis N. A general framework for SPDE-based stationary random fields. Bernoulli 2022. [DOI: 10.3150/20-bej1317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Denis Allard
- Biostatistics and Spatial Processes, BioSP, INRAE, 84914, Avignon, France
- Nicolas Desassis
- MINES ParisTech, PSL University, Geosciences, Geostatistics team, 35 rue St Honoré, 77300 Fontainebleau, France
|
7
|
Wikle NB, Hanks EM, Henneman LRF, Zigler CM. A Mechanistic Model of Annual Sulfate Concentrations in the United States. J Am Stat Assoc 2022; 117:1082-1093. [PMID: 36246415 PMCID: PMC9563091 DOI: 10.1080/01621459.2022.2027774] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023]
Abstract
Understanding how individual pollution sources contribute to ambient sulfate pollution is critical for assessing past and future air quality regulations. Since attribution to specific sources is typically not encoded in spatial air pollution data, we develop a mechanistic model which we use to estimate, with uncertainty, the contribution of ambient sulfate concentrations attributable specifically to sulfur dioxide (SO₂) emissions from individual coal-fired power plants in the central United States. We propose a multivariate Ornstein-Uhlenbeck (OU) process approximation to the dynamics of the underlying space-time chemical transport process, and its distributional properties are leveraged to specify novel probability models for spatial data that are viewed as either a snapshot or time-averaged observation of the OU process. Using US EPA SO₂ emissions data from 193 power plants and state-of-the-art estimates of ground-level annual mean sulfate concentrations, we estimate that in 2011 - a time of active power plant regulatory action - existing flue-gas desulfurization (FGD) technologies at 66 power plants reduced population-weighted exposure to ambient sulfate by 1.97 μg/m³ (95% CI: 1.80 - 2.15). Furthermore, we anticipate future regulatory benefits by estimating that installing FGD technologies at the five largest SO₂-emitting facilities would reduce human exposure to ambient sulfate by an additional 0.45 μg/m³ (95% CI: 0.33 - 0.54).
Affiliation(s)
- Nathan B Wikle
- Department of Statistics and Data Sciences, University of Texas at Austin
- Lucas R F Henneman
- Department of Civil, Environmental, and Infrastructure Engineering, George Mason University
- Corwin M Zigler
- Department of Statistics and Data Sciences, University of Texas at Austin
|
8
|
Arnone E, Sangalli LM, Vicini A. Smoothing spatio-temporal data with complex missing data patterns. Stat Model 2021. [DOI: 10.1177/1471082x211057959] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
We consider spatio-temporal data and functional data with spatial dependence, characterized by complicated missing data patterns. We propose a new method capable of efficiently handling these data structures, including the case where data are missing over large portions of the spatio-temporal domain. The method is based on regression with partial differential equation regularization. The proposed model can accurately deal with data scattered over domains with irregular shapes and can accurately estimate fields exhibiting complicated local features. We demonstrate the consistency and asymptotic normality of the estimators. Moreover, we illustrate the good performance of the method in simulation studies, considering different missing data scenarios, from sparse data to more challenging scenarios where the data are missing over large portions of the spatial and temporal domains and the missing data are clustered in space and/or in time. The proposed method is compared to competing techniques, considering predictive accuracy and uncertainty quantification measures. Finally, we show an application to the analysis of lake surface water temperature data, which further illustrates the ability of the method to handle data featuring complicated patterns of missingness and highlights its potential for environmental studies.
Affiliation(s)
- Eleonora Arnone
- MOX - Dipartimento di Matematica, Politecnico di Milano, Milano, Italy
- Laura M. Sangalli
- MOX - Dipartimento di Matematica, Politecnico di Milano, Milano, Italy
- Andrea Vicini
- MOX - Dipartimento di Matematica, Politecnico di Milano, Milano, Italy
|
9
|
Iranzad R, Liu X, Chaovalitwongse WA, Hippe D, Wang S, Han J, Thammasorn P, Duan C, Zeng J, Bowen S. Gradient Boosted Trees for Spatial Data and Its Application to Medical Imaging Data. IISE Transactions on Healthcare Systems Engineering 2021; 12:165-179. [PMID: 36311209 PMCID: PMC9615557 DOI: 10.1080/24725579.2021.1995536] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/03/2023]
Abstract
Boosting Trees are one of the most successful statistical learning approaches that involve sequentially growing an ensemble of simple regression trees ("weak learners"). This paper proposes a gradient Boosted Trees algorithm for Spatial Data (Boost-S) with covariate information. Boost-S integrates the spatial correlation into the classical framework of eXtreme Gradient Boosting. Each tree is constructed by solving a regularized optimization problem, where the objective function takes into account the underlying spatial correlation and involves two penalty terms on tree complexity. A computationally-efficient greedy heuristic algorithm is proposed to obtain an ensemble of trees. The proposed Boost-S is applied to the spatially-correlated FDG-PET (fluorodeoxyglucose-positron emission tomography) imaging data collected from clinical trials of cancer chemoradiotherapy. Our numerical investigations successfully demonstrate the advantages of the proposed Boost-S over existing approaches for this particular application.
Affiliation(s)
- Reza Iranzad
- Department of Industrial Engineering, University of Arkansas
- Xiao Liu
- Department of Industrial Engineering, University of Arkansas
- Daniel Hippe
- Department of Radiology, University of Washington
- Shouyi Wang
- Department of Industrial, Manufacturing & Systems Engineering, University of Texas at Arlington
- Jie Han
- Department of Industrial, Manufacturing & Systems Engineering, University of Texas at Arlington
- Chunyan Duan
- Department of Mechanical Engineering, Tongji University
- Jing Zeng
- Department of Radiation Oncology, University of Washington
- Stephen Bowen
- Department of Radiology, University of Washington
- Department of Radiation Oncology, University of Washington
|
10
|
Gehman AJ, Wei WWS. Testing for poolability of the space-time autoregressive moving-average model. Commun Stat-Theor M 2021. [DOI: 10.1080/03610926.2020.1725052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Andrew J. Gehman
- Department of Statistical Science, Temple University, Philadelphia, PA, USA
- William W. S. Wei
- Department of Statistical Science, Temple University, Philadelphia, PA, USA
|
11
|
Fossum TO, Travelletti C, Eidsvik J, Ginsbourger D, Rajan K. Learning excursion sets of vector-valued Gaussian random fields for autonomous ocean sampling. Ann Appl Stat 2021. [DOI: 10.1214/21-aoas1451] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
Affiliation(s)
- Trygve Olav Fossum
- Department of Marine Technology, Norwegian University of Science and Technology (NTNU)
- Cédric Travelletti
- Institute of Mathematical Statistics and Actuarial Science, University of Bern
- Jo Eidsvik
- Department of Mathematical Sciences, NTNU
- David Ginsbourger
- Institute of Mathematical Statistics and Actuarial Science, University of Bern
- Kanna Rajan
- Underwater Systems and Technology Laboratory, Faculty of Engineering, University of Porto
|
12
|
Altmeyer R, Reiß M. Nonparametric estimation for linear SPDEs from local measurements. Ann Appl Probab 2021. [DOI: 10.1214/20-aap1581] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
Affiliation(s)
- Markus Reiß
- Institut für Mathematik, Humboldt-Universität zu Berlin
|
13
|
Paul R, Arif AA, Adeyemi O, Ghosh S, Han D. Progression of COVID-19 From Urban to Rural Areas in the United States: A Spatiotemporal Analysis of Prevalence Rates. J Rural Health 2020; 36:591-601. [PMID: 32602983 PMCID: PMC7361905 DOI: 10.1111/jrh.12486] [Citation(s) in RCA: 108] [Impact Index Per Article: 27.0] [Indexed: 02/07/2023]
Abstract
Purpose: There are growing signs that the COVID‐19 virus has started to spread to rural areas and can impact the rural health care system that is already stretched and lacks resources. To aid in the legislative decision process and proper channeling of resources, we estimated and compared the county‐level change in prevalence rates of COVID‐19 by rural‐urban status over 3 weeks. Additionally, we identified hotspots based on estimated prevalence rates. Methods: We used crowdsourced data on COVID‐19 and linked them to county‐level demographics, smoking rates, and chronic diseases. We fitted a Bayesian hierarchical spatiotemporal model using a Markov chain Monte Carlo algorithm in RStudio. We mapped the estimated prevalence rates using ArcGIS 10.8, and identified hotspots using Getis‐Ord local statistics. Findings: In the rural counties, the mean prevalence of COVID‐19 increased from 3.6 per 100,000 population to 43.6 per 100,000 within 3 weeks from April 3 to April 22, 2020. In the urban counties, the median prevalence of COVID‐19 increased from 10.1 per 100,000 population to 107.6 per 100,000 within the same period. The COVID‐19 adjusted prevalence rates in rural counties were substantially elevated in counties with higher black populations, smoking rates, and obesity rates. Counties with high rates of people aged 25‐49 years had increased COVID‐19 prevalence rates. Conclusions: Our findings show a rapid spread of COVID‐19 across urban and rural areas in 21 days. Studies based on quality data are needed to explain further the role of social determinants of health on COVID‐19 prevalence.
Affiliation(s)
- Rajib Paul
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina
- Ahmed A Arif
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina
- Oluwaseun Adeyemi
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina
- Subhanwita Ghosh
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina
- Dan Han
- Department of Mathematics, University of Louisville, Louisville, Kentucky
|
14
|
|
15
|
|
16
|
Liu X, Yeo K, Hwang Y, Singh J, Kalagnanam J. A statistical modeling approach for air quality data based on physical dispersion processes and its application to ozone modeling. Ann Appl Stat 2016. [DOI: 10.1214/15-aoas901] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/19/2022]
|
17
|
Hu X, Steinsland I. Spatial modeling with system of stochastic partial differential equations. WIREs Comput Stat 2016. [DOI: 10.1002/wics.1378] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/11/2022]
Affiliation(s)
- Xiangping Hu
- Department of Mathematics, University of Oslo, Oslo, Norway
- Ingelin Steinsland
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
|