1. Bertozzi-Villa A, Bever CA, Gerardin J, Proctor JL, Wu M, Harding D, Hollingsworth TD, Bhatt S, Gething PW. An archetypes approach to malaria intervention impact mapping: a new framework and example application. Malar J 2023; 22:138. [PMID: 37101269 PMCID: PMC10131392 DOI: 10.1186/s12936-023-04535-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 03/15/2023] [Indexed: 04/28/2023] Open
Abstract
BACKGROUND: As both mechanistic and geospatial malaria modeling methods become more integrated into malaria policy decisions, there is increasing demand for strategies that combine these two methods. This paper introduces a novel archetypes-based methodology for generating high-resolution intervention impact maps based on mechanistic model simulations. An example configuration of the framework is described and explored.
METHODS: First, dimensionality reduction and clustering techniques were applied to rasterized geospatial environmental and mosquito covariates to find archetypal malaria transmission patterns. Next, mechanistic models were run on a representative site from each archetype to assess intervention impact. Finally, these mechanistic results were reprojected onto each pixel to generate full maps of intervention impact. The example configuration used ERA5 and Malaria Atlas Project covariates, singular value decomposition, k-means clustering, and the Institute for Disease Modeling's EMOD model to explore a range of three-year malaria interventions primarily focused on vector control and case management.
RESULTS: Rainfall, temperature, and mosquito abundance layers were clustered into ten transmission archetypes with distinct properties. Example intervention impact curves and maps highlighted archetype-specific variation in efficacy of vector control interventions. A sensitivity analysis showed that the procedure for selecting representative sites to simulate worked well in all but one archetype.
CONCLUSION: This paper introduces a novel methodology which combines the richness of spatiotemporal mapping with the rigor of mechanistic modeling to create a multi-purpose infrastructure for answering a broad range of important questions in the malaria policy space. It is flexible and adaptable to a range of input covariates, mechanistic models, and mapping strategies and can be adapted to the modelers' setting of choice.
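The three steps described in the abstract above (cluster pixels into archetypes, simulate one representative site per archetype, reproject results onto every pixel) can be sketched in miniature as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the singular value decomposition step is omitted, the six two-covariate "pixels" are invented, the k-means is a plain textbook version with a deterministic farthest-point start, and `toy_simulated_impact` is a hypothetical stand-in for a mechanistic model such as EMOD.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two covariate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, n_iter=50):
    """Plain k-means; returns one archetype label per pixel and the centroids."""
    centroids = [list(c) for c in farthest_point_init(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representative_sites(points, labels, centroids):
    """For each archetype, pick the index of the pixel closest to its centroid."""
    reps = {}
    for j, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            reps[j] = min(members, key=lambda i: squared_distance(points[i], c))
    return reps


def toy_simulated_impact(site_covariates):
    """Hypothetical placeholder for a mechanistic simulation of intervention impact."""
    return 1.0 / (1.0 + site_covariates[0])


# Invented covariates for six pixels (two covariates each).
pixels = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

labels, centroids = kmeans(pixels, k=2)
reps = representative_sites(pixels, labels, centroids)

# Simulate once per archetype, then reproject that result onto every pixel.
impact_by_archetype = {j: toy_simulated_impact(pixels[i]) for j, i in reps.items()}
impact_map = [impact_by_archetype[lab] for lab in labels]
print(labels, impact_map)
```

The point of the design is cost: the expensive simulation runs once per archetype rather than once per pixel, and the map is filled in by lookup.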
Affiliation(s)
- Amelia Bertozzi-Villa
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
  - Malaria Atlas Project, Telethon Kids Institute, Perth, Australia
  - Big Data Institute, Nuffield Department of Medicine, Oxford University, Oxford, UK
- Caitlin A Bever
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
- Jaline Gerardin
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
  - Department of Preventive Medicine and Institute for Global Health, Northwestern University, Chicago, USA
- Joshua L Proctor
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
- Meikang Wu
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
- Dennis Harding
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
- Samir Bhatt
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College, London, UK
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Peter W Gething
  - Malaria Atlas Project, Telethon Kids Institute, Perth, Australia
  - Curtin University, Perth, Australia
2. Mishra S, Flaxman S, Berah T, Zhu H, Pakkanen M, Bhatt S. πVAE: a stochastic process prior for Bayesian deep learning with MCMC. Statistics and Computing 2022; 32:96. [PMID: 36276409 PMCID: PMC9576140 DOI: 10.1007/s11222-022-10151-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Accepted: 09/09/2022] [Indexed: 06/16/2023]
Abstract
Stochastic processes provide a mathematically elegant way to model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. However, in practice efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated with big data and high dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior encoding variational autoencoder (πVAE). πVAE is a new continuous stochastic process. We use πVAE to learn low dimensional embeddings of function classes by combining a trainable feature mapping with a generative model using a VAE. We show that our framework can accurately learn expressive function classes such as Gaussian processes, but also properties of functions such as their integrals. For popular tasks, such as spatial interpolation, πVAE achieves state-of-the-art performance both in terms of accuracy and computational efficiency. Perhaps most usefully, we demonstrate an elegant and scalable means of performing fully Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.
Affiliation(s)
- Swapnil Mishra
  - MRC Centre for Global Infectious Disease Analysis, Jameel Institute for Disease and Emergency Analytics, Imperial College London, School of Public Health, London, UK
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Seth Flaxman
  - Department of Computer Science, University of Oxford, Oxford, UK
- Tresnia Berah
  - Department of Mathematics, Imperial College London, London, UK
- Harrison Zhu
  - Department of Mathematics, Imperial College London, London, UK
- Mikko Pakkanen
  - Department of Mathematics, Imperial College London, London, UK
- Samir Bhatt
  - MRC Centre for Global Infectious Disease Analysis, Jameel Institute for Disease and Emergency Analytics, Imperial College London, School of Public Health, London, UK
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
3. Transferring model structure in Bayesian transfer learning for Gaussian process regression. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108875] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
4. Wang T, Xu L, Li J. SDCRKL-GP: Scalable deep convolutional random kernel learning in Gaussian process for image recognition. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.092] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/25/2022]
5. Lucas TCD, Nandi AK, Chestnutt EG, Twohig KA, Keddie SH, Collins EL, Howes RE, Nguyen M, Rumisha SF, Python A, Arambepola R, Bertozzi‐Villa A, Hancock P, Amratia P, Battle KE, Cameron E, Gething PW, Weiss DJ. Mapping malaria by sharing spatial information between incidence and prevalence data sets. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Andre Python
  - Big Data Institute, University of Oxford, Oxford, UK
- Amelia Bertozzi‐Villa
  - Big Data Institute, University of Oxford, Oxford, UK
  - Institute for Disease Modeling, Bellevue, Washington, USA
- Ewan Cameron
  - Big Data Institute, University of Oxford, Oxford, UK
- Peter W. Gething
  - Big Data Institute, University of Oxford, Oxford, UK
  - Telethon Kids Institute, Perth Children's Hospital, Perth, Australia
  - Curtin University, Perth, Australia
6. Qu H, Zhou J, Qin J, Tian X. Anomaly Detection for Industrial Control Networks Based on Improved One-Class Support Vector Machine. Int J Pattern Recogn 2020. [DOI: 10.1142/s0218001421500129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
In traditional network anomaly detection algorithms, the anomaly threshold needs to be defined manually. Against this background, this study proposes an anomaly detection algorithm (VAEOCSVM), which combines the variational auto-encoder (VAE) and one-class support vector machine (OCSVM) to realize anomaly detection in industrial control networks. First, the VAE model is used to obtain the distribution of the original normal sample data represented by the low-dimensional code; the reconstruction error of the VAE model is merged into the new input. Then, using OCSVM's hinge-loss objective function and a random Fourier feature approximation of the radial basis function (RBF) kernel, the OCSVM model is represented and solved using a deep neural network and gradient descent. Finally, the decision function of the OCSVM model is constructed from the solved parameters to detect abnormal data. The proposed algorithm is compared with other machine-learning-based anomaly detection algorithms in terms of multiple indicators such as precision, recall, and [Formula: see text] score. The experimental results using various datasets show that the proposed algorithm has a better outlier recognition ability than the machine-learning-based anomaly detection algorithms.
Affiliation(s)
- Haicheng Qu
  - Institute of Software, Liaoning Technical University, Huludao 125105, P. R. China
- Jianzhong Zhou
  - Institute of Software, Liaoning Technical University, Huludao 125105, P. R. China
- Jitao Qin
  - Institute of Software, Liaoning Technical University, Huludao 125105, P. R. China
- Xiaorong Tian
  - Institute of Software, Liaoning Technical University, Huludao 125105, P. R. China
7. Finding hotspots: development of an adaptive spatial sampling approach. Sci Rep 2020; 10:10939. [PMID: 32616757 PMCID: PMC7331748 DOI: 10.1038/s41598-020-67666-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/18/2020] [Accepted: 06/08/2020] [Indexed: 01/09/2023] Open
Abstract
The identification of disease hotspots is an increasingly important public health problem. While geospatial modeling offers an opportunity to predict the locations of hotspots using suitable environmental and climatological data, little attention has been paid to optimizing the design of surveys used to inform such models. Here we introduce an adaptive sampling scheme optimized to identify hotspot locations where prevalence exceeds a relevant threshold. Our approach incorporates ideas from Bayesian optimization theory to adaptively select sample batches. We present an experimental simulation study based on survey data of schistosomiasis and lymphatic filariasis across four countries. Results across all scenarios explored show that adaptive sampling produces superior results and suggest that similar performance to random sampling can be achieved with a fraction of the sample size.
8. Milton P, Coupland H, Giorgi E, Bhatt S. Spatial analysis made easy with linear regression and kernels. Epidemics 2019; 29:100362. [PMID: 31561884 DOI: 10.1016/j.epidem.2019.100362] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/22/2019] [Revised: 08/05/2019] [Accepted: 08/19/2019] [Indexed: 11/29/2022] Open
Abstract
Kernel methods are a popular technique for extending linear models to handle non-linear spatial problems via a mapping to an implicit, high-dimensional feature space. While kernel methods are computationally cheaper than an explicit feature mapping, they are still subject to a cost that is cubic in the number of points. Given only a few thousand locations, this computational cost rapidly outstrips the currently available computational power. This paper aims to provide an overview of kernel methods from first principles (with a focus on ridge regression) and progress to a review of random Fourier features (RFF), a method that enables the scaling of kernel methods to big datasets. We show how the RFF method is capable of approximating the full kernel matrix, providing a significant computational speed-up for a negligible cost to accuracy, and can be incorporated into many existing spatial methods using only a few lines of code. We give an example of the implementation of RFFs on a simulated spatial data set to illustrate these properties. Lastly, we summarise the main issues with RFFs and highlight some of the advanced techniques aimed at alleviating them. At each stage, the associated R code is provided.
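As a rough illustration of the RFF idea summarised in the abstract above, the following sketch approximates a radial basis function kernel with random cosine features. It is not taken from the paper, which supplies its own R code: the function names, the lengthscale parameterisation, the two test points and the choice of 5000 features are all assumptions made here for illustration, and only the Python standard library is used.

```python
import math
import random


def rbf_kernel(x, y, lengthscale=1.0):
    """Exact RBF kernel: exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def make_rff_map(dim, n_features, lengthscale=1.0, seed=0):
    """Build a random Fourier feature map whose inner products approximate the RBF kernel."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density, N(0, 1 / lengthscale^2).
    freqs = [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        for _ in range(n_features)
    ]
    # Random phases, uniform on [0, 2*pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]

    return feature_map


def approx_kernel(feature_map, x, y):
    """Approximate k(x, y) as the dot product of the two feature vectors."""
    return sum(a * b for a, b in zip(feature_map(x), feature_map(y)))


# Hypothetical 2-D locations, used only to compare exact and approximate values.
x = (0.1, 0.2)
y = (0.5, -0.3)
feature_map = make_rff_map(dim=2, n_features=5000, seed=0)

print("exact: ", rbf_kernel(x, y))
print("approx:", approx_kernel(feature_map, x, y))
```

Because each location is mapped to an explicit, finite feature vector, a linear method such as ridge regression can be fitted on those features at a cost that grows linearly with the number of points, which is the source of the speed-up the abstract describes.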
Affiliation(s)
- Philip Milton
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Helen Coupland
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Emanuele Giorgi
  - CHICAS, Lancaster Medical School, Lancaster University, Lancaster, UK
- Samir Bhatt
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
9. Tusting LS, Bisanzio D, Alabaster G, Cameron E, Cibulskis R, Davies M, Flaxman S, Gibson HS, Knudsen J, Mbogo C, Okumu FO, von Seidlein L, Weiss DJ, Lindsay SW, Gething PW, Bhatt S. Mapping changes in housing in sub-Saharan Africa from 2000 to 2015. Nature 2019; 568:391-394. [PMID: 30918405 PMCID: PMC6784864 DOI: 10.1038/s41586-019-1050-5] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 09/26/2018] [Accepted: 02/20/2019] [Indexed: 11/16/2022]
Abstract
Access to adequate housing is a fundamental human right, essential to human security, nutrition and health, and a core objective of the United Nations Sustainable Development Goals [1,2]. Globally, the housing need is most acute in Africa, where the population will more than double by 2050. However, existing data on housing quality across Africa are limited primarily to urban areas and are mostly recorded at the national level. Here we quantify changes in housing in sub-Saharan Africa from 2000 to 2015 by combining national survey data within a geostatistical framework. We show a marked transformation of housing in urban and rural sub-Saharan Africa between 2000 and 2015, with the prevalence of improved housing (with improved water and sanitation, sufficient living area and durable construction) doubling from 11% (95% confidence interval, 10-12%) to 23% (21-25%). However, 53 (50-57) million urban Africans (47% (44-50%) of the urban population analysed) were living in unimproved housing in 2015. We provide high-resolution, standardized estimates of housing conditions across sub-Saharan Africa. Our maps provide a baseline for measuring change and a mechanism to guide interventions during the era of the Sustainable Development Goals.
Affiliation(s)
- Lucy S Tusting
  - Department of Disease Control, London School of Hygiene & Tropical Medicine, London, UK
- Donal Bisanzio
  - RTI International, Washington, DC, USA
  - Division of Epidemiology and Public Health, School of Medicine, University of Nottingham, Nottingham, UK
- Ewan Cameron
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Richard Cibulskis
  - Health Metrics and Measurement Cluster, World Health Organization, Geneva, Switzerland
- Michael Davies
  - UCL Institute for Environmental Design and Engineering (IEDE), University College London, London, UK
- Seth Flaxman
  - Department of Mathematics and Data Science Institute, Imperial College London, London, UK
- Harry S Gibson
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Jakob Knudsen
  - School of Architecture, The Royal Danish Academy of Fine Arts, Copenhagen, Denmark
- Charles Mbogo
  - Kenya Medical Research Institute, Kilifi, Kenya
  - KEMRI-Wellcome Trust Research Program, Nairobi, Kenya
- Fredros O Okumu
  - Environmental Health and Ecological Sciences Department, Ifakara Health Institute, Ifakara, Tanzania
  - School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow, UK
- Lorenz von Seidlein
  - Mahidol-Oxford Tropical Medicine Research Unit (MORU), Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Daniel J Weiss
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Peter W Gething
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Samir Bhatt
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - Department of Infectious Disease Epidemiology, Imperial College London, London, UK
10. Heaton MJ, Datta A, Finley AO, Furrer R, Guinness J, Guhaniyogi R, Gerber F, Gramacy RB, Hammerling D, Katzfuss M, Lindgren F, Nychka DW, Sun F, Zammit-Mangion A. A Case Study Competition Among Methods for Analyzing Large Spatial Data. Journal of Agricultural, Biological, and Environmental Statistics 2018; 24:398-425. [PMID: 31496633 PMCID: PMC6709111 DOI: 10.1007/s13253-018-00348-w] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Received: 11/30/2018] [Accepted: 12/05/2018] [Indexed: 10/27/2022]
Abstract
The Gaussian process is an indispensable tool for spatial data analysts. The onset of the "big data" era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, and each was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online. ELECTRONIC SUPPLEMENTARY MATERIAL Supplementary materials for this article are available at 10.1007/s13253-018-00348-w.
Affiliation(s)
- Furong Sun
  - Brigham Young University, Provo, UT, USA