Francisco ME, Carvajal TM, Watanabe K. Hybrid Machine Learning Approach to Zero-Inflated Data Improves Accuracy of Dengue Prediction.
PLoS Negl Trop Dis 2024;
18:e0012599. [PMID:
39432557 PMCID:
PMC11527386 DOI:
10.1371/journal.pntd.0012599]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 10/31/2024] [Accepted: 10/01/2024] [Indexed: 10/23/2024] Open
Abstract
BACKGROUND
Spatiotemporal dengue forecasting using machine learning (ML) can contribute to the development of prevention and control strategies for impending dengue outbreaks. However, training data for dengue incidence may be inflated with frequent zero values because of the rarity of cases, which lowers the prediction accuracy. This study aimed to understand the influence of spatiotemporal resolutions of data on the accuracy of dengue incidence prediction using ML models, to understand how the influence of spatiotemporal resolution differs between quantitative and qualitative predictions of dengue incidence, and to improve the accuracy of dengue incidence prediction with zero-inflated data.
METHODOLOGY
We predicted dengue incidence at six spatiotemporal resolutions and compared their prediction accuracy. Six ML algorithms were compared: generalized additive models, random forests, conditional inference forest, artificial neural networks, support vector machines and regression, and extreme gradient boosting. Data from 2009 to 2012 were used for training, and data from 2013 were used for model validation with quantitative and qualitative dengue variables. To address the inaccuracy in the quantitative prediction of dengue incidence due to zero-inflated data at fine spatiotemporal scales, we developed a hybrid approach in which the second-stage quantitative prediction is performed only when/where the first-stage qualitative model predicts the occurrence of dengue cases.
PRINCIPAL FINDINGS
At higher resolutions, the dengue incidence data were zero-inflated, which was insufficient for quantitative pattern extraction of relationships between dengue incidence and environmental variables by ML. Qualitative models, used as binary variables, eased the effect of data distribution. Our novel hybrid approach of combining qualitative and quantitative predictions demonstrated high potential for predicting zero-inflated or rare phenomena, such as dengue.
SIGNIFICANCE
Our research contributes valuable insights to the field of spatiotemporal dengue prediction and provides an alternative solution to enhance prediction accuracy in zero-inflated data where hurdle or zero-inflated models cannot be applied.
Collapse