Fang ZG, Yang SQ, Lv CX, An SY, Wu W. Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study. BMJ Open 2022;12:e056685. [PMID: 35777884 PMCID: PMC9251895 DOI: 10.1136/bmjopen-2021-056685]
[Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 08/24/2021] [Accepted: 06/20/2022] [Indexed: 12/15/2022]
Abstract
OBJECTIVE
The COVID-19 outbreak was first reported in Wuhan, China, and has been acknowledged as a pandemic due to its rapid spread worldwide. Predicting the trend of COVID-19 is of great significance for its prevention. A comparison between the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model was conducted to determine which was more accurate for anticipating the occurrence of COVID-19 in the USA.
DESIGN
Time-series study.
SETTING
The USA was the setting for this study.
MAIN OUTCOME MEASURES
Three accuracy metrics, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), were applied to evaluate the performance of the two models.
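The three metrics can be sketched as follows, using their conventional definitions; the abstract does not give the authors' implementation, so the function names and the sample numbers below are hypothetical and serve only as illustration. For all three metrics, a lower value indicates a closer fit.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: the average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: the square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical case counts and two hypothetical forecasts (NOT data from the study).
observed = [100.0, 200.0, 400.0, 500.0]
forecast_a = [110.0, 190.0, 420.0, 480.0]
forecast_b = [130.0, 170.0, 360.0, 560.0]

for name, forecast in [("forecast_a", forecast_a), ("forecast_b", forecast_b)]:
    print(name, mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```

MAE and RMSE are expressed in the units of the series (here, case counts), with RMSE penalizing large errors more heavily, while MAPE is scale-free, which is one common reason for reporting all three together.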
RESULTS
For both the training set and the validation set, the MAE, RMSE and MAPE of the XGBoost model were lower than those of the ARIMA model.
CONCLUSIONS
Compared with the ARIMA model, the XGBoost model can help improve the prediction of COVID-19 cases in the USA.