1
Chen H, Xiao M. Seasonality of influenza-like illness and short-term forecasting model in Chongqing from 2010 to 2022. BMC Infect Dis 2024; 24:432. [PMID: 38654199 PMCID: PMC11036656 DOI: 10.1186/s12879-024-09301-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 04/07/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND Influenza-like illness (ILI) imposes a significant burden on patients, employers and society. However, no hospital-level analysis or prediction of ILI has been carried out in Chongqing. We aimed to characterize the seasonality of ILI, examine age heterogeneity in visits, predict ILI peaks, and assess whether they affect hospital operations. METHODS A multiplicative decomposition model was used to separate the trend and seasonality of ILI, and a Seasonal Auto-Regressive Integrated Moving Average with exogenous factors (SARIMAX) model was used for trend and short-term prediction of ILI. We used grid search and the Akaike information criterion (AIC) to calibrate and verify the optimal hyperparameters, and confirmed that the residuals of both the multiplicative decomposition and the SARIMAX model were white noise. RESULTS During the 12-year study period, ILI showed a continuous upward trend, peaking in winter (Dec.-Jan.), with a small spike in May-June in the 2-4-year-old group at high risk of severe disease. The mean length of stay (LOS) for ILI peaked around summer (about Aug.), and LOS in the 0-1 and ≥ 65-year-old groups at high risk of severe disease was more irregular than in the others. We found some anomalies in the predictive analysis of the test set, which were broadly consistent with the dynamic zero-COVID policy in force at the time. CONCLUSION ILI patient visits showed a clear cyclical and seasonal pattern. ILI prevention and control activities can be conducted seasonally on an annual basis, and age heterogeneity should be considered in health resource planning. Targeted immunization policies are essential to mitigate potential pandemic threats. The SARIMAX model has good short-term forecasting ability and accuracy. It can help explore the epidemiological characteristics of ILI and provide an early warning and decision-making basis for the allocation of medical resources related to ILI visits.
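The multiplicative decomposition mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classical form (series = trend × seasonal × residual) with a centered moving-average trend; the abstract does not say which variant or software the authors used, and the function names and the toy series below are hypothetical.

```python
from statistics import mean


def centered_moving_average(series, period):
    """Centered moving average; None where the window does not fit."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2:
            # Odd period: plain symmetric window of length `period`.
            trend[t] = sum(window) / period
        else:
            # Even period: 2 x period average, half weight on both end points.
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def multiplicative_decompose(series, period):
    """Return (trend, seasonal indices, residual) with y = trend * seasonal * residual."""
    trend = centered_moving_average(series, period)
    ratios = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            ratios[t % period].append(y / tr)
    raw = [mean(r) for r in ratios]
    scale = mean(raw)  # normalise so the seasonal indices average to 1
    seasonal = [s / scale for s in raw]
    residual = [
        None if tr is None else y / (tr * seasonal[t % period])
        for t, (y, tr) in enumerate(zip(series, trend))
    ]
    return trend, seasonal, residual


# Toy quarterly series (not study data): flat level of 100 with a fixed seasonal shape.
pattern = [0.5, 1.0, 1.5, 1.0]
series = [100 * pattern[t % 4] for t in range(16)]
trend, seasonal, residual = multiplicative_decompose(series, period=4)
```

On this toy series the recovered trend is flat and the residuals equal 1, which is the "nothing left to explain" case; real ILI counts would leave residuals that still need a white-noise check.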
Affiliation(s)
- Huayong Chen
  - School of Public Health, Research Center for Medical and Social Development, Chongqing Medical University, 1 Yixueyuan Road, Yuzhong District, 400016, Chongqing, P. R. China
- Mimi Xiao
  - School of Public Health, Research Center for Medical and Social Development, Chongqing Medical University, 1 Yixueyuan Road, Yuzhong District, 400016, Chongqing, P. R. China
2
Wu X, Zhai F, Chang A, Wei J, Guo Y, Zhang J. Development of Machine Learning Models for Predicting Osteoporosis in Patients with Type 2 Diabetes Mellitus-A Preliminary Study. Diabetes Metab Syndr Obes 2023; 16:1987-2003. [PMID: 37408729 PMCID: PMC10319347 DOI: 10.2147/dmso.s406695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 06/22/2023] [Indexed: 07/07/2023] Open
Abstract
Purpose Diagnosing osteoporosis in T2DM based on bone mineral density (BMD) remains challenging. We sought to develop prediction models employing machine learning algorithms for use as screening instruments for osteoporosis in T2DM patients. Patients and Methods Data were collected from 433 participants and analyzed using nine categorical machine learning algorithms to select features based on demographic and clinical variables. Multiple classification models were compared using the area under the receiver operating characteristic curve (ROC-AUC), accuracy, sensitivity, specificity, the average precision (AP), precision, F1 score, precision-recall curves, calibration plots, and decision curve analysis (DCA) to determine the best model. In addition, 5-fold cross-validation was utilized to optimize the model, followed by an evaluation of feature significance using Shapley Additive exPlanations (SHAP). Using latent class analysis (LCA), distinct subpopulations were identified by constructing several discrete clusters. Results In this study, nine feature variables were identified to construct predictive models for osteoporosis in individuals with T2DM. The machine learning algorithms achieved an AP range of 0.444-1.000. The XGBoost model was selected as the final prediction model with an AUROC of 0.940 in the training set, 0.772 in the validation set for 5-fold cross-validation, and 0.872 in the test set. Using SHAP methodology, 25(OH)D was identified as the most important risk factor. Additionally, a 3-Class model was constructed using LCA, which categorized individuals into high, medium, and low-risk groups. Conclusion Our study developed a predictive model with high accuracy and clinical validity for predicting osteoporosis in type 2 diabetes patients. We also identified three subpopulations with varying osteoporosis risk using clustering. However, limited sample size warrants cautious interpretation of results, and validation in larger cohorts is needed.
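The 5-fold cross-validation step can be sketched as an index split. This is only an illustration: the abstract does not state how folds were formed or how many participants were in the training set, so the use of all 433 participants, the shuffling seed and the function names are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled, near-equal validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def train_validation_splits(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Illustration only: 433 is the reported number of participants.
folds = k_fold_indices(433, k=5)
splits = list(train_validation_splits(433, k=5))
```

Each sample lands in exactly one validation fold, so the validation AUROC reported for cross-validation is an average over five models that never saw their own validation rows.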
Affiliation(s)
- Xuelun Wu
  - Department of Endocrinology, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People’s Republic of China
- Furui Zhai
  - Gynecological Clinic, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People’s Republic of China
- Ailing Chang
  - Department of Endocrinology, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People’s Republic of China
- Jing Wei
  - Department of Endocrinology, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People’s Republic of China
- Yanan Guo
  - Department of Endocrinology, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People’s Republic of China
- Jincheng Zhang
  - Department of Endocrinology, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People’s Republic of China
3
Jayaramu V, Zulkafli Z, De Stercke S, Buytaert W, Rahmat F, Abdul Rahman RZ, Ishak AJ, Tahir W, Ab Rahman J, Mohd Fuzi NMH. Leptospirosis modelling using hydrometeorological indices and random forest machine learning. International Journal of Biometeorology 2023; 67:423-437. [PMID: 36719482 DOI: 10.1007/s00484-022-02422-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 12/21/2022] [Accepted: 12/26/2022] [Indexed: 06/18/2023]
Abstract
Leptospirosis is a zoonosis that has been linked to hydrometeorological variability. Hydrometeorological averages and extremes have been used before as drivers in the statistical prediction of disease. However, their importance and predictive capacity are still little known. In this study, the use of a random forest classifier was explored to analyze the relative importance of hydrometeorological indices in developing the leptospirosis model and to evaluate the performance of models based on the type of indices used, using case data from three districts in Kelantan, Malaysia, that experience annual monsoonal rainfall and flooding. First, hydrometeorological data including rainfall, streamflow, water level, relative humidity, and temperature were transformed into 164 weekly average and extreme indices in accordance with the Expert Team on Climate Change Detection and Indices (ETCCDI). Then, weekly case occurrences were classified into binary classes "high" and "low" based on an average threshold. Seventeen models based on "average," "extreme," and "mixed" indices were trained by optimizing the feature subsets based on the model computed mean decrease Gini (MDG) scores. The variable importance was assessed through cross-correlation analysis and the MDG score. The average and extreme models showed similar prediction accuracy ranges (61.5-76.1% and 72.3-77.0%) while the mixed models showed an improvement (71.7-82.6% prediction accuracy). An extreme model was the most sensitive while an average model was the most specific. The time lag associated with the driving indices agreed with the seasonality of the monsoon. The rainfall variable (extreme) was the most important in classifying the leptospirosis occurrence while streamflow was the least important despite showing higher correlations with leptospirosis.
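Two steps in this abstract lend themselves to a small sketch: labelling weeks as "high" or "low" against an average threshold, and the Gini decrease of a single split, which is the quantity a random forest averages into the mean decrease Gini (MDG) score. The strict ">" comparison, the function names and the toy numbers are assumptions; the abstract gives none of them.

```python
def label_by_mean(cases):
    """Label each week 'high' if its case count exceeds the series mean (assumed rule)."""
    threshold = sum(cases) / len(cases)
    return ["high" if c > threshold else "low" for c in cases]


def gini(labels):
    """Gini impurity of a two-class label list: 2p(1 - p)."""
    if not labels:
        return 0.0
    p_high = labels.count("high") / len(labels)
    return 2 * p_high * (1 - p_high)


def gini_decrease(feature, labels, cut):
    """Impurity reduction from splitting on feature <= cut versus feature > cut."""
    left = [lab for x, lab in zip(feature, labels) if x <= cut]
    right = [lab for x, lab in zip(feature, labels) if x > cut]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Toy weekly data (not study data): case counts and a hypothetical rainfall index.
cases = [0, 1, 0, 5, 6, 0, 4, 0]
rainfall = [10, 20, 15, 80, 90, 5, 70, 12]
labels = label_by_mean(cases)
decrease = gini_decrease(rainfall, labels, cut=50)
```

Here the split at 50 separates the classes perfectly, so the decrease equals the parent impurity. In a forest, a feature's MDG is built from such decreases over every node that splits on it, averaged across trees.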
Affiliation(s)
- Veianthan Jayaramu
  - Department of Civil Engineering, Universiti Putra Malaysia, Serdang, Malaysia
- Zed Zulkafli
  - Department of Civil Engineering, Universiti Putra Malaysia, Serdang, Malaysia
- Simon De Stercke
  - Department of Civil and Environmental Engineering, Imperial College London, London, UK
- Wouter Buytaert
  - Department of Civil and Environmental Engineering, Imperial College London, London, UK
- Fariq Rahmat
  - Department of Electrical and Electronic Engineering, Universiti Putra Malaysia, Serdang, Malaysia
- Asnor Juraiza Ishak
  - Department of Electrical and Electronic Engineering, Universiti Putra Malaysia, Serdang, Malaysia
- Wardah Tahir
  - Flood Control Research Group, Faculty of Civil Engineering, Universiti Teknologi Mara, Shah Alam, Malaysia
- Jamalludin Ab Rahman
  - Department of Community Medicine, Kulliyyah of Medicine, International Islamic University Malaysia, Kuantan, Malaysia
4
Artificial Intelligence in Biological Sciences. Life (Basel) 2022; 12:life12091430. [PMID: 36143468 PMCID: PMC9505413 DOI: 10.3390/life12091430] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/03/2022] [Revised: 08/25/2022] [Accepted: 09/10/2022] [Indexed: 12/03/2022] Open
Abstract
Artificial intelligence (AI), currently a cutting-edge concept, has the potential to improve the quality of life of human beings. The fields of AI and biological research are becoming more intertwined, and methods for extracting and applying the information stored in live organisms are constantly being refined. As the field of AI matures with more trained algorithms, the potential of its application in epidemiology, the study of host–pathogen interactions and drug designing widens. AI is now being applied in several fields of drug discovery, customized medicine, gene editing, radiography, image processing and medication management. More precise diagnosis and cost-effective treatment will be possible in the near future due to the application of AI-based technologies. In the field of agriculture, farmers have reduced waste, increased output and decreased the amount of time it takes to bring their goods to market due to the application of advanced AI-based approaches. Moreover, with the use of AI through machine learning (ML) and deep-learning-based smart programs, one can modify the metabolic pathways of living systems to obtain the best possible outputs with the minimal inputs. Such efforts can improve the industrial strains of microbial species to maximize the yield in the bio-based industrial setup. This article summarizes the potentials of AI and their application to several fields of biology, such as medicine, agriculture, and bio-based industry.
5
Chen Y, Liu T, Yu X, Zeng Q, Cai Z, Wu H, Zhang Q, Xiao J, Ma W, Pei S, Guo P. An ensemble forecast system for tracking dynamics of dengue outbreaks and its validation in China. PLoS Comput Biol 2022; 18:e1010218. [PMID: 35759513 PMCID: PMC9269975 DOI: 10.1371/journal.pcbi.1010218] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/24/2021] [Revised: 07/08/2022] [Accepted: 05/17/2022] [Indexed: 02/05/2023] Open
Abstract
As a common vector-borne disease, dengue fever remains challenging to predict due to large variations in epidemic size across seasons driven by a number of factors including population susceptibility, mosquito density, meteorological conditions, geographical factors, and human mobility. An ensemble forecast system for dengue fever is first proposed that addresses the difficulty of predicting outbreaks with drastically different scales. The ensemble forecast system based on a susceptible-infected-recovered (SIR) type of compartmental model coupled with a data assimilation method called the ensemble adjusted Kalman filter (EAKF) is constructed to generate real-time forecasts of dengue fever spread dynamics. The model was informed by meteorological and mosquito density information to depict the transmission of dengue virus among human and mosquito populations, and generate predictions. To account for the dramatic variations of outbreak size in different seasons, the effective population size parameter that is sequentially updated to adjust the predicted outbreak scale is introduced into the model. Before optimizing the transmission model, we update the effective population size using the most recent observations and historical records so that the predicted outbreak size is dynamically adjusted. In the retrospective forecast of dengue outbreaks in Guangzhou, China during the 2011-2017 seasons, the proposed forecast model generates accurate projections of peak timing, peak intensity, and total incidence, outperforming a generalized additive model approach. The ensemble forecast system can be operated in real-time and inform control planning to reduce the burden of dengue fever.
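The compartmental core of such a system can be sketched as a plain discrete-time SIR model. This is a deliberately reduced illustration: it omits the mosquito population, the meteorological forcing and the ensemble adjusted Kalman filter described in the abstract, and the parameter values and function name are hypothetical. The `population` argument plays the role of an effective population size, which the abstract says is updated sequentially to rescale the predicted outbreak.

```python
def simulate_sir(beta, gamma, population, infected0, days):
    """Daily Euler steps of a basic SIR model; returns a list of (S, I, R) tuples."""
    s, i, r = population - infected0, infected0, 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # mass-action transmission
        new_recoveries = gamma * i
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters, chosen only to produce a single outbreak wave.
history = simulate_sir(beta=0.5, gamma=0.2, population=10000.0, infected0=10.0, days=120)
infected = [i for _, i, _ in history]
peak_day = infected.index(max(infected))
```

Because final outbreak size scales with `population` in this model, adjusting an effective population size is one simple lever for matching seasons whose epidemics differ greatly in scale.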
Affiliation(s)
- Yuliang Chen
  - Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Tao Liu
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou, China
- Xiaolin Yu
  - Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Qinghui Zeng
  - Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Zixi Cai
  - Shantou Center for Disease Control and Prevention, Shantou, China
- Haisheng Wu
  - Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Qingying Zhang
  - Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Jianpeng Xiao
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou, China
- Wenjun Ma
  - Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention, Guangzhou, China
  - * E-mail: (WM); (SP); (PG)
- Sen Pei
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, United States of America
  - * E-mail: (WM); (SP); (PG)
- Pi Guo
  - Department of Preventive Medicine, Shantou University Medical College, Shantou, China
  - * E-mail: (WM); (SP); (PG)
6
Forecasting the Potential Number of Influenza-like Illness Cases by Fusing Internet Public Opinion. Sustainability 2022. [DOI: 10.3390/su14052803] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
As influenza viruses mutate rapidly, a prediction model for potential outbreaks of influenza-like illnesses helps detect the spread of the illnesses in real time. To create a better prediction model, in addition to the traditional hydrological and atmospheric data, this study used features such as popular search keywords on Google Trends, public holiday information, population density, air quality indices, and the numbers of COVID-19 confirmed cases to train the model. Furthermore, Random Forest and XGBoost were combined in the proposed prediction model to increase the prediction accuracy. The training data were historical data from 2016 to 2021. In our experiments, different combinations of features were tested. The results show that features such as popular search keywords on Google Trends, the numbers of COVID-19 confirmed cases, and air quality indices can improve the outcome of the prediction model. The evaluation results showed that the error rate between the predicted and actual numbers of influenza-like cases from Week 15 to Week 18 fell to less than 5%. The outbreak of COVID-19 in Taiwan began in Week 19 and resulted in a sharp rise in the number of clinic or hospital visits by patients with influenza-like illnesses. After that, from Week 21 to Week 26, the error rate between the predicted and actual numbers of influenza-like cases dropped to 13%. These experimental results indicate that the proposed ensemble learning prediction model can accurately predict the trend of influenza-like cases.
7
El-Sherif DM, Abouzid M, Elzarif MT, Ahmed AA, Albakri A, Alshehri MM. Telehealth and Artificial Intelligence Insights into Healthcare during the COVID-19 Pandemic. Healthcare (Basel) 2022; 10:385. [PMID: 35206998 PMCID: PMC8871559 DOI: 10.3390/healthcare10020385] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/11/2022] [Revised: 02/13/2022] [Accepted: 02/15/2022] [Indexed: 02/06/2023] Open
Abstract
Soon after the coronavirus disease 2019 pandemic was proclaimed, digital health services were widely adopted to respond to this public health emergency, including comprehensive monitoring technologies, telehealth, creative diagnostic, and therapeutic decision-making methods. The World Health Organization suggested that artificial intelligence might be a valuable way of dealing with the crisis. Artificial intelligence is an essential technology of the fourth industrial revolution that is a critical nonmedical intervention for overcoming the present global health crisis, developing next-generation pandemic preparation, and regaining resilience. While artificial intelligence has much potential, it raises fundamental privacy, transparency, and safety concerns. This study seeks to address these issues and looks forward to an intelligent healthcare future based on best practices and lessons learned by employing telehealth and artificial intelligence during the COVID-19 pandemic.
Affiliation(s)
- Dina M. El-Sherif
  - National Institute of Oceanography and Fisheries (NIOF), Cairo 11516, Egypt
- Mohamed Abouzid
  - Department of Physical Pharmacy and Pharmacokinetics, Poznan University of Medical Sciences, 60-781 Poznan, Poland
  - Doctoral School, Poznan University of Medical Sciences, 60-781 Poznan, Poland
- Mohamed Tarek Elzarif
  - Independent Digital Health Researcher and Entrepreneur, CEO Doctor Live Company, Cairo 12655, Egypt
- Alhassan Ali Ahmed
  - Doctoral School, Poznan University of Medical Sciences, 60-781 Poznan, Poland
  - Department of Bioinformatics and Computational Biology, Poznan University of Medical Sciences, 60-781 Poznan, Poland
- Ashwag Albakri
  - College of Computer Science and Information Technology, Jazan University, Jizan 45142, Saudi Arabia
- Mohammed M. Alshehri
  - Medical Research Center, Jazan University, Jizan 45142, Saudi Arabia
  - Physical Therapy Department, Jazan University, Jizan 82412, Saudi Arabia
8
Hasan A, Levene M, Weston D, Fromson R, Koslover N, Levene T. Monitoring Covid-19 on social media using a novel triage and diagnosis approach. J Med Internet Res 2022; 24:e30397. [PMID: 35142636 PMCID: PMC8887561 DOI: 10.2196/30397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/13/2021] [Revised: 07/09/2021] [Accepted: 02/05/2022] [Indexed: 12/23/2022] Open
Abstract
Background The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, to make sense of the textual information it provides and be able to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. Here, we adopt a triage and diagnosis approach to analyzing social media posts using machine learning techniques for the purpose of disease detection and surveillance. We thus obtain useful prevalence and incidence statistics to identify disease symptoms and their severities, motivated by public health concerns. Objective This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts in order to provide researchers and public health practitioners with additional information on the symptoms, severity, and prevalence of the disease rather than to provide an actionable decision at the individual level. Methods The text processing pipeline first extracted COVID-19 symptoms and related concepts, such as severity, duration, negations, and body parts, from patients’ posts using conditional random fields. An unsupervised rule-based algorithm was then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations were subsequently used to construct 2 different vector representations of each post. These vectors were separately applied to build support vector machine learning models to triage patients into 3 categories and diagnose them for COVID-19. 
Results We reported macro- and microaveraged F1 scores in the range of 71%-96% and 61%-87%, respectively, for the triage and diagnosis of COVID-19 when the models were trained on human-labeled data. Our experimental results indicated that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. In addition, we highlighted important features uncovered by our diagnostic machine learning models and compared them with the most frequent symptoms revealed in another COVID-19 data set. In particular, we found that the most important features are not always the most frequent ones. Conclusions Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from social media natural language narratives, using a machine learning pipeline in order to provide information on the severity and prevalence of the disease for use within health surveillance systems.
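The difference between the macro- and microaveraged F1 scores reported above can be made concrete with a small sketch. The three class labels and the toy predictions are hypothetical (the abstract says only that patients were triaged into 3 categories), and the function names are illustrative.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when the class never appears."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro F1, micro F1) for single-label multi-class predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    counts = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        counts.append((tp, fp, fn))
    # Macro: average the per-class F1 scores, so rare classes weigh as much as common ones.
    macro = sum(f1(*c) for c in counts) / len(classes)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1(*(sum(col) for col in zip(*counts)))
    return macro, micro


# Toy labels (not study data): class "A" is common, "B" and "C" are rare.
y_true = ["A", "A", "A", "A", "B", "C"]
y_pred = ["A", "A", "A", "A", "C", "C"]
macro, micro = macro_micro_f1(y_true, y_pred)
```

In this toy case the single missed "B" post pulls the macro score well below the micro score, which is why the two are usually reported together on imbalanced triage data.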
Affiliation(s)
- Abul Hasan
  - Birkbeck, University of London, Malet Street, Bloomsbury, London, GB
- Mark Levene
  - Birkbeck, University of London, Malet Street, Bloomsbury, London, GB
- David Weston
  - Birkbeck, University of London, Malet Street, Bloomsbury, London, GB
- Renate Fromson
  - Barnet General Hospital, Wellhouse Lane, London EN5 3DJ, United Kingdom, London, GB
- Nicolas Koslover
  - Barnet General Hospital, Wellhouse Lane, London EN5 3DJ, United Kingdom, London, GB
- Tamara Levene
  - Barnet General Hospital, Wellhouse Lane, London EN5 3DJ, United Kingdom, London, GB
9
Naeem M, Yu J, Aamir M, Khan SA, Adeleye O, Khan Z. Comparative analysis of machine learning approaches to analyze and predict the COVID-19 outbreak. PeerJ Comput Sci 2021; 7:e746. [PMID: 35036527 PMCID: PMC8725668 DOI: 10.7717/peerj-cs.746] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/12/2021] [Accepted: 09/23/2021] [Indexed: 05/08/2023]
Abstract
BACKGROUND Forecasting the timing of a forthcoming pandemic can reduce the impact of disease by enabling precautionary steps such as public health messaging and raising awareness among doctors. With the continuous and rapid increase in the cumulative incidence of COVID-19, statistical and outbreak prediction models, including various machine learning (ML) models, are being used by the research community to track and predict the trend of the epidemic and to develop appropriate strategies to combat and manage its spread. METHODS In this paper, we present a comparative analysis of various ML approaches, including Support Vector Machine, Random Forest, K-Nearest Neighbor and Artificial Neural Network, in predicting the COVID-19 outbreak in the epidemiological domain. We first apply the autoregressive distributed lag (ARDL) method to identify and model the short- and long-run relationships of the time-series COVID-19 datasets. That is, we determine the lags between a response variable and its respective explanatory time-series variables as independent variables. Then, the resulting significant variables and their lags are used in the regression model selected by the ARDL for predicting and forecasting the trend of the epidemic. RESULTS Statistical measures (Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE)) are used to assess model accuracy. The values of MAPE for the best-selected models for confirmed, recovered and death cases are 0.003, 0.006 and 0.115, respectively, which fall under the category of highly accurate forecasts. In addition, we computed 15-day-ahead forecasts for daily deaths, recovered patients and confirmed patients; the cases fluctuated over time in all three series. The results also reveal the advantages of ML algorithms for supporting decision-making on evolving short-term policies.
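The four accuracy measures named in the abstract can be written out as a short sketch. MAPE and SMAPE are returned here as fractions rather than percentages, and SMAPE uses the 2|a - f| / (|a| + |f|) form; both are assumptions, since several conventions exist and the abstract does not say which one the authors applied. The sample values are toy numbers.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean square error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE as a fraction, using 2|a - f| / (|a| + |f|); undefined when a = f = 0."""
    return sum(
        2 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)
    ) / len(actual)


# Toy daily counts (not study data).
actual = [100, 200, 400]
forecast = [110, 180, 400]
scores = {
    "RMSE": rmse(actual, forecast),
    "MAE": mae(actual, forecast),
    "MAPE": mape(actual, forecast),
    "SMAPE": smape(actual, forecast),
}
```

RMSE and MAE are in the units of the series, so they are not comparable between confirmed, recovered and death counts; the percentage-based measures are scale-free, which is presumably why MAPE is the figure quoted per series.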
Affiliation(s)
- Muhammad Naeem
  - Department of Statistics, Abdul Wali Khan University, Mardan, KP, Pakistan
- Jian Yu
  - Department of Computer Science, Auckland University of Technology, Auckland, New Zealand
- Muhammad Aamir
  - Department of Statistics, Abdul Wali Khan University, Mardan, KP, Pakistan
- Sajjad Ahmad Khan
  - Department of Statistics, Islamia College University, Peshawar, KP, Pakistan
- Olayinka Adeleye
  - Department of Computer Science, Auckland University of Technology, Auckland, New Zealand
- Zardad Khan
  - Department of Statistics, Abdul Wali Khan University, Mardan, KP, Pakistan
10
Park HW, Jung H, Back KY, Choi HJ, Ryu KS, Cha HS, Lee EK, Hong AR, Hwangbo Y. Application of Machine Learning to Identify Clinically Meaningful Risk Group for Osteoporosis in Individuals Under the Recommended Age for Dual-Energy X-Ray Absorptiometry. Calcif Tissue Int 2021; 109:645-655. [PMID: 34195852 DOI: 10.1007/s00223-021-00880-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/04/2021] [Accepted: 06/16/2021] [Indexed: 11/29/2022]
Abstract
Dual-energy X-ray absorptiometry (DXA) is the gold standard for diagnosing osteoporosis; it is generally recommended in men ≥ 70 and women ≥ 65 years old. Therefore, assessment of clinical risk factors for osteoporosis is very important in individuals under the recommended age for DXA. Here, we examine the diagnostic performance of machine learning-based prediction models for osteoporosis in individuals under the recommended age for DXA examination. Data of 2210 men aged 50-69 and 1099 women aged 50-64 obtained from the Korea National Health and Nutrition Examination Survey IV-V were analyzed. Extreme gradient boosting (XGBoost) was used to find relevant clinical features and applied to three machine learning models: XGBoost, logistic regression, and a multilayer perceptron. For the prediction of osteoporosis, the XGBoost model using the top 20 features extracted from XGBoost showed the most reliable performance with area under the receiver operating characteristic curve (AUROC) of 0.73 and 0.79 in men and women, respectively. We compared the diagnostic accuracy of the Shapley additive explanation values based on a risk-score model obtained from XGBoost and conventional osteoporosis risk assessment tools for prediction of osteoporosis using optimal cut-off values for each model. We observed that a cut-off risk score of ≥ 28 in men and ≥ 47 in women was optimal to classify a positive screening for osteoporosis (an AUROC of 0.86 in men and 0.91 in women). The XGBoost-based osteoporosis-prediction model outperformed conventional risk assessment tools. Therefore, machine learning-based prediction models are a more suitable option than conventional risk assessment methods for screening osteoporosis in individuals under the recommended age for DXA examination.
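The AUROC figures and the risk-score cut-offs quoted above can be related with a short sketch. AUROC is computed here by its pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half); the toy scores are hypothetical, and only the cut-off of 28 is taken from the abstract.

```python
def auroc(scores, labels):
    """AUROC by pairwise comparison of positive (1) and negative (0) cases."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def screen(scores, cutoff):
    """Flag a positive screen when the risk score is at or above the cut-off."""
    return [1 if s >= cutoff else 0 for s in scores]


# Toy risk scores (not study data); 28 is the cut-off reported for men.
toy_scores = [10, 30, 20, 40]
toy_labels = [0, 0, 1, 1]
toy_auc = auroc(toy_scores, toy_labels)
flags = screen([27, 28, 50], cutoff=28)
```

AUROC summarises ranking quality over every possible threshold, whereas a cut-off fixes a single operating point, so the two numbers answer different questions about the same score.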
Affiliation(s)
- Hyun Woo Park
  - Healthcare AI Team, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, Gyeonggi, 10408, South Korea
- Hyojung Jung
  - Healthcare AI Team, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, Gyeonggi, 10408, South Korea
- Kyoung Yeon Back
  - Healthcare AI Team, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, Gyeonggi, 10408, South Korea
- Hyeon Ju Choi
  - Healthcare AI Team, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, Gyeonggi, 10408, South Korea
- Kwang Sun Ryu
  - Cancer Big Data Center, National Cancer Center, National Cancer Control Institute, Goyang, South Korea
- Hyo Soung Cha
  - Cancer Big Data Center, National Cancer Center, National Cancer Control Institute, Goyang, South Korea
- Eun Kyung Lee
  - Center for Thyroid Cancer, National Cancer Center, Goyang, South Korea
- A Ram Hong
  - Department of Internal Medicine, Chonnam National University Medical School, 160, Baekseo-ro, Dong-gu, Gwangju, 61469, South Korea
- Yul Hwangbo
  - Healthcare AI Team, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, Gyeonggi, 10408, South Korea
11
Pourhoseingholi A, Vahedi M, Chaibakhsh S, Pourhoseingholi MA, Vahedian-Azimi A, Guest PC, Rahimi-Bashar F, Sahebkar A. Deep Learning Analysis in Prediction of COVID-19 Infection Status Using Chest CT Scan Features. Advances in Experimental Medicine and Biology 2021; 1327:139-147. [PMID: 34279835 DOI: 10.1007/978-3-030-71697-4_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/23/2022]
Abstract
Background and aims Non-contrast chest computed tomography (CT) scanning is one of the important tools for evaluating lung lesions. The aim of this study was to use a deep learning approach to classify patients with COVID-19 into two groups, critical and non-critical, according to their CT features. Methods This was carried out as a retrospective study from March to April 2020 in Baqiyatallah Hospital, Tehran, Iran. Of a total of 1078 patients with COVID-19 pneumonia who underwent chest CT, 169 were critical cases and 909 were non-critical. Deep learning neural networks were used to classify samples as critical or non-critical according to the chest CT results. Results The best prediction accuracy was obtained with the presence of diffuse opacities and with lesion distribution (both 0.91, 95% CI: 0.83-0.99). The largest sensitivity was achieved using lesion distribution (0.74, 95% CI: 0.55-0.93), and the largest specificity was for presence of diffuse opacities (0.95, 95% CI: 0.9-1). The total model showed an accuracy of 0.89 (95% CI: 0.79-0.99), and the corresponding sensitivity and specificity were 0.71 (95% CI: 0.51-0.91) and 0.93 (95% CI: 0.87-0.96), respectively. Conclusions The results showed that CT scans can accurately classify and predict critical and non-critical COVID-19 cases.
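Accuracy, sensitivity and specificity all derive from the same four confusion-matrix counts, as the sketch below shows. The abstract does not report the confusion matrix, so the first set of counts is a toy example; the second uses only the class sizes stated in the abstract (169 critical, 909 non-critical) to show what a classifier that labels everyone non-critical would score. The function name is illustrative.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of critical cases detected
        "specificity": tn / (tn + fp),  # share of non-critical cases cleared
    }


# Toy counts (not study data).
toy = confusion_metrics(tp=8, fn=2, tn=85, fp=5)

# Trivial baseline on the reported class sizes: predict "non-critical" for all 1078 patients.
baseline = confusion_metrics(tp=0, fn=169, tn=909, fp=0)
```

The baseline reaches an accuracy of about 0.84 while detecting no critical case at all, which is why sensitivity and specificity are reported alongside accuracy when the classes are this imbalanced.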
Affiliation(s)
- Asma Pourhoseingholi
  - Department of Biostatistics, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohsen Vahedi
  - Department of Biostatistics, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Samira Chaibakhsh
  - Eye Research Center, The five Senses Institute, Rassoul Akram Hospital, Iran University of Medical Sciences, Tehran, Iran.
- Mohamad Amin Pourhoseingholi
  - Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amir Vahedian-Azimi
  - Trauma Research Center, Nursing Faculty, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Paul C Guest
  - Laboratory of Neuroproteomics, Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas (UNICAMP), Campinas, Brazil
- Farshid Rahimi-Bashar
  - Anesthesia and Critical Care Department, Hamadan University of Medical Sciences, Hamadan, Iran
- Amirhossein Sahebkar
  - Biotechnology Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran.
  - Applied Biomedical Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
  - Polish Mother's Memorial Hospital Research Institute (PMMHRI), Lodz, Poland.
  - School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran.
12
Aiken EL, Nguyen AT, Viboud C, Santillana M. Toward the use of neural networks for influenza prediction at multiple spatial resolutions. Science Advances 2021; 7:eabb1237. [PMID: 34134985 PMCID: PMC8208709 DOI: 10.1126/sciadv.abb1237] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/31/2020] [Accepted: 04/29/2021] [Indexed: 05/24/2023]
Abstract
Mitigating the effects of disease outbreaks with timely and effective interventions requires accurate real-time surveillance and forecasting of disease activity, but traditional health care-based surveillance systems are limited by inherent reporting delays. Machine learning methods have the potential to fill this temporal "data gap," but work to date in this area has focused on relatively simple methods and coarse geographic resolutions (state level and above). We evaluate the predictive performance of a gated recurrent unit neural network approach in comparison with baseline machine learning methods for estimating influenza activity in the United States at the state and city levels and experiment with the inclusion of real-time Internet search data. We find that the neural network approach improves upon baseline models for long time horizons of prediction but is not improved by real-time internet search data. We conduct a thorough analysis of feature importances in all considered models for interpretability purposes.
Affiliation(s)
- Emily L Aiken
  - School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
- Andre T Nguyen
  - Booz Allen Hamilton, Columbia, MD 21044, USA
  - University of Maryland, Baltimore County, Baltimore, MD 21250, USA
- Cecile Viboud
  - Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA
- Mauricio Santillana
  - School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02215, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02215, USA
13
Jo YY, Han J, Park HW, Jung H, Lee JD, Jung J, Cha HS, Sohn DK, Hwangbo Y. Prediction of Prolonged Length of Hospital Stay After Cancer Surgery Using Machine Learning on Electronic Health Records: Retrospective Cross-sectional Study. JMIR Med Inform 2021; 9:e23147. [PMID: 33616544 PMCID: PMC7939945 DOI: 10.2196/23147] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/02/2020] [Revised: 01/06/2021] [Accepted: 01/16/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Postoperative length of stay is a key indicator in the management of medical resources and an indirect predictor of the incidence of surgical complications and the degree of recovery of the patient after cancer surgery. Recently, machine learning has been used to predict complex medical outcomes, such as prolonged length of hospital stay, using extensive medical information. OBJECTIVE The objective of this study was to develop a prediction model for prolonged length of stay after cancer surgery using a machine learning approach. METHODS In our retrospective study, electronic health records (EHRs) from 42,751 patients who underwent primary surgery for 17 types of cancer between January 1, 2000, and December 31, 2017, were sourced from a single cancer center. The EHRs included numerous variables such as surgical factors, cancer factors, underlying diseases, functional laboratory assessments, general assessments, medications, and social factors. To predict prolonged length of stay after cancer surgery, we employed extreme gradient boosting classifier, multilayer perceptron, and logistic regression models. Prolonged postoperative length of stay for cancer was defined as bed-days of the group of patients who accounted for the top 50% of the distribution of bed-days by cancer type. RESULTS In the prediction of prolonged length of stay after cancer surgery, extreme gradient boosting classifier models demonstrated excellent performance for kidney and bladder cancer surgeries (area under the receiver operating characteristic curve [AUC] >0.85). A moderate performance (AUC 0.70-0.85) was observed for stomach, breast, colon, thyroid, prostate, cervix uteri, corpus uteri, and oral cancers. For stomach, breast, colon, thyroid, and lung cancers, with more than 4000 cases each, the extreme gradient boosting classifier model showed slightly better performance than the logistic regression model, although the logistic regression model also performed adequately. 
We identified risk variables for the prediction of prolonged postoperative length of stay for each type of cancer, and the importance of the variables differed depending on the cancer type. After we added operative time to the models trained on preoperative factors, the models generally outperformed the corresponding models using only preoperative variables. CONCLUSIONS A machine learning approach using EHRs may improve the prediction of prolonged length of hospital stay after primary cancer surgery. This algorithm may help to provide a more effective allocation of medical resources in cancer surgery.
Affiliation(s)
- Yong-Yeon Jo
  - Healthcare AI Team, National Cancer Center, Goyang, Republic of Korea
- JaiHong Han
  - Department of Surgery, National Cancer Center, Goyang, Republic of Korea
- Hyun Woo Park
  - Healthcare AI Team, National Cancer Center, Goyang, Republic of Korea
- Hyojung Jung
  - Healthcare AI Team, National Cancer Center, Goyang, Republic of Korea
- Jae Dong Lee
  - Healthcare AI Team, National Cancer Center, Goyang, Republic of Korea
- Jipmin Jung
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang, Republic of Korea
- Hyo Soung Cha
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang, Republic of Korea
- Dae Kyung Sohn
  - Center for Colorectal Cancer, Research Institute and Hospital, National Cancer Center, Goyang, Republic of Korea
- Yul Hwangbo
  - Healthcare AI Team, National Cancer Center, Goyang, Republic of Korea
14
Zou LX, Sun L. Analysis of Hemorrhagic Fever With Renal Syndrome Using Wavelet Tools in Mainland China, 2004-2019. Front Public Health 2020; 8:571984. [PMID: 33335877 PMCID: PMC7736046 DOI: 10.3389/fpubh.2020.571984] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/29/2020] [Accepted: 11/09/2020] [Indexed: 01/24/2023] Open
Abstract
Introduction: Hemorrhagic fever with renal syndrome (HFRS) is a life-threatening public health problem in China, which accounts for ~90% of HFRS cases reported globally. Accurate analysis and prediction of the HFRS epidemic could help to establish effective preventive measures. Materials and Methods: In this study, a geographical information system (GIS) was used to explore the spatiotemporal features of HFRS, the wavelet power spectrum (WPS) to reveal its cyclical fluctuation, and a wavelet neural network (WNN) model to predict the trends of HFRS outbreaks in mainland China. Results: A total of 209,209 HFRS cases were reported in mainland China from 2004 to 2019, with annual incidence ranging from 0 to 13.05 per 100,000 persons at the province level. The WPS showed that the periodicity of HFRS could be half a year, 1 year, or roughly 7 years at different time intervals. A WNN with a 12-6-1 structure was selected as the best-fitting forecasting model for the HFRS epidemic. Conclusions: This study provided several potential support tools for the control and risk management of HFRS in China.
Affiliation(s)
- Lu-Xi Zou
  - School of Management, Zhejiang University, Hangzhou, China
- Ling Sun
  - Department of Nephrology, Xuzhou Central Hospital, The Xuzhou School of Clinical Medicine of Nanjing Medical University, Xuzhou, China.
  - Xuzhou Clinical School of Xuzhou Medical University, Xuzhou, China
15
Li Z, Li X, Porter D, Zhang J, Jiang Y, Olatosi B, Weissman S. Monitoring the Spatial Spread of COVID-19 and Effectiveness of Control Measures Through Human Movement Data: Proposal for a Predictive Model Using Big Data Analytics. JMIR Res Protoc 2020; 9:e24432. [PMID: 33301418 PMCID: PMC7752182 DOI: 10.2196/24432] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/19/2020] [Revised: 12/03/2020] [Accepted: 12/08/2020] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Human movement is one of the forces that drive the spatial spread of infectious diseases. To date, reducing and tracking human movement during the COVID-19 pandemic has proven effective in limiting the spread of the virus. Existing methods for monitoring and modeling the spatial spread of infectious diseases rely on various data sources as proxies of human movement, such as airline travel data, mobile phone data, and banknote tracking. However, intrinsic limitations of these data sources prevent us from systematic monitoring and analyses of human movement on different spatial scales (from local to global). OBJECTIVE Big data from social media such as geotagged tweets have been widely used in human mobility studies, yet more research is needed to validate the capabilities and limitations of using such data for studying human movement at different geographic scales (eg, from local to global) in the context of global infectious disease transmission. This study aims to develop a novel data-driven public health approach using big data from Twitter coupled with other human mobility data sources and artificial intelligence to monitor and analyze human movement at different spatial scales (from global to regional to local). METHODS We will first develop a database with optimized spatiotemporal indexing to store and manage the multisource data sets collected in this project. This database will be connected to our in-house Hadoop computing cluster for efficient big data computing and analytics. We will then develop innovative data models, predictive models, and computing algorithms to effectively extract and analyze human movement patterns using geotagged big data from Twitter and other human mobility data sources, with the goal of enhancing situational awareness and risk prediction in public health emergency response and disease surveillance systems. RESULTS This project was funded as of May 2020. 
We have started the data collection, processing, and analysis for the project. CONCLUSIONS Research findings can help government officials, public health managers, emergency responders, and researchers answer critical questions during the pandemic regarding the current and future infectious risk of a state, county, or community and the effectiveness of social/physical distancing practices in curtailing the spread of the virus. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/24432.
Affiliation(s)
- Zhenlong Li
  - Geoinformation and Big Data Research Laboratory, Department of Geography, University of South Carolina, Columbia, SC, United States
- Xiaoming Li
  - Department of Health Promotion, Education, and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, United States
- Dwayne Porter
  - Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina, Columbia, SC, United States
- Jiajia Zhang
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC, United States
- Yuqin Jiang
  - Geoinformation and Big Data Research Laboratory, Department of Geography, University of South Carolina, Columbia, SC, United States
- Bankole Olatosi
  - Department of Health Services Policy and Management, Arnold School of Public Health, University of South Carolina, Columbia, SC, United States
- Sharon Weissman
  - Department of Internal Medicine, School of Medicine, University of South Carolina, Columbia, SC, United States
16
Qian W, Viennet E, Glass K, Harley D. Epidemiological models for predicting Ross River virus in Australia: A systematic review. PLoS Negl Trop Dis 2020; 14:e0008621. [PMID: 32970673 PMCID: PMC7537878 DOI: 10.1371/journal.pntd.0008621] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/24/2020] [Revised: 10/06/2020] [Accepted: 07/20/2020] [Indexed: 01/18/2023] Open
Abstract
Ross River virus (RRV) is the most common and widespread arbovirus in Australia. Epidemiological models of RRV increase understanding of RRV transmission and help provide early warning of outbreaks to reduce incidence. However, RRV predictive models have not been systematically reviewed, analysed, and compared. The hypothesis of this systematic review was that summarising the epidemiological models applied to predict RRV disease and analysing model performance could elucidate drivers of RRV incidence and transmission patterns. We performed a systematic literature search in PubMed, EMBASE, Web of Science, Cochrane Library, and Scopus for studies of RRV using population-based data, incorporating at least one epidemiological model and analysing the association between exposures and RRV disease. Forty-three articles, all of high or medium quality, were included. Twenty-two (51.2%) used generalised linear models and 11 (25.6%) used time-series models. Climate and weather data were used in 27 (62.8%) and mosquito abundance or related data were used in 14 (32.6%) articles as model covariates. A total of 140 models were included across the articles. Rainfall (69 models, 49.3%), temperature (66, 47.1%) and tide height (45, 32.1%) were the three most commonly used exposures. Ten (23.3%) studies published data related to model performance. This review summarises current knowledge of RRV modelling and reveals a research gap in comparing predictive methods. To improve predictive accuracy, new methods for forecasting, such as non-linear mixed models and machine learning approaches, warrant investigation.
Affiliation(s)
- Wei Qian
  - Mater Research Institute‐University of Queensland (MRI‐UQ), Brisbane, Queensland, Australia
- Elvina Viennet
  - Research and Development, Australian Red Cross Lifeblood, Brisbane, Queensland, Australia
  - Institute for Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology (QUT), Queensland, Australia
- Kathryn Glass
  - Research School of Population Health, Australian National University, Acton, Australian Capital Territory, Australia
- David Harley
  - Mater Research Institute‐University of Queensland (MRI‐UQ), Brisbane, Queensland, Australia
17
Edo-Osagie O, De La Iglesia B, Lake I, Edeghere O. A scoping review of the use of Twitter for public health research. Comput Biol Med 2020; 122:103770. [PMID: 32502758 PMCID: PMC7229729 DOI: 10.1016/j.compbiomed.2020.103770] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 11/12/2019] [Revised: 04/01/2020] [Accepted: 04/17/2020] [Indexed: 11/25/2022]
Abstract
Public health practitioners and researchers have used traditional medical databases to study and understand public health for a long time. Recently, social media data, particularly from Twitter, have seen some use for public health purposes. Every large technological development in history has had an impact on the behaviour of society. The advent of the internet and social media is no different. Social media creates public streams of communication, and scientists are starting to understand that such data can provide some level of access into people's opinions and situations. As such, this paper aims to review and synthesize the literature on Twitter applications for public health, highlighting current research and products in practice. A scoping review methodology was employed and four leading health, computer science and cross-disciplinary databases were searched. A total of 755 articles were retrieved, 92 of which met the criteria for review. From the reviewed literature, six domains for the application of Twitter to public health were identified: (i) Surveillance; (ii) Event Detection; (iii) Pharmacovigilance; (iv) Forecasting; (v) Disease Tracking; and (vi) Geographic Identification. From our review, we were able to obtain a clear picture of the use of Twitter for public health. We gained insights into interesting observations such as how the popularity of different domains changed with time, the diseases and conditions studied and the different approaches to understanding each disease, which algorithms and techniques were popular with each domain, and more.
Affiliation(s)
- Oduwa Edo-Osagie
  - School of Computing Science, University of East Anglia, Norwich, NR4 7TJ, UK.
- Iain Lake
  - School of Environmental Science, University of East Anglia, Norwich, NR4 7TJ, UK
- Obaghe Edeghere
  - National Infection Service, Public Health England, Birmingham, B3 2PW, UK
18
Choi SB, Kim J, Ahn I. Forecasting type-specific seasonal influenza after 26 weeks in the United States using influenza activities in other countries. PLoS One 2019; 14:e0220423. [PMID: 31765386 PMCID: PMC6876883 DOI: 10.1371/journal.pone.0220423] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/10/2019] [Accepted: 11/04/2019] [Indexed: 12/21/2022] Open
Abstract
To identify countries that have seasonal patterns similar to the time series of influenza surveillance data in the United States and other countries, and to forecast the 2018-2019 seasonal influenza outbreak in the U.S., we collected surveillance data for 164 countries from the FluNet database, search queries from Google Trends, and temperature data from 2010 to 2018. Data for influenza-like illness (ILI) in the U.S. were collected from the FluView database. Using cross-correlation analysis, we identified the time lag between pairs of countries in their weekly surveillance time series for ILI, total influenza (Total INF), influenza A (INF A), and influenza B (INF B) viruses. To forecast ILI, Total INF, INF A, and INF B for the next season (26 weeks ahead) in the U.S., we developed prediction models using linear regression, autoregressive integrated moving average, and an artificial neural network (ANN). In the cross-correlation analysis between countries in the northern and southern hemispheres, the seasonal influenza patterns in Australia and Chile showed a high correlation with those of the U.S. 22 weeks and 28 weeks earlier, respectively. The R² score of the ANN models for ILI on the validation set in 2015-2019 was 0.758, despite the difficulty of forecasting 26 weeks ahead. Our prediction models forecast that the ILI season for the U.S. in 2018-2019 may be later and less severe than that of 2017-2018, judging from the influenza activity in Australia and Chile in 2018. This allows peak timing, peak intensity, and type-specific influenza activities for the next season to be estimated at the 40th week. The correlation between seasonal influenza patterns in the U.S., Australia, and Chile could be used to forecast the next seasonal influenza pattern, which can help to determine influenza vaccine strategy approximately six months ahead in the U.S.
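The lag-finding step mentioned in this abstract can be illustrated with a lagged Pearson correlation. The sketch below is an assumption-laden illustration, not the authors' code: the helper names (`pearson`, `best_lag`, `weekly_activity`) are hypothetical, the data are synthetic sine waves with a 52-week period, and both series are assumed to have equal length and non-zero variance.

```python
from math import pi, sin, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def best_lag(leader, follower, max_lag):
    """Return (lag, r) maximising corr(leader[t - lag], follower[t]) over 0..max_lag."""
    best = (0, -2.0)  # any real correlation is >= -1, so this is always replaced
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]  # earlier values of the leading series
        y = follower[lag:]               # later values of the following series
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best


def weekly_activity(week):
    """Synthetic seasonal signal with a 52-week period (illustrative only)."""
    return 1.0 + sin(2 * pi * week / 52)


# Synthetic example: the follower repeats the leader's pattern 22 weeks later.
leader = [weekly_activity(t) for t in range(208)]
follower = [weekly_activity(t - 22) for t in range(208)]

lag, r = best_lag(leader, follower, max_lag=30)
print(lag, round(r, 3))
```

On this synthetic input the search recovers the 22-week shift that was built into the data. Real surveillance series are noisier, so the maximum is typically less sharp and the chosen lag is sensitive to the search window.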
Affiliation(s)
- Soo Beom Choi
  - Department of Data-centric Problem Solving Research, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
  - Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
- Juhyeon Kim
  - Department of Data-centric Problem Solving Research, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
  - Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
- Insung Ahn
  - Department of Data-centric Problem Solving Research, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
  - Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
19
Zhu X, Fu B, Yang Y, Ma Y, Hao J, Chen S, Liu S, Li T, Liu S, Guo W, Liao Z. Attention-based recurrent neural network for influenza epidemic prediction. BMC Bioinformatics 2019; 20:575. [PMID: 31760945 PMCID: PMC6876090 DOI: 10.1186/s12859-019-3131-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Influenza is an infectious respiratory disease that can pose a serious public health hazard. Because of the threat it poses to society, precise real-time forecasting of influenza outbreaks is of great public value. RESULTS In this paper, we propose a new deep neural network structure that forecasts the real-time influenza-like illness rate (ILI%) in Guangzhou, China. Long short-term memory (LSTM) neural networks are applied to improve forecasting accuracy, given the long-term nature and diversity of influenza epidemic data. We devise a multi-channel LSTM neural network that can draw information from different types of inputs. We also add an attention mechanism to improve forecasting accuracy. With this structure, we are able to handle the relationships between multiple inputs more appropriately. Our model makes full use of the information in the data set and is targeted at the practical problems of influenza epidemic forecasting in Guangzhou. CONCLUSION We assess the performance of our model by comparing it with different neural network structures and other state-of-the-art methods. The experimental results indicate that our model is highly competitive and can provide effective real-time influenza epidemic forecasting.
Affiliation(s)
- Xianglei Zhu
  - College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Bofeng Fu
  - College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Yaodong Yang
  - College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Yu Ma
  - Guangzhou Center for Disease Control and Prevention, Guangzhou, 510440 China
- Jianye Hao
  - College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Siqi Chen
  - College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Shuang Liu
  - College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Tiegang Li
  - Guangzhou Center for Disease Control and Prevention, Guangzhou, 510440 China
- Sen Liu
  - Automotive Data Center, China Automotive Technology & Research, Tianjin, 300300 China
- Weiming Guo
  - Automotive Data Center, China Automotive Technology & Research, Tianjin, 300300 China
- Zhenyu Liao
  - Pony Testing International Group, Tianjin, 300051 China
  - Tianjin FoodSafety Inspection Technology Institute, Tianjin, 300300 China
20
Rangarajan P, Mody SK, Marathe M. Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data. PLoS Comput Biol 2019; 15:e1007518. [PMID: 31751346 PMCID: PMC6894887 DOI: 10.1371/journal.pcbi.1007518] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/07/2019] [Revised: 12/05/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022] Open
Abstract
Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world and it is estimated that more than half the world’s population is at risk for developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidences. Although data from multiple sources (such as dengue and ILI case counts, electronic health records and frequency of multiple internet search terms from Google Trends) can improve forecasts, standard time series analysis methods cannot estimate all the parameter values from the limited amount of data available when multiple sources are used. In this paper, we use a computationally efficient implementation of the known variable selection method that we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines sparse representation of time series data, electronic health records data (for ILI) and Google Trends data to forecast dengue and ILI incidences. This sparse representation method uses an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method. We apply our method to dengue case count data from five countries/states: Brazil, Mexico, Singapore, Taiwan, and Thailand and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting the dengue and ILI case counts. In particular, our method gives an 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time.
Dengue and influenza-like illness (ILI) are leading causes of viral infection in the world and hence it is important to develop accurate methods for forecasting their incidence. We use the Autoregressive Likelihood Ratio method, which is a computationally efficient implementation of the variable selection method, in order to obtain a sparse (non-lasso) representation of time series, Google Trends and electronic health records (for ILI) data. This method is used to forecast dengue incidence in five countries/states and ILI incidence in the USA. We show that this method outperforms existing time series methods in forecasting these diseases. The method is general and can also be used to forecast other diseases.
Affiliation(s)
- Prashant Rangarajan
  - Departments of Computer Science and Mathematics, Birla Institute of Technology and Science, Pilani, India
- Sandeep K. Mody
  - Department of Mathematics, Indian Institute of Science, Bangalore, India
- Madhav Marathe
  - Department of Computer Science, Network, Simulation Science and Advanced Computing Division, Biocomplexity Institute, University of Virginia, Charlottesville, Virginia, United States of America
21
Su K, Xu L, Li G, Ruan X, Li X, Deng P, Li X, Li Q, Chen X, Xiong Y, Lu S, Qi L, Shen C, Tang W, Rong R, Hong B, Ning Y, Long D, Xu J, Shi X, Yang Z, Zhang Q, Zhuang Z, Zhang L, Xiao J, Li Y. Forecasting influenza activity using self-adaptive AI model and multi-source data in Chongqing, China. EBioMedicine 2019; 47:284-292. [PMID: 31477561 PMCID: PMC6796527 DOI: 10.1016/j.ebiom.2019.08.024] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/24/2019] [Revised: 08/09/2019] [Accepted: 08/09/2019] [Indexed: 02/05/2023] Open
Abstract
Background Early detection of influenza activity followed by timely response is a critical component of preparedness for seasonal influenza epidemic and influenza pandemic. However, most relevant studies were conducted at the regional or national level with regular seasonal influenza trends. There are few feasible strategies to forecast influenza activity at the local level with irregular trends. Methods Multi-source electronic data, including historical percentage of influenza-like illness (ILI%), weather data, Baidu search index and Sina Weibo data of Chongqing, China, were collected and integrated into an innovative Self-adaptive AI Model (SAAIM), which was constructed by integrating Seasonal Autoregressive Integrated Moving Average model and XGBoost model using a self-adaptive weight adjustment mechanism. SAAIM was applied to ILI% forecast in Chongqing from 2017 to 2018, of which the performance was compared with three previously available models on forecasting. Findings ILI% showed an irregular seasonal trend from 2012 to 2018 in Chongqing. Compared with three reference models, SAAIM achieved the best performance on forecasting ILI% of Chongqing with the mean absolute percentage error (MAPE) of 11·9%, 7·5%, and 11·9% during the periods of the year 2014–2016, 2017, and 2018 respectively. Among the three categories of source data, historical influenza activity contributed the most to the forecast accuracy by decreasing the MAPE by 19·6%, 43·1%, and 11·1%, followed by weather information (MAPE reduced by 3·3%, 17·1%, and 2·2%), and Internet-related public sentiment data (MAPE reduced by 1·1%, 0·9%, and 1·3%). Interpretation Accurate influenza forecast in areas with irregular seasonal influenza trends can be made by SAAIM with multi-source electronic data.
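The forecast comparison in this abstract is expressed as mean absolute percentage error (MAPE). As a minimal sketch of that metric only, the function below computes MAPE for paired observed and forecast values. The function name `mape` and the sample ILI% values are hypothetical and invented for illustration; the sketch does not reproduce SAAIM or its weighting mechanism.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so callers should filter or
    guard against zeros before using it.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up weekly ILI% values for illustration only (assumption, not study data).
observed_ili = [2.0, 4.0, 5.0]
predicted_ili = [2.2, 3.6, 5.0]

error = mape(observed_ili, predicted_ili)
print(round(error, 2))
```

Because each error is divided by the observed value, weeks with very low ILI% can dominate the average, which is worth keeping in mind when comparing MAPE across seasons with different baseline activity.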
Affiliation(s)
- Kun Su
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China; Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Liang Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Guanqiao Li
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Xiaowen Ruan
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Xian Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Pan Deng
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Xinmi Li
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Qin Li
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Xianxian Chen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Yu Xiong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Shaofeng Lu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Li Qi
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Chaobo Shen
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Wenge Tang
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Rong Rong
- Chongqing Municipal Center for Disease Control and Prevention, Chongqing, People's Republic of China
- Boran Hong
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Yi Ning
- Meinian Institute of Health, Beijing, People's Republic of China
- Dongyan Long
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Jiaying Xu
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Xuanling Shi
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Zhihong Yang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Qi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Ziqi Zhuang
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Linqi Zhang
- Comprehensive AIDS Research Center and Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Medicine, Tsinghua University, Beijing, People's Republic of China
- Jing Xiao
- Ping An Technology (Shenzhen) Co., Ltd, Shenzhen, People's Republic of China
- Yafei Li
- Department of Epidemiology, College of Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing, People's Republic of China
|
22
|
Zhang T, Ma Y, Xiao X, Lin Y, Zhang X, Yin F, Li X. Dynamic Bayesian network in infectious diseases surveillance: a simulation study. Sci Rep 2019; 9:10376. [PMID: 31316113 PMCID: PMC6637193 DOI: 10.1038/s41598-019-46737-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/06/2018] [Accepted: 07/04/2019] [Indexed: 11/09/2022] Open
Abstract
The surveillance of infectious diseases relies on the identification of dynamic relations between the infectious diseases and corresponding influencing factors. However, the identification task confronts two practical challenges: small sample size and delayed effect. To overcome both challenges and improve the identification results, this study evaluated the performance of the dynamic Bayesian network (DBN) in infectious diseases surveillance. Specifically, the evaluation was conducted by two simulations. The first simulation evaluated the performance of DBN by comparing it with the Granger causality test and the least absolute shrinkage and selection operator (LASSO) method; the second simulation assessed how the DBN could improve the forecasting ability of infectious diseases. To make both simulations as close to the real-world situation as possible, their simulation scenarios were adapted from real-world studies, and practical issues such as nonlinearity and nuisance variables were also considered. The main simulation results were: ① When the sample size was large (n = 340), the true positive rates (TPRs) of DBN (≥98%) were slightly higher than those of the Granger causality method and approximately the same as those of the LASSO method; the false positive rates (FPRs) of DBN were on average 46% lower than those of the Granger causality test, and 22% lower than those of the LASSO method. ② When the sample size was small, the main problem was low TPR, which would be further aggravated by the issues of nonlinearity and nuisance variables. In the worst situation (i.e., small sample size, nonlinearity and existence of nuisance variables), the TPR of DBN declined to 43.30%. However, it is worth noting that such a decline could also be found in the corresponding results of the Granger causality test and LASSO method. ③ Sample size was important for identifying the dynamic relations among multiple variables; in this case, at least three years of weekly historical data were needed to guarantee the quality of infectious diseases surveillance. ④ DBN could improve the forecasting results by reducing forecasting errors by 7%. According to the above results, DBN is recommended to improve the quality of infectious diseases surveillance.
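The comparison above rests on true positive and false positive rates of the identified relations. As a rough illustration of how such rates could be computed for a set of candidate relations, the sketch below treats each relation as a (cause, effect) pair. The function name `tpr_fpr`, the edge representation and the variable names in the example are assumptions made for illustration; they are not the simulation code used in the study.

```python
def tpr_fpr(true_edges, detected_edges, candidate_edges):
    """Return (TPR, FPR) for detected relations among a fixed set of candidates.

    TPR = detected true relations / all true relations
    FPR = detected spurious relations / all non-relations among the candidates
    """
    candidates = set(candidate_edges)
    positives = set(true_edges) & candidates
    negatives = candidates - positives
    detected = set(detected_edges) & candidates

    tp = len(detected & positives)
    fp = len(detected & negatives)

    tpr = tp / len(positives) if positives else float("nan")
    fpr = fp / len(negatives) if negatives else float("nan")
    return tpr, fpr


if __name__ == "__main__":
    # Hypothetical variables: two real drivers and two nuisance variables.
    candidates = [("temp", "cases"), ("humidity", "cases"),
                  ("noise1", "cases"), ("noise2", "cases")]
    truth = [("temp", "cases"), ("humidity", "cases")]
    found = [("temp", "cases"), ("noise1", "cases")]
    print(tpr_fpr(truth, found, candidates))
```

In a simulation study these rates would typically be averaged over many simulated data sets for each method being compared.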
Affiliation(s)
- Tao Zhang
- Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Sichuan, China
- Yue Ma
- Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Sichuan, China
- Xiong Xiao
- Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Sichuan, China
- Yun Lin
- Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Sichuan, China
- Xingyu Zhang
- Department of Systems, Populations and Leadership, University of Michigan, School of Nursing, Ann Arbor, USA
- Fei Yin
- Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Sichuan, China
- Xiaosong Li
- Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Sichuan, China
|
23
|
Tapak L, Hamidi O, Fathian M, Karami M. Comparative evaluation of time series models for predicting influenza outbreaks: application of influenza-like illness data from sentinel sites of healthcare centers in Iran. BMC Res Notes 2019; 12:353. [PMID: 31234938 PMCID: PMC6591835 DOI: 10.1186/s13104-019-4393-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/13/2019] [Accepted: 06/17/2019] [Indexed: 11/24/2022] Open
Abstract
Objective Forecasting the timing of future outbreaks would help minimize the impact of disease by enabling preventive steps, including public health messaging and raising clinicians' awareness for timely diagnosis and treatment. The present study investigated the accuracy of support vector machine, artificial neural-network, and random-forest time series models in influenza-like illness (ILI) modeling and outbreak detection. The models were applied to a data set of weekly ILI frequencies in Iran. The root mean square error (RMSE), mean absolute error (MAE), and intra-class correlation coefficient (ICC) statistics were employed as evaluation criteria. Results The random-forest time series model outperformed the other methods in modeling weekly ILI frequencies (RMSE = 22.78, MAE = 14.99 and ICC = 0.88 for the test set). In addition, the neural network was better at outbreak detection, with a total accuracy of 0.889 for the test set. The results showed that the time series models used performed promisingly, suggesting they could be effectively applied for predicting weekly ILI frequencies and outbreaks.
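Two of the evaluation criteria named above, RMSE and MAE, have simple closed forms. The sketch below computes both for a pair of observed and predicted series. The function names and sample values are illustrative assumptions and do not reproduce the study's data; the ICC is omitted because the abstract does not state which ICC form was used.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Illustrative weekly counts only.
    observed = [120, 135, 150, 170]
    forecast = [118, 140, 149, 160]
    print("RMSE:", rmse(observed, forecast))
    print("MAE:", mae(observed, forecast))
```

Because RMSE squares the errors before averaging, it penalizes large misses (such as an unanticipated outbreak week) more heavily than MAE does, which is why the two are often reported together.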
Affiliation(s)
- Leili Tapak
- Department of Biostatistics, School of Public Health, Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
- Omid Hamidi
- Department of Science, Hamedan University of Technology, Hamedan, 65155, Iran
- Mohsen Fathian
- Office of Information Technology, Hamedan Electrical Power Distribution Company, Hamedan, Iran
- Manoochehr Karami
- Department of Epidemiology, School of Public Health, Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran
|
24
|
Kamel Boulos MN, Peng G, VoPham T. An overview of GeoAI applications in health and healthcare. Int J Health Geogr 2019; 18:7. [PMID: 31043176 PMCID: PMC6495523 DOI: 10.1186/s12942-019-0171-2] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/04/2019] [Accepted: 04/09/2019] [Indexed: 01/01/2023] Open
Abstract
The moulding together of artificial intelligence (AI) and the geographic/geographic information systems (GIS) dimension creates GeoAI. There is an emerging role for GeoAI in health and healthcare, as location is an integral part of both population and individual health. This article provides an overview of GeoAI technologies (methods, tools and software), and their current and potential applications in several disciplines within public health, precision medicine, and Internet of Things-powered smart healthy cities. The potential challenges currently facing GeoAI research and applications in health and healthcare are also briefly discussed.
Affiliation(s)
- Maged N. Kamel Boulos
- School of Information Management, Sun Yat-sen University, East Campus, Guangzhou, 510006 Guangdong China
- Guochao Peng
- School of Information Management, Sun Yat-sen University, East Campus, Guangzhou, 510006 Guangdong China
- Trang VoPham
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, 181 Longwood Ave, Boston, MA 02115 USA
|
25
|
Xue H, Bai Y, Hu H, Liang H. Regional level influenza study based on Twitter and machine learning method. PLoS One 2019; 14:e0215600. [PMID: 31013324 PMCID: PMC6478375 DOI: 10.1371/journal.pone.0215600] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/02/2018] [Accepted: 04/04/2019] [Indexed: 11/28/2022] Open
Abstract
The significance of flu prediction is that relevant departments can take appropriate preventive and control measures after assessing the predicted data; thus, morbidity and mortality can be reduced. In this paper, three flu prediction models (models 1-3), based on Twitter and US Centers for Disease Control's (CDC's) Influenza-Like Illness (ILI) data, are proposed to verify the factors that affect the spread of the flu. An Improved Particle Swarm Optimization algorithm was proposed to optimize the parameters of Support Vector Regression (IPSO-SVR). The IPSO-SVR was trained with the independent and dependent variables of the three models (models 1-3) as input and output. The trained IPSO-SVR method was used to predict the regional unweighted percentage ILI (%ILI) events in the US. The prediction results of each model are analyzed and compared. The results show that the IPSO-SVR method (model 3) demonstrates excellent performance in real-time prediction of ILIs, and further highlight the benefits of using real-time Twitter data, thus providing an effective means for the prevention and control of flu.
Affiliation(s)
- Hongxin Xue
- School of Information and Communication Engineering, North University of China, Taiyuan, Shanxi, 030051, People’s Republic of China
- Department of Mathematics, School of Science, North University of China, Taiyuan, Shanxi, 030051, People’s Republic of China
- Yanping Bai
- Department of Mathematics, School of Science, North University of China, Taiyuan, Shanxi, 030051, People’s Republic of China
- Hongping Hu
- Department of Mathematics, School of Science, North University of China, Taiyuan, Shanxi, 030051, People’s Republic of China
- Haijian Liang
- National Key Laboratory for Electronic Measurement Technology, Key Laboratory of Instrumentation Science & Dynamic Measurement Ministry of Educations, School of Information and Communication Engineering, North University of China, Taiyuan, Shanxi, 030051, People’s Republic of China
|
26
|
Ning S, Yang S, Kou SC. Accurate regional influenza epidemics tracking using Internet search data. Sci Rep 2019; 9:5238. [PMID: 30918276 PMCID: PMC6437143 DOI: 10.1038/s41598-019-41559-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/25/2018] [Accepted: 03/12/2019] [Indexed: 12/12/2022] Open
Abstract
Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users' online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.
Affiliation(s)
- Shaoyang Ning
- Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
- Shihao Yang
- Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
- S C Kou
- Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA, USA
|
27
|
Dhewantara PW, Lau CL, Allan KJ, Hu W, Zhang W, Mamun AA, Soares Magalhães RJ. Spatial epidemiological approaches to inform leptospirosis surveillance and control: A systematic review and critical appraisal of methods. Zoonoses Public Health 2018; 66:185-206. [PMID: 30593736 DOI: 10.1111/zph.12549] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/28/2018] [Accepted: 11/19/2018] [Indexed: 12/17/2022]
Abstract
Leptospirosis is a global zoonotic disease whose transmission is driven by complex geographical and temporal variation in demographics, animal hosts and socioecological factors. This results in complex challenges for the identification of high-risk areas. Spatial and temporal epidemiological tools could be used to support leptospirosis control programs, but the adequacy of their application has not been evaluated. We searched the literature in six databases (PubMed, Web of Science, EMBASE, Scopus, SciELO and Zoological Record) to systematically review and critically assess the use of spatial and temporal analytical tools for leptospirosis and to provide a general framework for their application in future studies. We reviewed 115 articles published between 1930 and October 2018 from 41 different countries. Of these, 65 (56.52%) articles were on human leptospirosis, 39 (33.91%) on animal leptospirosis and 11 (9.5%) used data from both human and animal leptospirosis. Spatial analytical tools (n = 106) were used to describe the distribution of incidence/prevalence at various geographical scales (96.5%) and to explore spatial patterns to detect clustering and hot spots (33%). A total of 51 studies modelled the relationships of various variables with the risk of human (n = 31), animal (n = 17) and both human and animal infection (n = 3). Among those modelling studies, few had generated spatially structured models and predictive maps of human (n = 2/31) and animal leptospirosis (n = 1/17). In addition, nine studies applied time-series analytical tools to predict leptospirosis incidence. Spatial and temporal analytical tools have been widely used to improve our understanding of leptospirosis epidemiology. Yet the quality of the epidemiological data, the selection of covariates and the spatial analytical techniques should be carefully considered in future studies to improve the usefulness of the evidence as a tool to support leptospirosis control. A general framework for the application of spatial analytical tools for leptospirosis was proposed.
Affiliation(s)
- Pandji W Dhewantara
- UQ Spatial Epidemiology Laboratory, School of Veterinary Science, The University of Queensland, Gatton, Queensland, Australia; Pangandaran Unit for Health Research and Development, National Health Research and Development, Ministry of Health of Indonesia, Pangandaran, West Java, Indonesia
- Colleen L Lau
- Research School of Population Health, Australian National University, Canberra, Australian Capital Territory, Australia; Child Health Research Centre, The University of Queensland, Brisbane, Queensland, Australia
- Kathryn J Allan
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Wenbiao Hu
- School of Public Health and Social Work, Queensland University of Technology, Brisbane, Queensland, Australia
- Wenyi Zhang
- Center for Disease Surveillance and Research, Institute of Disease Control and Prevention of PLA, Beijing, China
- Abdullah A Mamun
- Faculty of Humanities and Social Sciences, Institute for Social Science Research, The University of Queensland, Brisbane, Queensland, Australia
- Ricardo J Soares Magalhães
- UQ Spatial Epidemiology Laboratory, School of Veterinary Science, The University of Queensland, Gatton, Queensland, Australia; Child Health Research Centre, The University of Queensland, Brisbane, Queensland, Australia
|