1
Sinwar D, Dhaka VS, Tesfaye BA, Raghuwanshi G, Kumar A, Maakar SK, Agrawal S. Artificial Intelligence and Deep Learning Assisted Rapid Diagnosis of COVID-19 from Chest Radiographical Images: A Survey. Contrast Media & Molecular Imaging 2022; 2022:1306664. [PMID: 36304775 PMCID: PMC9581633 DOI: 10.1155/2022/1306664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/06/2022] [Accepted: 09/27/2022] [Indexed: 01/26/2023]
Abstract
Artificial Intelligence (AI) has been applied successfully in many real-life domains for solving complex problems. With the advent of Machine Learning (ML) paradigms, it has become convenient for researchers to predict outcomes based on past data. ML is now acting as a major weapon against the COVID-19 pandemic by detecting symptomatic cases at an early stage and warning people about its future effects. COVID-19 is observed to have spread globally in a short period because of the shortage of testing facilities and delays in test reports. To address this challenge, AI can be applied effectively to produce fast and cost-effective solutions. Many researchers have come up with AI-based solutions for preliminary diagnosis using chest CT images, respiratory sound analysis, voice analysis comparing symptomatic persons with asymptomatic ones, and so forth. Some AI-based applications claim good accuracy in predicting the chances of being COVID-19-positive. Within a short period, a large body of research has been published on the identification of COVID-19. This paper carefully examines and presents a comprehensive survey of more than 110 papers from various reputed sources, that is, Springer, IEEE, Elsevier, MDPI, arXiv, and medRxiv. Most of the papers selected for this survey presented work to detect and classify COVID-19 from chest X-ray and CT scan images using deep-learning-based models. We hope that this survey covers most of the work and provides insights to the research community in proposing efficient as well as accurate solutions for fighting the pandemic.
Affiliation(s)
- Deepak Sinwar
  - Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India
- Vijaypal Singh Dhaka
  - Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India
- Biniyam Alemu Tesfaye
  - Department of Computer Science, College of Informatics, Bule Hora University, Bule Hora, Ethiopia
- Ghanshyam Raghuwanshi
  - Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India
- Ashish Kumar
  - Department of Mathematics and Statistics, Manipal University Jaipur, Jaipur, India
- Sunil Kr. Maakar
  - School of Computing Science & Engineering, Galgotias University, Greater Noida, India
- Sanjay Agrawal
  - Department of Electrical Engineering, Rajkiya Engineering College, Akbarpur, Ambedkar Nagar, India
2
Paul R, Han D, DeDoncker E, Prieto D. Dynamic downscaling and daily nowcasting from influenza surveillance data. Stat Med 2022; 41:4159-4175. [PMID: 35718471 PMCID: PMC9544787 DOI: 10.1002/sim.9502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Revised: 04/30/2022] [Accepted: 05/31/2022] [Indexed: 11/08/2022]
Abstract
Real-time trends from surveillance data are important to assess and develop preparedness for influenza outbreaks. The overwhelming testing demand and limited capacity of testing laboratories for viral positivity render daily confirmed case data inaccurate and delay its availability in preparedness. Using Bayesian dynamic downscaling models, we obtained posterior estimates for daily influenza incidences from weekly estimates of the Centers for Disease Control and Prevention and daily reported constitutional and respiratory complaints during emergency department (ED) visits obtained from the state health departments. Our model provides one-day and seven-day lead forecasts along with 95% prediction intervals. Our hybrid Markov Chain Monte Carlo and Kalman filter algorithms facilitate faster computation and enable us to update our estimates as new data become available. Our method is tested and validated using the State of Michigan data over the years 2009-2013. Reported constitutional and respiratory complaints at the EDs showed strong correlations of 0.81 and 0.68, respectively, with influenza rates. In general, our forecast model can be adapted to track an outbreak with only one respiratory virus as a causative agent.
Affiliation(s)
- Rajib Paul
  - Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
- Dan Han
  - Department of Mathematics, University of Louisville, Louisville, Kentucky, USA
- Elise DeDoncker
  - Department of Computer Science, Western Michigan University, Kalamazoo, Michigan, USA
- Diana Prieto
  - Carey School of Business, Johns Hopkins University, Baltimore, Maryland, USA
  - School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
3
Fioriti V, Chinnici M, Arbore A, Sigismondi N, Roselli I. Estimating the epidemic growth dynamics within the first week. Heliyon 2021; 7:e08422. [PMID: 34816052 PMCID: PMC8600919 DOI: 10.1016/j.heliyon.2021.e08422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Revised: 10/22/2021] [Accepted: 11/15/2021] [Indexed: 11/20/2022]
Abstract
Information about the early growth of infectious outbreaks is indispensable for estimating epidemic spread. A large number of mathematical tools have been developed to this end, facing an equally large number of different dynamic evolutions, ranging from sub-linear to super-exponential growth. The crucial point is that we do not have enough data during the initial outbreak phase to make reliable inferences. Here we propose a straightforward methodology to estimate the epidemic growth dynamics from just one week of cumulative infection data, provided a surveillance system is available over the whole territory. The methodology, based on the Newcomb-Benford Law, is applied to the Italian COVID-19 case study. Results show that it is possible to discriminate the epidemic dynamics using the first seven data points collected in fifty Italian cities. Moreover, the most probable approximating function of the growth within a six-week epidemic scenario is identified.
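The Newcomb-Benford Law mentioned in this abstract states that the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d). The following is a minimal sketch of a first-digit comparison, not the authors' methodology: the function names, the distance measure (sum of absolute deviations), and the two synthetic series are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def benford_expected():
    """Newcomb-Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def first_digit_freq(values):
    """Observed relative frequency of each leading digit among positive values."""
    digits = [first_digit(v) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def benford_distance(values):
    """Sum of absolute deviations between observed and Benford frequencies
    (an illustrative distance, assumed here, not taken from the paper)."""
    expected = benford_expected()
    observed = first_digit_freq(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10))


# Hypothetical cumulative counts: one exponential series, one linear series.
exponential = [round(3 * 1.25 ** t) for t in range(60)]
linear = [50 + 10 * t for t in range(60)]

print("exponential:", benford_distance(exponential))
print("linear:     ", benford_distance(linear))
```

In this toy setup the exponential series sits closer to the Benford distribution than the linear one, which hints at why first-digit statistics might help separate growth regimes.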
Affiliation(s)
- Marta Chinnici
  - ENEA- C.R Casaccia, Via Anguillarese 301, Rome, 00123, Italy
- Ivan Roselli
  - ENEA- C.R Casaccia, Via Anguillarese 301, Rome, 00123, Italy
4
Srivastava A, Chowell G. Modeling Study: Characterizing the Spatial Heterogeneity of the COVID-19 Pandemic through Shape Analysis of Epidemic Curves. Research Square 2021:rs.3.rs-223226. [PMID: 33655241 PMCID: PMC7924281 DOI: 10.21203/rs.3.rs-223226/v1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
Abstract
Background The COVID-19 incidence rates across different geographical regions (e.g., counties in a state, states in a nation, countries in a continent) follow different shapes and patterns. The overall summaries at coarser spatial scales, that are obtained by simply averaging individual curves (across regions), hide nuanced variability and blur the spatial heterogeneity at finer spatial scales. For instance, a decreasing incidence rate curve in one region is obscured by an increasing rate curve for another region, when the analysis relies on coarse averages of locally heterogeneous transmission dynamics. Objective To highlight regional differences in COVID-19 incidence rates and to discover prominent patterns in shapes of incidence rate curves in multiple regions (USA and Europe). Methods We employ statistical methods to analyze shapes of local COVID-19 incidence rate curves and statistically group them into distinct clusters, according to their shapes. Using this information, we derive the so-called shape averages of curves within these clusters, which represent the dominant incidence patterns of these clusters. We apply this methodology to the analysis of the daily incidence trajectory of the COVID-pandemic for two geographic areas: A state-level analysis within the USA and a country-level analysis within Europe during late-February to October 1st, 2020. Results Our analyses reveal that pandemic curves often differ substantially across regions. However, there are only a handful of shapes that dominate transmission dynamics for all states in the USA and countries in Europe. This approach yields a broad classification of spatial areas into different characteristic epidemic trajectories. In particular, spatial areas within the same cluster have followed similar transmission and control dynamics. 
Conclusion The shape-based analysis of pandemic curves presented here helps divide country or continental data into multiple regional clusters, each cluster containing areas with similar trend patterns. This clustering helps highlight differences in pandemic curves across regions and provides summaries that better reflect dynamical patterns within the clusters. This approach adds to the methodological toolkit for public health practitioners to facilitate decision making at different spatial scales.
Affiliation(s)
- Anuj Srivastava
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Gerardo Chowell
  - Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
5
Srivastava A, Chowell G. Understanding Spatial Heterogeneity of COVID-19 Pandemic Using Shape Analysis of Growth Rate Curves. medRxiv 2020:2020.05.25.20112433. [PMID: 32511500 PMCID: PMC7273268 DOI: 10.1101/2020.05.25.20112433] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/25/2022]
Abstract
The growth rates of COVID-19 across different geographical regions (e.g., states in a nation, countries in a continent) follow different shapes and patterns. The overall summaries at coarser spatial scales that are obtained by simply averaging individual curves (across regions) obscure nuanced variability and blur the spatial heterogeneity at finer spatial scales. We employ statistical methods to analyze shapes of local COVID-19 growth rate curves and statistically group them into distinct clusters, according to their shapes. Using this information, we derive the so-called elastic averages of curves within these clusters, which correspond to the dominant incidence patterns. We apply this methodology to the analysis of the daily incidence trajectory of the COVID-19 pandemic at two spatial scales: a state-level analysis within the USA and a country-level analysis within Europe during mid-February to mid-May, 2020. Our analyses reveal a few dominant incidence trajectories that characterize transmission dynamics across states in the USA and across countries in Europe. This approach results in broad classifications of spatial areas into different trajectories and adds to the methodological toolkit for guiding public health decision making at different spatial scales.
6
Fong SJ, Li G, Dey N, Crespo RG, Herrera-Viedma E. Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction. Appl Soft Comput 2020; 93:106282. [PMID: 32362799 PMCID: PMC7195106 DOI: 10.1016/j.asoc.2020.106282] [Citation(s) in RCA: 124] [Impact Index Per Article: 31.0] [Received: 03/02/2020] [Revised: 04/03/2020] [Accepted: 04/07/2020] [Indexed: 11/27/2022]
Abstract
Since the advent of the novel coronavirus epidemic in December 2019, governments and authorities have been struggling to make critical decisions under high uncertainty. In computer science, this represents a typical problem of machine learning over incomplete or limited data in the early stage of an epidemic. Composite Monte-Carlo (CMC) simulation is a forecasting method which extrapolates available data, broken down from multiple correlated/causal micro-data sources, into many possible future outcomes by drawing random samples from some probability distributions. For instance, the overall trend and propagation of the infected cases in China are influenced by the temporal–spatial data of the cities near Wuhan (where the virus originated), in terms of population density, travel mobility, medical resources such as hospital beds, and the timeliness of quarantine control in each city. Hence a CMC is reliable only up to the closeness of its underlying statistical distribution, which is supposed to represent the behaviour of future events, and the correctness of the composite data relationships. In this paper, a case study is presented in which CMC, enhanced by a deep learning network and fuzzy rule induction, is used to gain better stochastic insights about the epidemic development. Instead of applying the simplistic and uniform assumptions that are common practice for MC, a deep learning-based CMC is used in conjunction with fuzzy rule induction techniques. As a result, decision makers benefit from better-fitted MC outputs complemented by min–max rules that foretell the extreme ranges of future possibilities with respect to the epidemic. Composite Monte-Carlo (CMC) simulation is a forecasting method. A case study of using CMC through a deep learning network is developed. Decision makers benefit from better-fitted Monte Carlo outputs. The novel coronavirus epidemic is studied.
Affiliation(s)
- Simon James Fong
  - Department of Computer and Information Science, University of Macau, Macau, SAR, China
  - DACC Laboratory, Zhuhai Institutes of Advanced Technology of the Chinese Academy of Sciences, China
  - Corresponding author at: Department of Computer and Information Science, University of Macau, Macau, SAR, China.
- Gloria Li
  - DACC Laboratory, Zhuhai Institutes of Advanced Technology of the Chinese Academy of Sciences, China
- Nilanjan Dey
  - Department of Information Technology, Techno India College of Technology, India
  - Corresponding author.
7
Yuan M, Boston-Fisher N, Luo Y, Verma A, Buckeridge DL. A systematic review of aberration detection algorithms used in public health surveillance. J Biomed Inform 2019; 94:103181. [PMID: 31014979 DOI: 10.1016/j.jbi.2019.103181] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/30/2018] [Revised: 04/16/2019] [Accepted: 04/17/2019] [Indexed: 12/21/2022]
Abstract
The algorithms used for detecting anomalies have evolved substantially over the last decade to take advantage of advances in informatics and to accommodate changes in surveillance data. We identified 145 studies since 2007 that evaluated statistical methods used to detect aberrations in public health surveillance data. For each study, we classified the analytic methods and reviewed the evaluation metrics. We also summarized the practical usage of the detection algorithms in public health surveillance systems worldwide. Traditional methods (e.g., control charts, linear regressions) were the focus of most evaluation studies and continue to be used commonly in practice. There was, however, an increase in the number of studies using forecasting methods and studies applying machine learning methods, hidden Markov models, and Bayesian framework to multivariate datasets. Evaluation studies demonstrated improved accuracy with more sophisticated methods, but these methods do not appear to be used widely in public health practice.
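To make the "traditional methods" named in this abstract concrete, here is a minimal sketch of a Shewhart-style control chart for aberration detection. It is an illustration under assumptions, not an algorithm taken from the review: the function name, the 7-day moving baseline, the threshold multiplier k = 3, and the daily counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_aberrations(counts, baseline=7, k=3.0):
    """Return indices whose count exceeds (baseline mean + k * baseline std dev).

    The baseline is the `baseline` observations immediately preceding each day.
    """
    flagged = []
    for i in range(baseline, len(counts)):
        window = counts[i - baseline:i]
        threshold = mean(window) + k * stdev(window)
        if counts[i] > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily syndromic counts with one spike at index 9.
daily_counts = [12, 10, 11, 13, 12, 9, 11, 12, 10, 30, 11, 12]
print(flag_aberrations(daily_counts))
```

Note that the spike inflates the baseline for the days that follow it, which is one known weakness of simple moving-window charts.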
Affiliation(s)
- Mengru Yuan
  - Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada
- Nikita Boston-Fisher
  - Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada
- Yu Luo
  - Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada
- Aman Verma
  - Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada
- David L Buckeridge
  - Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada
8
Zimmer C, Leuba SI, Yaesoubi R, Cohen T. Use of daily Internet search query data improves real-time projections of influenza epidemics. J R Soc Interface 2018; 15:20180220. [PMID: 30305417 PMCID: PMC6228485 DOI: 10.1098/rsif.2018.0220] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/28/2018] [Accepted: 09/11/2018] [Indexed: 01/25/2023]
Abstract
Seasonal influenza causes millions of illnesses and tens of thousands of deaths per year in the USA alone. While the morbidity and mortality associated with influenza is substantial each year, the timing and magnitude of epidemics are highly variable which complicates efforts to anticipate demands on the healthcare system. Better methods to forecast influenza activity would help policymakers anticipate such stressors. The US Centers for Disease Control and Prevention (CDC) has recognized the importance of improving influenza forecasting and hosts an annual challenge for predicting influenza-like illness (ILI) activity in the USA. The CDC data serve as the reference for ILI in the USA, but this information is aggregated by epidemiological week and reported after a one-week delay (and may be subject to correction even after this reporting lag). Therefore, there has been substantial interest in whether real-time Internet search data, such as Google, Twitter or Wikipedia could be used to improve influenza forecasting. In this study, we combine a previously developed calibration and prediction framework with an established humidity-based transmission dynamic model to forecast influenza. We then compare predictions based on only CDC ILI data with predictions that leverage the earlier availability and finer temporal resolution of Wikipedia search data. We find that both the earlier availability and the finer temporal resolution are important for increasing forecasting performance. Using daily Wikipedia search data leads to a marked improvement in prediction performance compared to weekly data especially for a three- to four-week forecasting horizon.
Affiliation(s)
- Christoph Zimmer
  - Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
  - Bosch Center for Artificial Intelligence, Robert Bosch GmbH, Renningen, Germany
- Sequoia I Leuba
  - Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
- Reza Yaesoubi
  - Health Policy and Management, Yale School of Public Health, New Haven, CT, USA
- Ted Cohen
  - Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
9
Influenza detection and prediction algorithms: comparative accuracy trial in Östergötland county, Sweden, 2008–2012. Epidemiol Infect 2017; 145:2166-2175. [DOI: 10.1017/s0950268817001005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
SUMMARY Methods for the detection of influenza epidemics and prediction of their progress have seldom been comparatively evaluated using prospective designs. This study aimed to perform a prospective comparative trial of algorithms for the detection and prediction of increased local influenza activity. Data on clinical influenza diagnoses recorded by physicians and syndromic data from a telenursing service were used. Five detection and three prediction algorithms previously evaluated in public health settings were calibrated and then evaluated over 3 years. When applied to diagnostic data, only detection using the Serfling regression method and prediction using the non-adaptive log-linear regression method showed acceptable performance during winter influenza seasons. For the syndromic data, none of the detection algorithms displayed a satisfactory performance, while non-adaptive log-linear regression was the best performing prediction method. We conclude that evidence was found that available algorithms for influenza detection and prediction display satisfactory performance when applied to local diagnostic data during winter influenza seasons. When applied to local syndromic data, the evaluated algorithms did not display consistent performance. Further evaluations and research on combinations of methods of these types in public health information infrastructures for ‘nowcasting’ (integrated detection and prediction) of influenza activity are warranted.
10
Tabataba FS, Chakraborty P, Ramakrishnan N, Venkatramanan S, Chen J, Lewis B, Marathe M. A framework for evaluating epidemic forecasts. BMC Infect Dis 2017; 17:345. [PMID: 28506278 PMCID: PMC5433189 DOI: 10.1186/s12879-017-2365-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 07/28/2016] [Accepted: 03/29/2017] [Indexed: 11/16/2022]
Abstract
Background Over the past few decades, numerous forecasting methods have been proposed in the field of epidemic forecasting. Such methods can be classified into different categories such as deterministic vs. probabilistic, comparative methods vs. generative methods, and so on. In some of the more popular comparative methods, researchers compare observed epidemiological data from the early stages of an outbreak with the output of proposed models to forecast the future trend and prevalence of the pandemic. A significant problem in this area is the lack of standard well-defined evaluation measures to select the best algorithm among different ones, as well as for selecting the best possible configuration for a particular algorithm. Results In this paper we present an evaluation framework which allows for combining different features, error measures, and ranking schema to evaluate forecasts. We describe the various epidemic features (Epi-features) included to characterize the output of forecasting methods and provide suitable error measures that could be used to evaluate the accuracy of the methods with respect to these Epi-features. We focus on long-term predictions rather than short-term forecasting and demonstrate the utility of the framework by evaluating six forecasting methods for predicting influenza in the United States. Our results demonstrate that different error measures lead to different rankings even for a single Epi-feature. Further, our experimental analyses show that no single method dominates the rest in predicting all Epi-features when evaluated across error measures. As an alternative, we provide various Consensus Ranking schema that summarize individual rankings, thus accounting for different error measures. Since each Epi-feature presents a different aspect of the epidemic, multiple methods need to be combined to provide a comprehensive forecast. 
Thus we call for a more nuanced approach while evaluating epidemic forecasts and we believe that a comprehensive evaluation framework, as presented in this paper, will add value to the computational epidemiology community. Electronic supplementary material The online version of this article (doi:10.1186/s12879-017-2365-1) contains supplementary material, which is available to authorized users.
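The finding that different error measures can rank the same forecasts differently can be reproduced with a small example. The sketch below is an illustration only, not the paper's evaluation framework: the two forecasting methods, their outputs, and the choice of MAE and RMSE as the error measures are hypothetical.

```python
import math


def mae(forecast, observed):
    """Mean absolute error."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def rmse(forecast, observed):
    """Root mean squared error; penalises large individual misses more than MAE."""
    return math.sqrt(
        sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)
    )


def rank_methods(forecasts, observed, error_fn):
    """Sort method names from lowest to highest error under error_fn."""
    return sorted(forecasts, key=lambda name: error_fn(forecasts[name], observed))


# Hypothetical weekly observations and two hypothetical forecasting methods.
observed = [10, 20, 30, 40]
forecasts = {
    "method_a": [12, 22, 32, 42],  # consistently off by 2
    "method_b": [10, 20, 30, 46],  # exact except for one miss of 6
}

print("MAE ranking: ", rank_methods(forecasts, observed, mae))
print("RMSE ranking:", rank_methods(forecasts, observed, rmse))
```

Here method_b has the lower MAE (1.5 versus 2) while method_a has the lower RMSE (2 versus 3), so the ranking flips with the measure. That flip is the motivation for summarising several rankings instead of trusting a single one.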
Affiliation(s)
- Farzaneh Sadat Tabataba
  - Computer Science Department, Virginia Tech, 2202 Kraft Drive, Blacksburg/Virginia, 24060, USA
  - Network Dynamics and Simulation Science Laboratory (NDSSL), Biocomplexity Institute, Virginia Tech, 1015 Life Science Cir, Blacksburg/Virginia, 24061, USA
- Prithwish Chakraborty
  - Computer Science Department, Virginia Tech, 2202 Kraft Drive, Blacksburg/Virginia, 24060, USA
- Naren Ramakrishnan
  - Computer Science Department, Virginia Tech, 2202 Kraft Drive, Blacksburg/Virginia, 24060, USA
  - Network Dynamics and Simulation Science Laboratory (NDSSL), Biocomplexity Institute, Virginia Tech, 1015 Life Science Cir, Blacksburg/Virginia, 24061, USA
- Srinivasan Venkatramanan
  - Network Dynamics and Simulation Science Laboratory (NDSSL), Biocomplexity Institute, Virginia Tech, 1015 Life Science Cir, Blacksburg/Virginia, 24061, USA
- Jiangzhuo Chen
  - Network Dynamics and Simulation Science Laboratory (NDSSL), Biocomplexity Institute, Virginia Tech, 1015 Life Science Cir, Blacksburg/Virginia, 24061, USA
- Bryan Lewis
  - Network Dynamics and Simulation Science Laboratory (NDSSL), Biocomplexity Institute, Virginia Tech, 1015 Life Science Cir, Blacksburg/Virginia, 24061, USA
- Madhav Marathe
  - Computer Science Department, Virginia Tech, 2202 Kraft Drive, Blacksburg/Virginia, 24060, USA
  - Network Dynamics and Simulation Science Laboratory (NDSSL), Biocomplexity Institute, Virginia Tech, 1015 Life Science Cir, Blacksburg/Virginia, 24061, USA
11
Zeng Z, Jiang X, Neapolitan R. Discovering causal interactions using Bayesian network scoring and information gain. BMC Bioinformatics 2016; 17:221. [PMID: 27230078 PMCID: PMC4880828 DOI: 10.1186/s12859-016-1084-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 11/27/2015] [Accepted: 05/14/2016] [Indexed: 11/17/2022]
Abstract
BACKGROUND The problem of learning causal influences from data has recently attracted much attention. Standard statistical methods can have difficulty learning discrete causes that interact to affect a target, because the assumptions in these methods often do not model discrete causal relationships well. An important task then is to learn such interactions from data. Motivated by the problem of learning epistatic interactions from datasets developed in genome-wide association studies (GWAS), researchers conceived new methods for learning discrete interactions. However, many of these methods do not differentiate a model representing a true interaction from a model representing non-interacting causes with strong individual effects. The recent algorithm MBS-IGain addresses this difficulty by using Bayesian network learning and information gain to discover interactions from high-dimensional datasets. However, MBS-IGain requires marginal effects to detect interactions containing more than two causes. If the dataset is not high-dimensional, we can avoid this shortcoming by doing an exhaustive search. RESULTS We develop Exhaustive-IGain, which is like MBS-IGain but does an exhaustive search. We compare the performance of Exhaustive-IGain to MBS-IGain using low-dimensional simulated datasets based on interactions with marginal effects and ones based on interactions without marginal effects. Their performance is similar on the datasets based on marginal effects. However, Exhaustive-IGain compellingly outperforms MBS-IGain on the datasets based on 3- and 4-cause interactions without marginal effects. We apply Exhaustive-IGain to investigate how clinical variables interact to affect breast cancer survival, and obtain results that agree with judgements of a breast cancer oncologist. 
CONCLUSIONS We conclude that the combined use of information gain and Bayesian network scoring enables us to discover higher order interactions with no marginal effects if we perform an exhaustive search. We further conclude that Exhaustive-IGain can be effective when applied to real data.
Affiliation(s)
- Zexian Zeng
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Xia Jiang
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Richard Neapolitan
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
12
Abstract
OBJECTIVES Reliable monitoring of influenza seasons and pandemic outbreaks is essential for response planning, but compilations of reports on detection and prediction algorithm performance in influenza control practice are largely missing. The aim of this study is to perform a metanarrative review of prospective evaluations of influenza outbreak detection and prediction algorithms, restricted to settings where authentic surveillance data have been used. DESIGN The study was performed as a metanarrative review. An electronic literature search was performed, papers were selected, and qualitative and semiquantitative content analyses were conducted. For data extraction and interpretations, researcher triangulation was used for quality assurance. RESULTS Eight prospective evaluations were found that used authentic surveillance data: three studies evaluating detection and five studies evaluating prediction. The methodological perspectives and experiences from the evaluations were found to have been reported in narrative formats representing biodefence informatics and health policy research, respectively. The biodefence informatics narrative, which emphasises verification of technically and mathematically sound algorithms, constituted a large part of the reporting. Four evaluations were reported as health policy research narratives, thus formulated in a manner that allows the results to qualify as policy evidence. CONCLUSIONS Awareness of the narrative format in which results are reported is essential when interpreting algorithm evaluations from an infectious disease control practice perspective.
Affiliation(s)
- A Spreco
  - Department of Medical and Health Sciences, Linköping University, Linköping, Sweden
- T Timpka
  - Department of Medical and Health Sciences, Linköping University, Linköping, Sweden
  - Unit for Health Analysis, Centre for Healthcare Development, Region Östergötland, Linköping, Sweden
13
Jiang X, Jao J, Neapolitan R. Learning Predictive Interactions Using Information Gain and Bayesian Network Scoring. PLoS One 2015; 10:e0143247. [PMID: 26624895 PMCID: PMC4666609 DOI: 10.1371/journal.pone.0143247] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 04/21/2015] [Accepted: 11/02/2015] [Indexed: 12/28/2022]
Abstract
BACKGROUND The problems of correlation and classification are long-standing in the fields of statistics and machine learning, and techniques have been developed to address these problems. We are now in the era of high-dimensional data, which is data that can concern billions of variables. These data present new challenges. In particular, it is difficult to discover predictive variables when each variable has little marginal effect. An example concerns Genome-wide Association Studies (GWAS) datasets, which involve millions of single nucleotide polymorphisms (SNPs), where some of the SNPs interact epistatically to affect disease status. Towards determining these interacting SNPs, researchers developed techniques that addressed this specific problem. However, the problem is more general, and so these techniques are applicable to other problems concerning interactions. A difficulty with many of these techniques is that they do not distinguish whether a learned interaction is actually an interaction or whether it involves several variables with strong marginal effects. METHODOLOGY/FINDINGS We address this problem using information gain and Bayesian network scoring. First, we identify candidate interactions by determining whether variables together provide more information than they do separately. Then we use Bayesian network scoring to see if a candidate interaction really is a likely model. Our strategy is called MBS-IGain. Using 100 simulated datasets and a real GWAS Alzheimer's dataset, we investigated the performance of MBS-IGain. CONCLUSIONS/SIGNIFICANCE When analyzing the simulated datasets, MBS-IGain substantially outperformed nine previous methods at locating interacting predictors, and at identifying interactions exactly. When analyzing the real Alzheimer's dataset, we obtained new results and results that substantiated previous findings. We conclude that MBS-IGain is highly effective at finding interactions in high-dimensional datasets. This result is significant because we have increasingly abundant high-dimensional data in many domains, and to learn causes and perform prediction/classification using these data, we often must first identify interactions.
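The first step described in this abstract, checking whether variables together provide more information about the outcome than they do separately, can be illustrated with a small sketch. This is not the MBS-IGain implementation: the function names (`entropy`, `info_gain`, `interaction_gain`) and the toy data are assumptions for illustration, the paper's exact gain definition may differ, and the Bayesian network scoring step is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(features, target):
    """IG(target; features) = H(target) - H(target | features)."""
    n = len(target)
    groups = {}
    for f, t in zip(features, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def interaction_gain(x1, x2, target):
    """Information the pair provides beyond the sum of its marginal gains.

    Hypothetical helper: a positive value hints at an interaction rather
    than two variables with strong marginal effects.
    """
    joint = list(zip(x1, x2))
    return info_gain(joint, target) - info_gain(x1, target) - info_gain(x2, target)


# Toy XOR data: neither variable is informative alone, but the pair is.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y_xor = [0, 1, 1, 0]

# Toy marginal data: x1 alone determines the outcome, so there is no interaction.
y_marginal = [0, 0, 1, 1]

xor_gain = interaction_gain(x1, x2, y_xor)
marginal_gain = interaction_gain(x1, x2, y_marginal)
```

In the XOR case the pair carries one full bit that neither variable carries alone, while in the marginal case the joint gain is fully explained by `x1`, which is the distinction the abstract says many earlier techniques fail to make.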
Affiliation(s)
- Xia Jiang
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15213, United States of America
- Jeremy Jao
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15213, United States of America
- Richard Neapolitan
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, United States of America
|
14
|
Zamiri A, Yazdi HS, Goli SA. Temporal and spatial monitoring and prediction of epidemic outbreaks. IEEE J Biomed Health Inform 2014; 19:735-44. [PMID: 25122846 PMCID: PMC7186040 DOI: 10.1109/jbhi.2014.2338213] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
This paper introduces a nonlinear dynamic model to study the spatial and temporal dynamics of epidemics of the susceptible-infected-removed type. It models the respective collections of epidemic states and syndromic observations as random finite sets. Each epidemic state consists of the number of infected individuals in an isolated population system and the corresponding partially known parameters of the epidemic model. The infectious disease can spread between population systems with known probabilities based on prior knowledge of ecological and biological features of the environment. The problem is then formulated in a Bayesian framework and estimated via a probability hypothesis density filter. Each population system under surveillance is assumed to be homogeneous and fixed, with daily reports on the number of infected people available for monitoring and prediction. When model parameters are partially known, results of numerical studies indicate that the proposed approach can help predict the peak and duration of an epidemic early.
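For readers unfamiliar with the susceptible-infected-removed dynamics this abstract builds on, the following is a minimal sketch of a deterministic, discrete-time SIR model for one closed population. It is not the paper's method: the random-finite-set formulation and the probability hypothesis density filter are not shown, and the function name `simulate_sir` and all parameter values are assumptions chosen for illustration.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days):
    """Daily-step SIR simulation for a single fixed, homogeneous population.

    beta is the transmission rate per day and gamma the removal rate per
    day (both hypothetical values here, not taken from the paper).
    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / n
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


history = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=160)

# Peak size and peak day are the quantities the abstract says it predicts.
infected = [state[1] for state in history]
peak_infected = max(infected)
peak_day = infected.index(peak_infected)
```

A filter-based approach such as the one the abstract describes would treat `beta` and `gamma` as partially known and update them from daily case reports, rather than fixing them in advance as this sketch does.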
|
15
|
Chretien JP, George D, Shaman J, Chitale RA, McKenzie FE. Influenza forecasting in human populations: a scoping review. PLoS One 2014; 9:e94130. [PMID: 24714027 PMCID: PMC3979760 DOI: 10.1371/journal.pone.0094130] [Citation(s) in RCA: 110] [Impact Index Per Article: 11.0] [Received: 12/18/2013] [Accepted: 03/12/2014] [Indexed: 11/18/2022]
Abstract
Forecasts of influenza activity in human populations could help guide key preparedness tasks. We conducted a scoping review to characterize these methodological approaches and identify research gaps. Adapting the PRISMA methodology for systematic reviews, we searched PubMed, CINAHL, Project Euclid, and Cochrane Database of Systematic Reviews for publications in English since January 1, 2000 using the terms “influenza AND (forecast* OR predict*)”, excluding studies that did not validate forecasts against independent data or incorporate influenza-related surveillance data from the season or pandemic for which the forecasts were applied. We included 35 publications describing population-based (N = 27), medical facility-based (N = 4), and regional or global pandemic spread (N = 4) forecasts. They included areas of North America (N = 15), Europe (N = 14), and/or Asia-Pacific region (N = 4), or had global scope (N = 3). Forecasting models were statistical (N = 18) or epidemiological (N = 17). Five studies used data assimilation methods to update forecasts with new surveillance data. Models used virological (N = 14), syndromic (N = 13), meteorological (N = 6), internet search query (N = 4), and/or other surveillance data as inputs. Forecasting outcomes and validation metrics varied widely. Two studies compared distinct modeling approaches using common data, 2 assessed model calibration, and 1 systematically incorporated expert input. Of the 17 studies using epidemiological models, 8 included sensitivity analysis. This review suggests need for use of good practices in influenza forecasting (e.g., sensitivity analysis); direct comparisons of diverse approaches; assessment of model calibration; integration of subjective expert input; operational research in pilot, real-world applications; and improved mutual understanding among modelers and public health officials.
Affiliation(s)
- Jean-Paul Chretien
- Division of Integrated Biosurveillance, Armed Forces Health Surveillance Center, Silver Spring, Maryland, United States of America
- Dylan George
- Division of Analytic Decision Support, Biomedical Advanced Research and Development Authority, Department of Health and Human Services, Washington, DC, United States of America
- Jeffrey Shaman
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York, United States of America
- Rohit A. Chitale
- Division of Integrated Biosurveillance, Armed Forces Health Surveillance Center, Silver Spring, Maryland, United States of America
- F. Ellis McKenzie
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
|
16
|
Nsoesie EO, Brownstein JS, Ramakrishnan N, Marathe MV. A systematic review of studies on forecasting the dynamics of influenza outbreaks. Influenza Other Respir Viruses 2013; 8:309-16. [PMID: 24373466 PMCID: PMC4181479 DOI: 10.1111/irv.12226] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Accepted: 11/24/2013] [Indexed: 12/24/2022]
Abstract
Forecasting the dynamics of influenza outbreaks could be useful for decision-making regarding the allocation of public health resources. Reliable forecasts could also aid in the selection and implementation of interventions to reduce morbidity and mortality due to influenza illness. This paper reviews methods for influenza forecasting proposed during previous influenza outbreaks and those evaluated in hindsight. We discuss the various approaches, in addition to the variability in measures of accuracy and precision of predicted measures. PubMed and Google Scholar searches for articles on influenza forecasting retrieved sixteen studies that matched the study criteria. We focused on studies that aimed at forecasting influenza outbreaks at the local, regional, national, or global level. The selected studies spanned a wide range of regions including USA, Sweden, Hong Kong, Japan, Singapore, United Kingdom, Canada, France, and Cuba. The methods were also applied to forecast a single measure or multiple measures. Typical measures predicted included peak timing, peak height, daily/weekly case counts, and outbreak magnitude. Due to differences in measures used to assess accuracy, a single estimate of predictive error for each of the measures was difficult to obtain. However, collectively, the results suggest that these diverse approaches to influenza forecasting are capable of capturing specific outbreak measures with some degree of accuracy given reliable data and correct disease assumptions. Nonetheless, several of these approaches need to be evaluated and their performance quantified in real-time predictions.
Affiliation(s)
- Elaine O Nsoesie
- Children's Hospital Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA; Network Dynamics and Simulation Science Laboratory, Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA
|
17
|
Optimizing provider recruitment for influenza surveillance networks. PLoS Comput Biol 2012; 8:e1002472. [PMID: 22511860 PMCID: PMC3325176 DOI: 10.1371/journal.pcbi.1002472] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 08/09/2011] [Accepted: 02/29/2012] [Indexed: 12/24/2022]
Abstract
The increasingly complex and rapid transmission dynamics of many infectious diseases necessitate the use of new, more advanced methods for surveillance, early detection, and decision-making. Here, we demonstrate that a new method for optimizing surveillance networks can improve the quality of epidemiological information produced by typical provider-based networks. Using past surveillance and Internet search data, it determines the precise locations where providers should be enrolled. When applied to redesigning the provider-based, influenza-like-illness surveillance network (ILINet) for the state of Texas, the method identifies networks that are expected to significantly outperform the existing network with far fewer providers. This optimized network avoids informational redundancies and is thereby more effective than networks designed by conventional methods and a recently published algorithm based on maximizing population coverage. We show further that Google Flu Trends data, when incorporated into a network as a virtual provider, can enhance but not replace traditional surveillance methods.
Public health agencies use surveillance systems to detect and monitor chronic and infectious diseases. These systems often rely on data sources that are chosen based on loose guidelines or out of convenience. In this paper, we introduce a new, data-driven method for designing and improving surveillance systems. Our approach is a geographic optimization of data sources designed to achieve specific surveillance goals. We tested our method by redesigning Texas' provider-based influenza surveillance system (ILINet). The resulting networks better predicted influenza-associated hospitalizations and contained fewer providers than the existing ILINet. Furthermore, our study demonstrates that the integration of Internet source data, like Google Flu Trends, into surveillance systems can enhance traditional, provider-based networks.
|
18
|
Abstract
Classification methods are widely used for identifying underlying groupings within datasets and predicting the class for new data objects given a trained classifier. This study introduces a project aimed at using a combination of simulations and classification techniques to predict epidemic curves and infer underlying disease parameters for an ongoing outbreak. Six supervised classification methods (random forest, support vector machines, nearest neighbor with three decision rules, linear and flexible discriminant analysis) were used in identifying partial epidemic curves from six agent-based stochastic simulations of influenza epidemics. The accuracy of the methods was compared using a performance metric based on the McNemar test. The findings showed that: (1) assumptions made by the methods regarding the structure of an epidemic curve influence their performance, i.e., methods with fewer assumptions perform best; (2) the performance of most methods is consistent across different individual-based networks for Seattle, Los Angeles and New York; and (3) combining classifiers using a weighting approach does not guarantee better prediction.
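The abstract compares classifiers with a metric based on the McNemar test. As a rough illustration of that test, the sketch below computes the exact (binomial) two-sided McNemar p-value for two classifiers evaluated on the same cases. The study's actual performance metric may be defined differently; the function name `mcnemar_exact` and the toy labels are assumptions.

```python
import math


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired classifier predictions.

    Only discordant cases matter: b counts cases classifier A got right
    and B got wrong, c counts the reverse. Under the null hypothesis of
    equal error rates, b follows Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant cases, no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy example (made-up labels): A is always right, B misses six of ten cases.
y_true = [1] * 10
pred_a = [1] * 10
pred_b = [1] * 4 + [0] * 6

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
```

Because the test uses only the cases where the two classifiers disagree, it suits paired comparisons on a shared test set, which is the setting the abstract describes.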
|
19
|
Zhao D, Weng C. Combining PubMed knowledge and EHR data to develop a weighted bayesian network for pancreatic cancer prediction. J Biomed Inform 2011; 44:859-68. [PMID: 21642013 DOI: 10.1016/j.jbi.2011.05.004] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 01/28/2011] [Revised: 05/12/2011] [Accepted: 05/13/2011] [Indexed: 02/06/2023]
Abstract
In this paper, we propose a novel method that combines PubMed knowledge and Electronic Health Records to develop a weighted Bayesian Network Inference (BNI) model for pancreatic cancer prediction. We selected 20 common risk factors associated with pancreatic cancer and used PubMed knowledge to weigh the risk factors. A keyword-based algorithm was developed to extract and classify PubMed abstracts into three categories that represented positive, negative, or neutral associations between each risk factor and pancreatic cancer. Then we designed a weighted BNI model by adding the normalized weights into a conventional BNI model. We used this model to extract the EHR values for patients with or without pancreatic cancer, which then enabled us to calculate the prior probabilities for the 20 risk factors in the BNI. The software iDiagnosis was designed to use this weighted BNI model for predicting pancreatic cancer. In an evaluation using a case-control dataset, the weighted BNI model significantly outperformed the conventional BNI and two other classifiers (k-Nearest Neighbor and Support Vector Machine). We conclude that the weighted BNI using PubMed knowledge and EHR data shows remarkable accuracy improvement over existing representative methods for pancreatic cancer prediction.
Affiliation(s)
- Di Zhao
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
|
20
|
From Ontology Selection and Semantic Web to an Integrated Information System for Food-borne Diseases and Food Safety. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2011; 696:741-50. [DOI: 10.1007/978-1-4419-7046-6_76] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|