1
Leiner J, Pellissier V, König S, Hohenstein S, Ueberham L, Nachtigall I, Meier-Hellmann A, Kuhlen R, Hindricks G, Bollmann A. Machine learning-derived prediction of in-hospital mortality in patients with severe acute respiratory infection: analysis of claims data from the German-wide Helios hospital network. Respir Res 2022; 23:264. [PMID: 36151525 PMCID: PMC9502925 DOI: 10.1186/s12931-022-02180-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Accepted: 09/05/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Severe acute respiratory infections (SARI) are the most common infectious causes of death. Previous work regarding mortality prediction models for SARI using machine learning (ML) algorithms that can be useful for both individual risk stratification and quality of care assessment is scarce. We aimed to develop reliable models for mortality prediction in SARI patients utilizing ML algorithms and compare their performance with a classic regression analysis approach. METHODS Administrative data (dataset randomly split 75%/25% for model training/testing) from years 2016-2019 of 86 German Helios hospitals were retrospectively analyzed. Inpatient SARI cases were defined by ICD-codes J09-J22. Three ML algorithms were evaluated and their performance was compared to generalized linear models (GLM) by computing receiver operating characteristic area under the curve (AUC) and area under the precision-recall curve (AUPRC). RESULTS The dataset contained 241,988 inpatient SARI cases (75 years or older: 49%; male 56.2%). In-hospital mortality was 11.6%. AUC and AUPRC in the testing dataset were 0.83 and 0.372 for GLM, 0.831 and 0.384 for random forest (RF), 0.834 and 0.382 for single layer neural network (NNET) and 0.834 and 0.389 for extreme gradient boosting (XGBoost). Statistical comparison of ROC AUCs revealed a better performance of NNET and XGBoost as compared to GLM. CONCLUSION ML algorithms for predicting in-hospital mortality were trained and tested on a large real-world administrative dataset of SARI patients and showed good discriminatory performances. Broad application of our models in clinical routine practice can contribute to patients' risk assessment and quality management.
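The two evaluation metrics named in this abstract, AUC and AUPRC, can be illustrated with a minimal sketch. The function names and the toy labels and scores below are hypothetical and are not taken from the study; the AUPRC is approximated here by average precision, which is one common estimator and may differ from the method the authors used.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum identity: the share of (positive, negative)
    pairs in which the positive case has the higher score; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at the rank of each
    positive case. Assumes no tied scores (ties are not grouped here)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

The pairwise AUC loop is quadratic in the number of cases, so it is only suitable for illustration; for a dataset of the size described above, a sorted-rank implementation or an established library would be the more practical choice. With a mortality rate near 11.6%, a random classifier would have an expected AUPRC close to that prevalence, which is why AUPRC values look low next to AUC values.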
Affiliation(s)
- Johannes Leiner
- Department of Electrophysiology, Heart Center Leipzig at University of Leipzig, Leipzig, Germany; Real World Evidence and Health Technology Assessment, Helios Health Institute, Berlin, Germany
- Vincent Pellissier
- Real World Evidence and Health Technology Assessment, Helios Health Institute, Berlin, Germany
- Sebastian König
- Department of Electrophysiology, Heart Center Leipzig at University of Leipzig, Leipzig, Germany; Real World Evidence and Health Technology Assessment, Helios Health Institute, Berlin, Germany
- Sven Hohenstein
- Real World Evidence and Health Technology Assessment, Helios Health Institute, Berlin, Germany
- Laura Ueberham
- Clinic for Cardiology, University Hospital Leipzig, Leipzig, Germany
- Irit Nachtigall
- Department of Infectious Diseases and Infection Prevention, Helios Hospital Emil-von-Behring, Berlin, Germany; Institute of Hygiene and Environmental Medicine, Charité - Universitaetsmedizin Berlin, Berlin, Germany
- Gerhard Hindricks
- Department of Electrophysiology, Heart Center Leipzig at University of Leipzig, Leipzig, Germany
- Andreas Bollmann
- Department of Electrophysiology, Heart Center Leipzig at University of Leipzig, Leipzig, Germany; Real World Evidence and Health Technology Assessment, Helios Health Institute, Berlin, Germany
2
Alakus TB, Turkoglu I. Comparison of deep learning approaches to predict COVID-19 infection. Chaos, Solitons, and Fractals 2020; 140:110120. [PMID: 33519109 PMCID: PMC7833512 DOI: 10.1016/j.chaos.2020.110120] [Citation(s) in RCA: 107] [Impact Index Per Article: 26.8] [Received: 06/22/2020] [Accepted: 07/10/2020] [Indexed: 05/05/2023]
Abstract
The SARS-CoV-2 virus, which causes COVID-19 (coronavirus disease), has caused a pandemic and spread all over the world. Because the number of cases increases day by day, interpreting laboratory findings takes time, which creates limitations in terms of both treatment and findings. Due to such limitations, the need for clinical decision-making systems with predictive algorithms has arisen. Predictive algorithms could potentially ease the strain on healthcare systems by identifying the disease. In this study, we develop clinical predictive models that estimate, using deep learning and laboratory data, which patients are likely to have COVID-19. To evaluate the predictive performance of our models, precision, F1-score, recall, AUC, and accuracy scores were calculated. Models were tested with 18 laboratory findings from 600 patients and validated with 10-fold cross-validation and train-test split approaches. The experimental results indicate that our predictive models identify patients who have COVID-19 with an accuracy of 86.66%, F1-score of 91.89%, precision of 86.75%, recall of 99.42%, and AUC of 62.50%. It is observed that predictive models trained on laboratory findings could be used to predict COVID-19 infection and can help medical experts prioritize resources correctly. Our models (available at https://github.com/burakalakuss/COVID-19-Clinical) can be employed to assist medical experts in validating their initial laboratory findings, and can also be used for clinical prediction studies.
Affiliation(s)
- Talha Burak Alakus
- Kirklareli University, Engineering Faculty, Department of Software Engineering, Kirklareli, 39000, Turkey
- Ibrahim Turkoglu
- Firat University, Technology Faculty, Department of Software Engineering, Elazig, 23119, Turkey
3
Schwab P, DuMont Schütte A, Dietz B, Bauer S. Clinical Predictive Models for COVID-19: Systematic Study. J Med Internet Res 2020; 22:e21439. [PMID: 32976111 PMCID: PMC7541040 DOI: 10.2196/21439] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 06/15/2020] [Revised: 08/30/2020] [Accepted: 09/14/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND COVID-19 is a rapidly emerging respiratory disease caused by SARS-CoV-2. Due to the rapid human-to-human transmission of SARS-CoV-2, many health care systems are at risk of exceeding their health care capacities, in particular in terms of SARS-CoV-2 tests, hospital and intensive care unit (ICU) beds, and mechanical ventilators. Predictive algorithms could potentially ease the strain on health care systems by identifying those who are most likely to receive a positive SARS-CoV-2 test, be hospitalized, or admitted to the ICU. OBJECTIVE The aim of this study is to develop, study, and evaluate clinical predictive models that estimate, using machine learning and based on routinely collected clinical data, which patients are likely to receive a positive SARS-CoV-2 test or require hospitalization or intensive care. METHODS Using a systematic approach to model development and optimization, we trained and compared various types of machine learning models, including logistic regression, neural networks, support vector machines, random forests, and gradient boosting. To evaluate the developed models, we performed a retrospective evaluation on demographic, clinical, and blood analysis data from a cohort of 5644 patients. In addition, we determined which clinical features were predictive to what degree for each of the aforementioned clinical tasks using causal explanations. RESULTS Our experimental results indicate that our predictive models identified patients that test positive for SARS-CoV-2 a priori at a sensitivity of 75% (95% CI 67%-81%) and a specificity of 49% (95% CI 46%-51%), patients who are SARS-CoV-2 positive that require hospitalization with 0.92 area under the receiver operator characteristic curve (AUC; 95% CI 0.81-0.98), and patients who are SARS-CoV-2 positive that require critical care with 0.98 AUC (95% CI 0.95-1.00). 
CONCLUSIONS Our results indicate that predictive models trained on routinely collected clinical data could be used to predict clinical pathways for COVID-19 and, therefore, help inform care and prioritize resources.
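Sensitivity and specificity, as reported in this abstract, are simple ratios over the confusion matrix. The sketch below shows the calculation on hypothetical labels and predictions; the function name and the data are illustrative assumptions and do not reproduce the study's results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0.

    sensitivity = TP / (TP + FN): share of true positives that are detected.
    specificity = TN / (TN + FP): share of true negatives that are cleared.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 4 test-positive and 4 test-negative patients.
example_true = [1, 1, 1, 1, 0, 0, 0, 0]
example_pred = [1, 1, 1, 0, 1, 1, 0, 0]
example_sens, example_spec = sensitivity_specificity(example_true, example_pred)
```

Both values depend on the decision threshold applied to the model's scores, so a sensitivity and specificity pair describes one operating point, whereas the AUC summarizes performance across all thresholds.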
Affiliation(s)
- Benedikt Dietz
- Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
- Stefan Bauer
- Max Planck Institute for Intelligent Systems, Tübingen, Germany
4
Tuti T, Agweyu A, Mwaniki P, Peek N, English M. An exploration of mortality risk factors in non-severe pneumonia in children using clinical data from Kenya. BMC Med 2017; 15:201. [PMID: 29129186 PMCID: PMC5682642 DOI: 10.1186/s12916-017-0963-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 06/28/2017] [Accepted: 10/19/2017] [Indexed: 12/15/2022]
Abstract
BACKGROUND Childhood pneumonia is the leading infectious cause of mortality in children younger than 5 years old. Recent updates to World Health Organization pneumonia guidelines recommend outpatient care for a population of children previously classified as high risk. This revision has been challenged by policymakers in Africa, where mortality related to pneumonia is higher than in other regions and often complicated by comorbidities. This study aimed to identify factors that best discriminate inpatient mortality risk in non-severe pneumonia and explore whether these factors offer any added benefit over the current criteria used to identify children with pneumonia requiring inpatient care. METHODS We undertook a retrospective cohort study of children aged 2-59 months admitted with a clinical diagnosis of pneumonia at 14 public hospitals in Kenya between February 2014 and February 2016. Using machine learning techniques, we analysed whether clinical characteristics and common comorbidities increased the risk of inpatient mortality for non-severe pneumonia. The topmost risk factors were subjected to decision curve analysis to explore if using them as admission criteria had any net benefit above the current criteria. RESULTS Out of 16,162 children admitted with pneumonia during the study period, 10,687 were eligible for subsequent analysis. Inpatient mortality within this non-severe group was 252/10,687 (2.36%). Models demonstrated moderately good performance; the partial least squares discriminant analysis model had higher sensitivity for predicting mortality in comparison to logistic regression. Elevated respiratory rate (≥70 bpm), age 2-11 months and weight-for-age Z-score (WAZ) < -3SD were highly discriminative of mortality. These factors ranked consistently across the different models. 
For a risk threshold probability of 7-14%, there is a net benefit to admitting the patient sub-populations with these features as additional criteria alongside those currently used to classify severe pneumonia. Of the population studied, 70.54% met at least one of these criteria. Sensitivity analyses indicated that the overall results were not significantly affected by variations in pneumonia severity classification criteria. CONCLUSIONS Children with non-severe pneumonia aged 2-11 months or with respiratory rate ≥ 70 bpm or very low WAZ experience risks of inpatient mortality comparable to severe pneumonia. Inpatient care is warranted in these high-risk groups of children.
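The net benefit used in decision curve analysis has a standard closed form: true positives per patient minus false positives per patient, weighted by the odds of the risk threshold. The sketch below assumes that standard definition; the function names and the counts are hypothetical and are not drawn from the Kenyan cohort.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit at risk threshold p_t: TP/n - (FP/n) * p_t / (1 - p_t)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    weight = threshold / (1 - threshold)  # harm of one false positive, in true-positive units
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(events, n, threshold):
    """Reference strategy of admitting everyone: TP = events, FP = n - events."""
    return net_benefit(events, n - events, n, threshold)


# Hypothetical counts: a rule flags 110 of 1000 children, 20 of whom die.
rule_nb = net_benefit(true_pos=20, false_pos=90, n=1000, threshold=0.10)
all_nb = net_benefit_treat_all(events=25, n=1000, threshold=0.10)
```

A criterion is said to have a net benefit at a given threshold when its value exceeds both the treat-all reference and zero (the treat-none reference). Plotting these values across a range of thresholds gives the decision curve.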
Affiliation(s)
- Timothy Tuti
- KEMRI - Wellcome Trust Research Programme, Nairobi, Kenya
- Ambrose Agweyu
- KEMRI - Wellcome Trust Research Programme, Nairobi, Kenya
- Paul Mwaniki
- KEMRI - Wellcome Trust Research Programme, Nairobi, Kenya
- Niels Peek
- Centre for Health Informatics, Division of Informatics, Imaging & Data Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Science Centre, Manchester, UK; NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, Manchester, UK
- Mike English
- KEMRI - Wellcome Trust Research Programme, Nairobi, Kenya; Nuffield Department of Medicine, Oxford University, Oxford, UK
5
Naydenova E, Tsanas A, Howie S, Casals-Pascual C, De Vos M. The power of data mining in diagnosis of childhood pneumonia. J R Soc Interface 2016; 13:20160266. [PMID: 27466436 PMCID: PMC4971218 DOI: 10.1098/rsif.2016.0266] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 04/04/2016] [Accepted: 07/05/2016] [Indexed: 11/12/2022]
Abstract
Childhood pneumonia is the leading cause of death of children under the age of 5 years globally. Diagnostic information on the presence of infection, severity and aetiology (bacterial versus viral) is crucial for appropriate treatment. However, the derivation of such information requires advanced equipment (such as X-rays) and clinical expertise to correctly assess observational clinical signs (such as chest indrawing); both of these are often unavailable in resource-constrained settings. In this study, these challenges were addressed through the development of a suite of data mining tools, facilitating automated diagnosis through quantifiable features. Findings were validated on a large dataset comprising 780 children diagnosed with pneumonia and 801 age-matched healthy controls. Pneumonia was identified via four quantifiable vital signs (98.2% sensitivity and 97.6% specificity). Moreover, it was shown that severity can be determined through a combination of three vital signs and two lung sounds (72.4% sensitivity and 82.2% specificity); addition of a conventional biomarker (C-reactive protein) further improved severity predictions (89.1% sensitivity and 81.3% specificity). Finally, we demonstrated that aetiology can be determined using three vital signs and a newly proposed biomarker (lipocalin-2) (81.8% sensitivity and 90.6% specificity). These results suggest that a suite of carefully designed machine learning tools can be used to support multi-faceted diagnosis of childhood pneumonia in resource-constrained settings, compensating for the shortage of expensive equipment and highly trained clinicians.
Affiliation(s)
- Elina Naydenova
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK
- Athanasios Tsanas
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK
- Stephen Howie
- Child Survival Theme, Medical Research Council Unit, Serrekunda, The Gambia
- Climent Casals-Pascual
- Nuffield Department of Medicine, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Maarten De Vos
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK
6
Guillame-Bert M, Dubrawski A, Wang D, Hravnak M, Clermont G, Pinsky MR. Learning temporal rules to forecast instability in continuously monitored patients. J Am Med Inform Assoc 2016; 24:47-53. [PMID: 27274020 DOI: 10.1093/jamia/ocw048] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/04/2015] [Revised: 03/02/2016] [Accepted: 03/05/2016] [Indexed: 11/12/2022]
Abstract
Inductive machine learning, and in particular extraction of association rules from data, has been successfully used in multiple application domains, such as market basket analysis, disease prognosis, fraud detection, and protein sequencing. The appeal of rule extraction techniques stems from their ability to handle intricate problems yet produce models based on rules that can be comprehended by humans, and are therefore more transparent. Human comprehension is a factor that may improve adoption and use of data-driven decision support systems clinically via face validity. In this work, we explore whether we can reliably and informatively forecast cardiorespiratory instability (CRI) in step-down unit (SDU) patients utilizing data from continuous monitoring of physiologic vital sign (VS) measurements. We use a temporal association rule extraction technique in conjunction with a rule fusion protocol to learn how to forecast CRI in continuously monitored patients. We detail our approach and present and discuss encouraging empirical results obtained using continuous multivariate VS data from the bedside monitors of 297 SDU patients spanning 29 346 hours (3.35 patient-years) of observation. We present example rules that have been learned from data to illustrate potential benefits of comprehensibility of the extracted models, and we analyze the empirical utility of each VS as a potential leading indicator of an impending CRI event.
Affiliation(s)
- Artur Dubrawski
- Robotics Institute, Auton Lab, Carnegie Mellon University, Pittsburgh, PA, USA
- Donghan Wang
- Robotics Institute, Auton Lab, Carnegie Mellon University, Pittsburgh, PA, USA
- Marilyn Hravnak
- Schools of Nursing and Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Gilles Clermont
- Schools of Nursing and Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Michael R Pinsky
- Schools of Nursing and Medicine, University of Pittsburgh, Pittsburgh, PA, USA
7
Aelvoet W, Terryn N, Blommaert A, Molenberghs G, Hens N, De Smet F, Callens M, Beutels P. Community-acquired pneumonia (CAP) hospitalizations and deaths: is there a role for quality improvement through inter-hospital comparisons? Int J Qual Health Care 2015; 28:22-32. [PMID: 26590376 DOI: 10.1093/intqhc/mzv092] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Accepted: 10/12/2015] [Indexed: 01/12/2023]
Abstract
OBJECTIVE To assess between-hospital variations in standardized in-hospital mortality ratios of community-acquired pneumonia (CAP), and identify possible leads for quality improvement. DESIGN We used an administrative database to estimate standardized in-hospital mortality ratios for 111 Belgian hospitals, by carrying out a set of hierarchical logistic regression models, intended to disentangle therapeutic attitudes and biases. To facilitate the detection of false-negative/positive results, we added an inconclusive zone to the funnel plots, derived from the results of the study. Data quality was validated by comparison with (i) alternative data from the largest Belgian Sickness Fund, (ii) published German hospital data and (iii) the results of an on-site audit. SETTING All Belgian hospital discharge records from 2004 to 2007. STUDY PARTICIPANTS A total of 111 776 adult patients were admitted for CAP. MAIN OUTCOME MEASURE Risk-adjusted standardized in-hospital mortality ratios. RESULTS Out of the 111 hospitals, we identified five and six outlying hospitals, with standardized mortality ratios of CAP consistently on the extremes of the distribution, as providing possibly better or worse care, respectively, and 18 other hospitals as having possible quality weaknesses/strengths. At the individuals' level of the analysis, adjusted odds ratios showed the paramount importance of old age, comorbidity and mechanical ventilation. The data compared well with the different validation sources. CONCLUSIONS Despite the limitations inherent to administrative data, it seemed possible to establish inter-hospital differences in standardized in-hospital mortality ratios of CAP and to identify leads for quality improvement. Monitoring is needed to assess progress in quality.
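A standardized in-hospital mortality ratio compares the deaths observed in a hospital with the deaths expected from a risk model. The sketch below assumes plain indirect standardization, where expected deaths are the sum of per-patient predicted risks; the study itself used hierarchical logistic regression, which this simplified example does not reproduce. The function name and the data are hypothetical.

```python
def standardized_mortality_ratio(died, predicted_risk):
    """Observed deaths divided by expected deaths for one hospital.

    died: 1/0 outcome per admission.
    predicted_risk: model-based probability of death per admission.
    """
    observed = sum(died)
    expected = sum(predicted_risk)
    if expected == 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# Hypothetical hospital with four admissions.
hospital_died = [1, 0, 0, 1]
hospital_risk = [0.5, 0.25, 0.25, 0.5]
hospital_smr = standardized_mortality_ratio(hospital_died, hospital_risk)
```

A ratio above 1 means more deaths than the case mix would predict, and a ratio below 1 means fewer. Because small hospitals produce noisy ratios, comparisons such as the funnel plots mentioned above account for the number of expected deaths before flagging outliers.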
Affiliation(s)
- W Aelvoet
- Federal Public Service (FPS) Health, Food Chain Safety and Environment, Eurostation Bloc II-First Floor-01D327, Place Victor Horta 40 bte 10, B-1060 Brussels, Belgium; Vrije Universiteit Brussel, Faculteit Geneeskunde en Farmacie, Brussels, Belgium
- N Terryn
- Federal Public Service (FPS) Health, Food Chain Safety and Environment, Eurostation Bloc II-First Floor-01D327, Place Victor Horta 40 bte 10, B-1060 Brussels, Belgium
- A Blommaert
- Centre for Health Economics Research and Modeling Infectious Diseases (CHERMID), Vaccine and Infectious Disease Institute (WHO Collaborating Centre), University of Antwerp, Antwerp, Belgium
- G Molenberghs
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Universiteit Hasselt and KU Leuven, Belgium
- N Hens
- Centre for Health Economics Research and Modeling Infectious Diseases (CHERMID), Vaccine and Infectious Disease Institute (WHO Collaborating Centre), University of Antwerp, Antwerp, Belgium; Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Hasselt University
- F De Smet
- National Alliance of Christian Mutualities, Brussels, Belgium; Department of Public Health and Primary Care, Occupational, Environmental and Insurance Medicine, KU Leuven, Louvain, Belgium
- M Callens
- National Alliance of Christian Mutualities, Brussels, Belgium
- P Beutels
- Centre for Health Economics Research and Modeling Infectious Diseases (CHERMID), Vaccine and Infectious Disease Institute (WHO Collaborating Centre), University of Antwerp, Antwerp, Belgium; School of Public Health and Community Medicine, The University of New South Wales, Sydney, Australia
8
Vaz de Melo POS. How Many Political Parties Should Brazil Have? A Data-Driven Method to Assess and Reduce Fragmentation in Multi-Party Political Systems. PLoS One 2015; 10:e0140217. [PMID: 26466365 PMCID: PMC4605521 DOI: 10.1371/journal.pone.0140217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/19/2015] [Accepted: 09/23/2015] [Indexed: 11/28/2022]
Abstract
In June 2013, Brazil faced the largest and most significant mass protests in a generation. These were exacerbated by the population’s disenchantment with its highly fragmented party system, which is composed of a very large number of political parties. Under these circumstances, presidents are constrained by informal coalition governments, bringing very harmful consequences to the country. In this work I propose ARRANGE, a dAta dRiven method foR Assessing and reduciNG party fragmEntation in a country. ARRANGE uses as input the roll call data for congress votes on bills and amendments as a proxy for political preferences and ideology. With that, ARRANGE finds the minimum number of parties required to house all congressmen without decreasing party discipline. When applied to Brazil’s historical roll call data, ARRANGE was able to generate 23 distinct configurations that, compared with the status quo, have (i) a significantly smaller number of parties, (ii) a higher discipline of partisans towards their parties and (iii) a more even distribution of partisans into parties. ARRANGE is fast and parsimonious, relying on a single, intuitive parameter.
Affiliation(s)
- Pedro O. S. Vaz de Melo
- Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil