Wu H, Zhang H, Karwath A, Ibrahim Z, Shi T, Zhang X, Wang K, Sun J, Dhaliwal K, Bean D, Cardoso VR, Li K, Teo JT, Banerjee A, Gao-Smith F, Whitehouse T, Veenith T, Gkoutos GV, Wu X, Dobson R, Guthrie B. Ensemble learning for poor prognosis predictions: A case study on SARS-CoV-2. J Am Med Inform Assoc 2021; 28:791-800. [PMID: 33185672 PMCID: PMC7717299 DOI: 10.1093/jamia/ocaa295] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/07/2020] [Accepted: 11/11/2020] [Indexed: 11/13/2022]
Abstract
OBJECTIVE: Risk prediction models are widely used to inform evidence-based clinical decision making. However, few models developed from single cohorts can perform consistently well at population level where diverse prognoses exist (such as the SARS-CoV-2 [severe acute respiratory syndrome coronavirus 2] pandemic). This study aims at tackling this challenge by synergizing prediction models from the literature using ensemble learning.
MATERIALS AND METHODS: In this study, we selected and reimplemented 7 prediction models for COVID-19 (coronavirus disease 2019) that were derived from diverse cohorts and used different implementation techniques. A novel ensemble learning framework was proposed to synergize them for realizing personalized predictions for individual patients. Four diverse international cohorts (2 from the United Kingdom and 2 from China; N = 5394) were used to validate all 8 models on discrimination, calibration, and clinical usefulness.
RESULTS: Results showed that individual prediction models could perform well on some cohorts while poorly on others. Conversely, the ensemble model achieved the best performances consistently on all metrics quantifying discrimination, calibration, and clinical usefulness. Performance disparities were observed in cohorts from the 2 countries: all models achieved better performances on the China cohorts.
DISCUSSION: When individual models were learned from complementary cohorts, the synergized model had the potential to achieve better performances than any individual model. Results indicate that blood parameters and physiological measurements might have better predictive powers when collected early, which remains to be confirmed by further studies.
CONCLUSIONS: Combining a diverse set of individual prediction models, the ensemble method can synergize a robust and well-performing model by choosing the most competent ones for individual patients.
Affiliation(s)
- Honghan Wu
  - Institute of Health Informatics, University College London, London, United Kingdom
  - Health Data Research UK, University College London, London, United Kingdom
- Huayu Zhang
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Andreas Karwath
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
  - Health Data Research UK, University of Birmingham, Birmingham, United Kingdom
- Zina Ibrahim
  - Health Data Research UK, University College London, London, United Kingdom
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Ting Shi
  - Centre for Global Health, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Xin Zhang
  - Department of Pulmonary and Critical Care Medicine, People’s Liberation Army Joint Logistic Support Force 920th Hospital, Kunming, China
- Kun Wang
  - Department of Pulmonary and Critical Care Medicine, Shanghai East Hospital, Tongji University, Shanghai, China
- Jiaxing Sun
  - Department of Pulmonary and Critical Care Medicine, Shanghai East Hospital, Tongji University, Shanghai, China
- Kevin Dhaliwal
  - Centre for Inflammation Research, Queens Medical Research Institute, University of Edinburgh, Edinburgh, United Kingdom
- Daniel Bean
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Victor Roth Cardoso
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
  - Health Data Research UK, University of Birmingham, Birmingham, United Kingdom
- Kezhi Li
  - Institute of Health Informatics, University College London, London, United Kingdom
- James T Teo
  - Department of Stroke and Neurology, King’s College Hospital NHS Foundation Trust, London, United Kingdom
- Amitava Banerjee
  - Institute of Health Informatics, University College London, London, United Kingdom
- Fang Gao-Smith
  - Department of Intensive Care Medicine, Queen Elizabeth Hospital Birmingham, Birmingham, United Kingdom
  - Birmingham Acute Care Research, University of Birmingham, Birmingham, United Kingdom
- Tony Whitehouse
  - Department of Intensive Care Medicine, Queen Elizabeth Hospital Birmingham, Birmingham, United Kingdom
  - Birmingham Acute Care Research, University of Birmingham, Birmingham, United Kingdom
- Tonny Veenith
  - Department of Intensive Care Medicine, Queen Elizabeth Hospital Birmingham, Birmingham, United Kingdom
  - Birmingham Acute Care Research, University of Birmingham, Birmingham, United Kingdom
- Georgios V Gkoutos
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
  - Health Data Research UK, University of Birmingham, Birmingham, United Kingdom
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- Xiaodong Wu
  - Department of Pulmonary and Critical Care Medicine, Shanghai East Hospital, Tongji University, Shanghai, China
  - Department of Pulmonary and Critical Care Medicine, Taikang Tongji Hospital, Wuhan, China
- Richard Dobson
  - Institute of Health Informatics, University College London, London, United Kingdom
  - Health Data Research UK, University College London, London, United Kingdom
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Bruce Guthrie
  - Centre for Population Health Sciences, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom