1
Zhang Y, Liu B, Bunting KV, Brind D, Thorley A, Karwath A, Lu W, Zhou D, Wang X, Mobley AR, Tica O, Gkoutos GV, Kotecha D, Duan J. Development of automated neural network prediction for echocardiographic left ventricular ejection fraction. Front Med (Lausanne) 2024; 11:1354070. [PMID: 38686369 PMCID: PMC11057494 DOI: 10.3389/fmed.2024.1354070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 03/18/2024] [Indexed: 05/02/2024] Open
Abstract
Introduction: The echocardiographic measurement of left ventricular ejection fraction (LVEF) is fundamental to the diagnosis and classification of patients with heart failure (HF).
Methods: This study aimed to quantify LVEF automatically and accurately using a pipeline based on deep neural networks and ensemble learning. Within the pipeline, an Atrous Convolutional Neural Network (ACNN) was first trained to segment the left ventricle (LV), before employing the area-length formulation based on the ellipsoid single-plane model to calculate LVEF values. This formulation required inputs of LV area, derived from segmentation using an improved Jeffrey's method, as well as LV length, derived from a novel ensemble learning model. To further improve the pipeline's accuracy, an automated peak detection algorithm was used to identify end-diastolic and end-systolic frames, avoiding issues with human error. Subsequently, single-beat LVEF values were averaged across all cardiac cycles to obtain the final LVEF.
Results: This method was developed and internally validated in an open-source dataset containing 10,030 echocardiograms. Pearson's correlation coefficient was 0.83 for LVEF prediction compared to expert human analysis (p < 0.001), with a subsequent area under the receiver operating characteristic curve (AUROC) of 0.98 (95% confidence interval 0.97 to 0.99) for categorisation of HF with reduced ejection fraction (HFrEF; LVEF < 40%). In an external dataset with 200 echocardiograms, this method achieved an AUROC of 0.90 (95% confidence interval 0.88 to 0.91) for HFrEF assessment.
Conclusion: The automated neural network-based calculation of LVEF is comparable to expert clinicians performing time-consuming, frame-by-frame manual evaluations of cardiac systolic function.
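The calculation stage described in this abstract can be illustrated with a minimal sketch. It assumes the standard single-plane area-length formula, V = 8A²/(3πL), takes per-frame LV area and length as already available (in the paper these come from the segmentation network and the ensemble model), and substitutes a simple neighbour comparison for the authors' peak detection algorithm, which the abstract does not specify. All function names and the sample values are hypothetical.

```python
import math

def area_length_volume(area_cm2, length_cm):
    """Single-plane ellipsoid (area-length) LV volume in mL: V = 8*A^2 / (3*pi*L)."""
    return 8.0 * area_cm2 ** 2 / (3.0 * math.pi * length_cm)

def find_beats(volumes):
    """Pair each local maximum (end-diastole) with the next local minimum (end-systole).

    A plain neighbour comparison stands in for the paper's peak detection.
    """
    beats = []
    ed = None
    for i in range(1, len(volumes) - 1):
        prev, cur, nxt = volumes[i - 1], volumes[i], volumes[i + 1]
        if cur > prev and cur >= nxt:
            ed = i                      # candidate end-diastolic frame
        elif cur < prev and cur <= nxt and ed is not None:
            beats.append((ed, i))       # end-systolic frame closes the beat
            ed = None
    return beats

def mean_lvef(areas_cm2, lengths_cm):
    """Average single-beat LVEF (%) over all detected cardiac cycles."""
    volumes = [area_length_volume(a, l) for a, l in zip(areas_cm2, lengths_cm)]
    efs = [100.0 * (volumes[ed] - volumes[es]) / volumes[ed]
           for ed, es in find_beats(volumes)]
    if not efs:
        raise ValueError("no complete cardiac cycle found")
    return sum(efs) / len(efs)

# Hypothetical per-frame measurements covering two beats.
areas = [30, 35, 30, 20, 28, 36, 29, 21, 27]
lengths = [8.0, 8.5, 8.0, 7.0, 7.8, 8.6, 7.9, 7.1, 7.7]
print(f"mean LVEF: {mean_lvef(areas, lengths):.1f}%")
```

Averaging per-beat values, rather than taking a single global maximum and minimum, mirrors the abstract's statement that single-beat LVEF values were averaged across all cardiac cycles.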
Affiliation(s)
- Yuting Zhang
  - School of Computer Science, University of Birmingham, Edgbaston, Birmingham, United Kingdom
- Boyang Liu
  - Manchester University NHS Foundation Trust, Manchester, United Kingdom
- Karina V. Bunting
  - Institute of Cardiovascular Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre and West Midlands NHS Secure Data Environment, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- David Brind
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - Centre for Health Data Science, University of Birmingham, Edgbaston, Birmingham, United Kingdom
- Alexander Thorley
  - School of Computer Science, University of Birmingham, Edgbaston, Birmingham, United Kingdom
- Andreas Karwath
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
  - Centre for Health Data Science, University of Birmingham, Edgbaston, Birmingham, United Kingdom
- Wenqi Lu
  - Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom
- Diwei Zhou
  - Department of Mathematical Sciences, Loughborough University, Loughborough, United Kingdom
- Xiaoxia Wang
  - Institute of Cardiovascular Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre and West Midlands NHS Secure Data Environment, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- Alastair R. Mobley
  - Institute of Cardiovascular Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre and West Midlands NHS Secure Data Environment, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- Otilia Tica
  - Institute of Cardiovascular Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
- Georgios V. Gkoutos
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - Centre for Health Data Science, University of Birmingham, Edgbaston, Birmingham, United Kingdom
- Dipak Kotecha
  - Institute of Cardiovascular Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre and West Midlands NHS Secure Data Environment, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- Jinming Duan
  - School of Computer Science, University of Birmingham, Edgbaston, Birmingham, United Kingdom
2
Gill SK, Karwath A, Uh HW, Cardoso VR, Gu Z, Barsky A, Slater L, Acharjee A, Duan J, Dall'Olio L, el Bouhaddani S, Chernbumroong S, Stanbury M, Haynes S, Asselbergs FW, Grobbee DE, Eijkemans MJC, Gkoutos GV, Kotecha D. Artificial intelligence to enhance clinical value across the spectrum of cardiovascular healthcare. Eur Heart J 2023; 44:713-725. [PMID: 36629285 PMCID: PMC9976986 DOI: 10.1093/eurheartj/ehac758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 11/22/2022] [Accepted: 12/05/2022] [Indexed: 01/12/2023] Open
Abstract
Artificial intelligence (AI) is increasingly being utilized in healthcare. This article provides clinicians and researchers with a step-wise foundation for high-value AI that can be applied to a variety of data modalities. The aim is to improve the transparency and application of AI methods, with the potential to benefit patients in routine cardiovascular care. Following a clear research hypothesis, an AI-based workflow begins with data selection and pre-processing prior to analysis, with the type of data (structured, semi-structured, or unstructured) determining which pre-processing steps and machine-learning algorithms are required. Algorithmic and data validation should be performed to ensure the robustness of the chosen methodology, followed by an objective evaluation of performance. Seven case studies are provided to highlight the wide variety of data modalities and clinical questions that can benefit from modern AI techniques, with a focus on applying them to cardiovascular disease management. Despite the growing use of AI, further education for healthcare workers, researchers, and the public is needed to aid understanding of how AI works and to close the existing gap in knowledge. In addition, issues regarding data access, sharing, and security must be addressed to ensure full engagement by patients and the public. The application of AI within healthcare provides an opportunity for clinicians to deliver a more personalized approach to medical care by accounting for confounders, interactions, and the rising prevalence of multi-morbidity.
Affiliation(s)
- Simrat K Gill
  - Institute of Cardiovascular Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Andreas Karwath
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
- Hae-Won Uh
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands
- Victor Roth Cardoso
  - Institute of Cardiovascular Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
- Zhujie Gu
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands
- Andrey Barsky
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
- Luke Slater
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
- Animesh Acharjee
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
- Jinming Duan
  - School of Computer Science, University of Birmingham, Birmingham, UK
  - Alan Turing Institute, London, UK
- Lorenzo Dall'Olio
  - Department of Physics and Astronomy, University of Bologna, Bologna, Italy
- Said el Bouhaddani
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands
- Saisakul Chernbumroong
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
- Folkert W Asselbergs
  - Amsterdam University Medical Center, Department of Cardiology, University of Amsterdam, Amsterdam, The Netherlands
  - Health Data Research UK and Institute of Health Informatics, University College London, London, UK
- Diederick E Grobbee
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands
- Marinus J C Eijkemans
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands
- Georgios V Gkoutos
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
- Dipak Kotecha
  - Institute of Cardiovascular Sciences, University of Birmingham, Vincent Drive, B15 2TT Birmingham, UK
  - Health Data Research UK Midlands, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
3
Taib BG, Karwath A, Wensley K, Minku L, Gkoutos GV, Moiemen N. Artificial intelligence in the management and treatment of burns: A systematic review and meta-analyses. J Plast Reconstr Aesthet Surg 2023; 77:133-161. [PMID: 36571960 DOI: 10.1016/j.bjps.2022.11.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Revised: 10/17/2022] [Accepted: 11/17/2022] [Indexed: 11/24/2022]
Abstract
INTRODUCTION AND AIM: Artificial Intelligence (AI) is already being successfully employed to aid the interpretation of multiple facets of burns care. In light of the growing influence of AI, this systematic review and diagnostic test accuracy meta-analysis aims to appraise and summarise the current direction of research in this field.
METHOD: A systematic literature review was conducted of relevant studies published between 1990 and 2021, yielding 35 studies. Twelve studies were suitable for a diagnostic test accuracy meta-analysis.
RESULTS: The studies generally focussed on burn depth (accuracy 68.9%-95.4%, sensitivity 90.8% and specificity 84.4%), burn segmentation (accuracy 76.0%-99.4%, sensitivity 97.9% and specificity 97.6%) and burn-related mortality (accuracy >90%-97.5%, sensitivity 92.9% and specificity 93.4%). Neural networks were the most common machine learning (ML) algorithm, utilised in 69% of the studies. The QUADAS-2 tool identified significant heterogeneity between studies.
DISCUSSION: The potential application of AI in the management of burns patients is promising, especially given its propitious results across a spectrum of dimensions, including burn depth, size, mortality, related sepsis and acute kidney injuries. The accuracy of the results analysed within this study is comparable to current practices in burns care.
CONCLUSION: The application of AI in the treatment and management of burns patients, as a series of point-of-care diagnostic adjuncts, is promising. Whilst AI is a potentially valuable tool, a full evaluation of its current utility and potential is limited by significant variations in research methodology and reporting.
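For readers less familiar with the metrics reported above, the following minimal sketch shows how accuracy, sensitivity and specificity follow from a 2x2 confusion table. The counts are hypothetical and are not taken from any study in the review; the function name is likewise an assumption.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity, accuracy) from confusion-table counts."""
    sensitivity = tp / (tp + fn)            # true positives among all diseased cases
    specificity = tn / (tn + fp)            # true negatives among all healthy cases
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# Hypothetical counts for a binary classifier (e.g. deep vs superficial burn).
sens, spec, acc = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=80)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Because accuracy depends on the mix of positive and negative cases, two studies with identical sensitivity and specificity can report different accuracies, which is one reason pooled accuracy ranges are wide.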
Affiliation(s)
- Bilal Gani Taib
  - Burns and Plastic Surgery Department, Queen Elizabeth Hospital, Mindelsohn Way, Birmingham B15 2TH, United Kingdom
- A Karwath
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom; Health Data Research UK Midlands Site, Birmingham, United Kingdom; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, United Kingdom
- K Wensley
  - Burns and Plastic Surgery Department, Queen Elizabeth Hospital, Mindelsohn Way, Birmingham B15 2TH, United Kingdom
- L Minku
  - School of Computer Science, University of Birmingham, Birmingham, United Kingdom
- G V Gkoutos
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom; Health Data Research UK Midlands Site, Birmingham, United Kingdom; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, United Kingdom; NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, United Kingdom
- N Moiemen
  - College of Medical and Dental Sciences, University of Birmingham, United Kingdom; Centre for Conflict Wound Research, Scar Free Foundation, Birmingham, United Kingdom; NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, United Kingdom
4
Slater K, Williams JA, Schofield PN, Russell S, Pendleton SC, Karwath A, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Klarigi: Characteristic explanations for semantic biomedical data. Comput Biol Med 2023; 153:106425. [PMID: 36638616 DOI: 10.1016/j.compbiomed.2022.106425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 12/04/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
Annotation of biomedical entities with ontology classes provides for formal semantic analysis and mobilisation of background knowledge in determining their relationships. To date, enrichment analysis has been routinely employed to identify classes that are over-represented in annotations across sets of groups, such as biosample gene expression profiles or patient phenotypes, and is useful for a range of tasks including differential diagnosis and causative variant prioritisation. These approaches, however, usually consider only univariate relationships, make limited use of the semantic features of ontologies, and provide limited information and evaluation of the explanatory power of both singular and grouped candidate classes. Moreover, they are not designed to solve the problem of deriving cohesive, characteristic, and discriminatory sets of classes for entity groups. We have developed a new tool, called Klarigi, which introduces multiple scoring heuristics for identification of classes that are both compositional and discriminatory for groups of entities annotated with ontology classes. The tool includes a novel algorithm for derivation of multivariable semantic explanations for entity groups, makes use of semantic inference through live use of an ontology reasoner, and includes a classification method for identifying the discriminatory power of candidate sets, in addition to significance testing apposite to traditional enrichment approaches. We describe the design and implementation of Klarigi, including its scoring and explanation determination methods, and evaluate its use in application to two test cases with clinical significance, comparing and contrasting methods and results with literature-based and enrichment analysis methods. We demonstrate that Klarigi produces characteristic and discriminatory explanations for groups of biomedical entities in two settings. 
We also show that these explanations recapitulate and extend the knowledge held in existing biomedical databases and literature for several diseases. We conclude that Klarigi provides a distinct and valuable perspective on biomedical datasets when compared with traditional enrichment methods, and therefore constitutes a new method by which biomedical datasets can be explored, contributing to improved insight into semantic data.
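The traditional enrichment analysis that this abstract contrasts Klarigi with is commonly implemented as a univariate over-representation test. The sketch below shows one such test, a one-sided hypergeometric tail probability, as an illustration of that baseline only. It is not Klarigi's scoring or explanation algorithm, which the abstract does not detail, and the function name and counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, group, group_annotated):
    """One-sided over-representation p-value, P(X >= group_annotated).

    population      -- total number of entities
    annotated       -- entities in the population annotated with the class
    group           -- size of the group of interest
    group_annotated -- group members annotated with the class
    """
    upper = min(annotated, group)
    total = comb(population, group)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, group - k)
        for k in range(group_annotated, upper + 1)
    )
    return tail / total

# Hypothetical example: 5 of 20 patients carry a phenotype class,
# and 3 of them fall inside a 5-patient disease group.
p = hypergeom_enrichment_p(population=20, annotated=5, group=5, group_annotated=3)
print(f"p = {p:.4f}")
```

A test like this scores one class at a time, which is the univariate limitation the abstract describes; it says nothing about how well a set of classes jointly characterises a group.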
Affiliation(s)
- Karin Slater
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- John A Williams
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Paul N Schofield
  - Department of Physiology, Development, and Neuroscience, University of Cambridge, UK
- Sophie Russell
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Samantha C Pendleton
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Andreas Karwath
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Hilary Fanning
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Simon Ball
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, UK
- Georgios V Gkoutos
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
5
Wu H, Wang M, Wu J, Francis F, Chang YH, Shavick A, Dong H, Poon MTC, Fitzpatrick N, Levine AP, Slater LT, Handy A, Karwath A, Gkoutos GV, Chelala C, Shah AD, Stewart R, Collier N, Alex B, Whiteley W, Sudlow C, Roberts A, Dobson RJB. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. NPJ Digit Med 2022; 5:186. [PMID: 36544046 PMCID: PMC9770568 DOI: 10.1038/s41746-022-00730-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022] Open
Abstract
Much of the knowledge and information needed for enabling high-quality clinical research is stored in free-text format. Natural language processing (NLP) has been used to extract information from these sources at scale for several decades. This paper aims to present a comprehensive review of clinical NLP for the past 15 years in the UK to identify the community, depict its evolution, analyse methodologies and applications, and identify the main barriers. We collect a dataset of clinical NLP projects (n = 94; total funding £41.97 m) funded by UK funders or the European Union's funding programmes. Additionally, we extract details on 9 funders, 137 organisations, 139 persons and 431 research papers. Networks are created from timestamped data interlinking all entities, and network analysis is subsequently applied to generate insights. 431 publications are identified as part of a literature review, of which 107 are eligible for final analysis. Results show, not surprisingly, that clinical NLP in the UK has increased substantially in the last 15 years: the total budget in the period of 2019-2022 was 80 times that of 2007-2010. However, effort is required to deepen areas such as disease (sub-)phenotyping and broaden application domains. There is also a need to improve links between academia and industry and enable deployments in real-world settings for the realisation of clinical NLP's great potential in care delivery. The major barriers include research and development access to hospital data, lack of capable computational resources in the right places, the scarcity of labelled data and barriers to sharing of pretrained models.
Affiliation(s)
- Honghan Wu
  - Institute of Health Informatics, University College London, London, UK
- Minhong Wang
  - Institute of Health Informatics, University College London, London, UK
- Jinge Wu
  - Institute of Health Informatics, University College London, London, UK
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Farah Francis
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Yun-Hsuan Chang
  - Institute of Health Informatics, University College London, London, UK
- Alex Shavick
  - Research Department of Pathology, UCL Cancer Institute, University College London, London, UK
- Hang Dong
  - Usher Institute, University of Edinburgh, Edinburgh, UK
  - Department of Computer Science, University of Oxford, Oxford, UK
- Adam P Levine
  - Research Department of Pathology, UCL Cancer Institute, University College London, London, UK
- Luke T Slater
  - Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
- Alex Handy
  - Institute of Health Informatics, University College London, London, UK
  - University College London Hospitals NHS Trust, London, UK
- Andreas Karwath
  - Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
- Georgios V Gkoutos
  - Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
- Claude Chelala
  - Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
- Anoop Dinesh Shah
  - Institute of Health Informatics, University College London, London, UK
- Robert Stewart
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Nigel Collier
  - Theoretical and Applied Linguistics, Faculty of Modern & Medieval Languages & Linguistics, University of Cambridge, Cambridge, UK
- Beatrice Alex
  - Edinburgh Futures Institute, University of Edinburgh, Edinburgh, UK
- Cathie Sudlow
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Angus Roberts
  - Department of Biostatistics & Health Informatics, King's College London, London, UK
- Richard J B Dobson
  - Institute of Health Informatics, University College London, London, UK
  - Department of Biostatistics & Health Informatics, King's College London, London, UK
6
Manley SE, Karwath A, Williams J, Nightingale P, Webber J, Raghavan R, Barratt A, Webster C, Round R, Stratton I, Gkoutos G, Roberts G, Mostafa S, Ghosh S. Use of HbA1c for new diagnosis of diabetes in those with hyperglycaemia on admission to or attendance at hospital urgently requires research. Br J Diabetes 2022. [DOI: 10.15277/bjd.2022.386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
The prevalence of diabetes in Birmingham is 11% but it is 22% in hospital inpatients. Queen Elizabeth Hospital in Birmingham (QEHB) serves a multi-ethnic population with 6% Afro-Caribbean, 19% South Asian and 70% White European.
A clinical audit of 18,965 emergency admissions to QEHB showed that 5% were undiagnosed but had admission glucose in the ‘diabetes’ range and 16% were in the ‘at risk’ range. The proportion of Afro-Caribbeans (7%) and South Asians (8%) in the ‘diabetes’ range was higher than White Europeans (5%). Given the magnitude of the problem, this paper explores the issues concerning the use of reflex HbA1c testing in the UK for diagnosis of diabetes in hospital admissions. HbA1c testing is suitable for most patients but conditions affecting red blood cell turnover invalidate the results in a small number of people.
However, there are pertinent questions relating to the introduction of such testing in the NHS on a routine basis. Literature searches on a topical question ‘Is hyperglycaemia identified during emergency admission/attendance acted upon?’, were performed from 2016 to 2021 and 2016 to 2022. They identified 21 different, relevant, research papers - 5 from Australia, 9 from Europe including 4 from the UK, 5 from America and 1 each from Canada and Africa. These papers revealed an absence of established procedures for the management and follow-up of routinely detected hyperglycaemia using HbA1c when no previous diabetes diagnosis was recorded.
Further work is required to determine the role of reflex HbA1c testing for diagnosis of diabetes in admissions with hyperglycaemia, and the cost-effectiveness and role of point-of-care HbA1c testing.
7
Williams JA, Burgess S, Suckling J, Lalousis PA, Batool F, Griffiths SL, Palmer E, Karwath A, Barsky A, Gkoutos GV, Wood S, Barnes NM, David AS, Donohoe G, Neill JC, Deakin B, Khandaker GM, Upthegrove R. Inflammation and Brain Structure in Schizophrenia and Other Neuropsychiatric Disorders: A Mendelian Randomization Study. JAMA Psychiatry 2022; 79:498-507. [PMID: 35353173 PMCID: PMC8968718 DOI: 10.1001/jamapsychiatry.2022.0407] [Citation(s) in RCA: 77] [Impact Index Per Article: 38.5] [Received: 11/05/2021] [Accepted: 02/09/2022] [Indexed: 02/02/2023]
Abstract
Importance: Previous in vitro and postmortem research suggests that inflammation may lead to structural brain changes via activation of microglia and/or astrocytic dysfunction in a range of neuropsychiatric disorders.
Objective: To investigate the relationship between inflammation and changes in brain structures in vivo and to explore a transcriptome-driven functional basis with relevance to mental illness.
Design, Setting, and Participants: This study used multistage linked analyses, including mendelian randomization (MR), gene expression correlation, and connectivity analyses. The study included a total of 20 688 participants in the UK Biobank, which includes clinical, genomic, and neuroimaging data, and 6 postmortem brains from neurotypical individuals in the Allen Human Brain Atlas (AHBA), including RNA microarray data. Data were extracted in February 2021 and analyzed between March and October 2021.
Exposures: Genetic variants regulating levels and activity of circulating interleukin 1 (IL-1), IL-2, IL-6, C-reactive protein (CRP), and brain-derived neurotrophic factor (BDNF) were used as exposures in MR analyses.
Main Outcomes and Measures: Brain imaging measures, including gray matter volume (GMV) and cortical thickness (CT), were used as outcomes. Associations were considered significant at a multiple testing-corrected threshold of P < 1.1 × 10⁻⁴. Differential gene expression in AHBA data was modeled in brain regions mapped to areas significant in MR analyses; genes were tested for biological and disease overrepresentation in annotation databases and for connectivity in protein-protein interaction networks.
Results: Of 20 688 participants in the UK Biobank sample, 10 828 (52.3%) were female, and the mean (SD) age was 55.5 (7.5) years. In the UK Biobank sample, genetically predicted levels of IL-6 were associated with GMV in the middle temporal cortex (z score, 5.76; P = 8.39 × 10⁻⁹), inferior temporal (z score, 3.38; P = 7.20 × 10⁻⁵), fusiform (z score, 4.70; P = 2.60 × 10⁻⁷), and frontal (z score, -3.59; P = 3.30 × 10⁻⁵) cortex together with CT in the superior frontal region (z score, -5.11; P = 3.22 × 10⁻⁷). No significant associations were found for IL-1, IL-2, CRP, or BDNF after correction for multiple comparisons. In the AHBA sample, 5 of 6 participants (83%) were male, and the mean (SD) age was 42.5 (13.4) years. Brain-wide coexpression analysis showed a highly interconnected network of genes preferentially expressed in the middle temporal gyrus (MTG), which further formed a highly connected protein-protein interaction network with IL-6 (enrichment test of expected vs observed network given the prevalence and degree of interactions in the STRING database: 43 nodes/30 edges observed vs 8 edges expected; mean node degree, 1.4; genome-wide significance, P = 4.54 × 10⁻⁹). MTG differentially expressed genes were functionally enriched for biological processes in schizophrenia, autism spectrum disorder, and epilepsy.
Conclusions and Relevance: In this study, genetically determined IL-6 was associated with brain structure and potentially affects areas implicated in developmental neuropsychiatric disorders, including schizophrenia and autism.
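Two of the figures in this abstract can be checked with short arithmetic, sketched below. The mean node degree of an undirected network is 2E/N, which for 43 nodes and 30 edges gives the reported 1.4. The second function converts a z score to a two-sided P value under a standard normal approximation; that approximation is an assumption here, since the abstract does not state how its P values were derived, and it is applied only to the middle temporal cortex z score. Function names are hypothetical.

```python
import math

def mean_node_degree(nodes, edges):
    """Mean degree of an undirected network: each edge contributes to two nodes."""
    return 2 * edges / nodes

def two_sided_p(z):
    """Two-sided P value for a z score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

degree = mean_node_degree(nodes=43, edges=30)
p_mtg = two_sided_p(5.76)          # middle temporal cortex z score

print(round(degree, 1))            # 1.4
print(f"{p_mtg:.2e}")
```

Comparing `p_mtg` with the multiple testing-corrected threshold of 1.1 × 10⁻⁴ shows how far that association sits below the significance cut-off.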
Affiliation(s)
- John A. Williams
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
- Institute for Translational Medicine, University of Birmingham, Birmingham, United Kingdom
- Health Data Research UK (HDR), Midlands Site, Birmingham, United Kingdom
| | - Stephen Burgess
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, United Kingdom
- Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
| | - John Suckling
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
| | - Paris Alexandros Lalousis
- Institute for Mental Health, University of Birmingham, Birmingham, United Kingdom
- Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
| | - Fatima Batool
- Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, United Kingdom
| | - Sian Lowri Griffiths
- Institute for Mental Health, University of Birmingham, Birmingham, United Kingdom
- Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
| | - Edward Palmer
- Institute for Mental Health, University of Birmingham, Birmingham, United Kingdom
| | - Andreas Karwath
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
- Institute for Translational Medicine, University of Birmingham, Birmingham, United Kingdom
- Health Data Research UK (HDR), Midlands Site, Birmingham, United Kingdom
| | - Andrey Barsky
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
- Institute for Translational Medicine, University of Birmingham, Birmingham, United Kingdom
| | - Georgios V. Gkoutos
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
- Institute for Translational Medicine, University of Birmingham, Birmingham, United Kingdom
- Health Data Research UK (HDR), Midlands Site, Birmingham, United Kingdom
| | - Stephen Wood
- Institute for Mental Health, University of Birmingham, Birmingham, United Kingdom
- Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Orygen, Melbourne, Australia
- Centre for Youth Mental Health, University of Melbourne, Melbourne, Australia
| | - Nicholas M. Barnes
- Institute for Clinical Sciences, University of Birmingham, Birmingham, United Kingdom
| | - Anthony S. David
- Institute of Mental Health, University College London, London, United Kingdom
| | - Gary Donohoe
- School of Psychology, National University of Ireland Galway, Galway, Ireland
- Centre for Neuroimaging, Cognition and Genomics, National University of Ireland Galway, Galway, Ireland
| | - Joanna C. Neill
- Division of Pharmacy and Optometry, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
| | - Bill Deakin
- Division of Neuroscience and Experimental Psychology, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
| | - Golam M. Khandaker
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Centre for Academic Mental Health, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Avon and Wiltshire Mental Health Partnership NHS Trust, Bristol, United Kingdom
- NIHR Bristol Biomedical Research Centre, Bristol, United Kingdom
| | - Rachel Upthegrove
- Institute for Mental Health, University of Birmingham, Birmingham, United Kingdom
- Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Early Intervention Service, Birmingham Women’s and Children’s NHS Foundation Trust, Birmingham, United Kingdom
|
8
|
Slater LT, Russell S, Makepeace S, Carberry A, Karwath A, Williams JA, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Evaluating semantic similarity methods for comparison of text-derived phenotype profiles. BMC Med Inform Decis Mak 2022; 22:33. [PMID: 35123470 PMCID: PMC8818208 DOI: 10.1186/s12911-022-01770-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 01/21/2022] [Indexed: 11/16/2022] Open
Abstract
Background Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area. Methods We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the Medical Information Mart for Intensive Care (MIMIC-III). Results 300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations, when measured by area under the receiver operating characteristic curve and Top Ten Accuracy, used term-specificity and annotation-frequency measures. Conclusion We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.
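As an illustration of what one semantic similarity configuration can look like, the sketch below combines an annotation-frequency information content (IC) measure, Resnik pairwise similarity, and a best-match-average groupwise measure. It is a minimal sketch under stated assumptions, not the platform described in the abstract: the toy ontology, the term names, the three example profiles, and all function names are hypothetical.

```python
import math

# Hypothetical toy ontology (child -> parents); not taken from any real ontology.
PARENTS = {
    "root": [],
    "cardiac": ["root"],
    "respiratory": ["root"],
    "arrhythmia": ["cardiac"],
    "heart_failure": ["cardiac"],
    "dyspnoea": ["respiratory"],
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(profiles):
    """Annotation-frequency IC: -log of the share of profiles that carry
    a term directly or through one of its descendants."""
    counts = {t: 0 for t in PARENTS}
    for profile in profiles:
        closure = set().union(*(ancestors(t) for t in profile))
        for t in closure:
            counts[t] += 1
    n = len(profiles)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}

def resnik(a, b, ic):
    """Pairwise similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(ic.get(t, 0.0) for t in common)

def best_match_average(p, q, ic):
    """Groupwise similarity of two non-empty profiles (symmetric)."""
    def one_way(x, y):
        return sum(max(resnik(a, b, ic) for b in y) for a in x) / len(x)
    return (one_way(p, q) + one_way(q, p)) / 2

# Hypothetical patient profiles (sets of ontology terms).
profiles = [{"arrhythmia"}, {"heart_failure", "dyspnoea"}, {"dyspnoea"}]
ic = information_content(profiles)
sim_self = best_match_average(profiles[0], profiles[0], ic)
sim_other = best_match_average(profiles[0], profiles[1], ic)
```

Swapping the IC function, the pairwise measure, or the groupwise measure yields a different configuration, which is one plausible way a benchmark could enumerate many of them.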
|
9
|
Slater LT, Karwath A, Hoehndorf R, Gkoutos GV. Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity. Front Digit Health 2021; 3:781227. [PMID: 34939069 PMCID: PMC8685209 DOI: 10.3389/fdgth.2021.781227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 11/12/2021] [Indexed: 11/13/2022] Open
Abstract
Semantic similarity is a useful approach for comparing patient phenotypes, and has potential as an effective method for exploiting text-derived phenotypes for differential diagnosis, text and document classification, and outcome prediction. While approaches for context disambiguation are commonly used in text mining applications, forming a standard component of information extraction pipelines, their effects on semantic similarity calculations have not been widely explored. In this work, we evaluate how inclusion and exclusion of negated and uncertain mentions of concepts from text-derived phenotypes affect similarity of patients, and the use of those profiles to predict diagnosis. We report on the effectiveness of these approaches and find a very small, yet significant, improvement in performance when classifying primary diagnosis over MIMIC-III patient visits.
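The effect of keeping or dropping negated and uncertain mentions can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Mention` record, the flag names, the example patients, and the use of Jaccard overlap as a simple stand-in for a semantic similarity measure are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    """A concept mention extracted from text, with context flags (hypothetical schema)."""
    concept: str
    negated: bool = False
    uncertain: bool = False

def build_profile(mentions, keep_negated=False, keep_uncertain=False):
    """Collect concepts into a profile, optionally dropping negated or uncertain mentions."""
    return {
        m.concept
        for m in mentions
        if (keep_negated or not m.negated) and (keep_uncertain or not m.uncertain)
    }

def jaccard(a, b):
    """Set overlap, used here only as a stand-in for a semantic similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical mentions for two patients.
patient_a = [
    Mention("fever"),
    Mention("cough", negated=True),      # e.g. "denies cough"
    Mention("sepsis", uncertain=True),   # e.g. "possible sepsis"
]
patient_b = [Mention("fever"), Mention("cough")]

sim_all = jaccard(
    build_profile(patient_a, keep_negated=True, keep_uncertain=True),
    build_profile(patient_b, keep_negated=True, keep_uncertain=True),
)
sim_filtered = jaccard(build_profile(patient_a), build_profile(patient_b))
```

In this toy case the negated "cough" inflates the overlap when it is kept, so filtering changes the score from 2/3 to 1/2; whether such filtering helps on real data is the empirical question the abstract addresses.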
Affiliation(s)
- Luke T Slater
- Centre for Computational Biology, College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom; University Hospitals Birmingham National Health Service Foundation Trust, Birmingham, United Kingdom; MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom
| | - Andreas Karwath
- Centre for Computational Biology, College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom; University Hospitals Birmingham National Health Service Foundation Trust, Birmingham, United Kingdom; MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Georgios V Gkoutos
- Centre for Computational Biology, College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom; University Hospitals Birmingham National Health Service Foundation Trust, Birmingham, United Kingdom; MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom; National Institute for Health Research Experimental Cancer Medicine Centre, Birmingham, United Kingdom; National Institute for Health Research Surgical Reconstruction and Microbiology Research Centre, Birmingham, United Kingdom; National Institute for Health Research Biomedical Research Centre, Birmingham, United Kingdom
|
10
|
Wehr MM, Sarang SS, Rooseboom M, Boogaard PJ, Karwath A, Escher SE. RespiraTox - Development of a QSAR model to predict human respiratory irritants. Regul Toxicol Pharmacol 2021; 128:105089. [PMID: 34861320 DOI: 10.1016/j.yrtph.2021.105089] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/20/2021] [Revised: 11/17/2021] [Accepted: 11/23/2021] [Indexed: 11/25/2022]
Abstract
Respiratory irritation is an important human health endpoint in chemical risk assessment. There are two established modes of action of respiratory irritation: 1) sensory irritation mediated by the interaction with sensory neurons, potentially stimulating the trigeminal nerve, and 2) direct tissue irritation. The aim of our research was to develop a QSAR method to predict human respiratory irritants, and to potentially reduce the reliance on animal testing for the identification of respiratory irritants. Compounds are classified as irritating based on combined evidence from different types of toxicological data, including inhalation studies with acute and repeated exposure. The curated project database comprised 1997 organic substances, 1553 being classified as irritating and 444 as non-irritating. A comparison of machine learning approaches, including Logistic Regression (LR), Random Forests (RFs), and Gradient Boosted Decision Trees (GBTs), showed that the best classification was obtained by GBTs. The LR model resulted in an area under the curve (AUC) of 0.65, while the optimal performance for both RFs and GBTs gave an AUC of 0.71. In addition to the classification and the information on the applicability domain, the web-based tool provides a list of structurally similar analogues together with their experimental data to facilitate expert review for read-across purposes.
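Because 1553 of the 1997 substances are labelled irritating, a model that always predicts "irritating" would already reach about 78% accuracy, which is one reason a ranking metric such as AUC is informative here. The sketch below shows how AUC can be computed from labels and scores using the pairwise (Mann-Whitney) formulation. It is a minimal illustration, not the RespiraTox evaluation code; the function name, labels, and scores are hypothetical.

```python
def auc(labels, scores):
    """Probability that a randomly chosen positive is scored above a randomly
    chosen negative; tied scores count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = irritating, 0 = non-irritating) and model scores.
labels = [1, 1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.7, 0.2]
model_auc = auc(labels, scores)

# A constant score carries no ranking information, so its AUC is 0.5
# regardless of class balance.
constant_auc = auc(labels, [1.0] * len(labels))

# Accuracy of the always-"irritating" baseline on the class counts in the abstract.
majority_accuracy = 1553 / 1997
```

The quadratic pairwise loop is chosen for readability; a rank-based implementation would be preferable for large datasets.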
Affiliation(s)
- Matthias M Wehr
- Fraunhofer Institute for Toxicology and Experimental Medicine - ITEM, Hannover, Germany.
| | | | | | - Peter J Boogaard
- Shell International, Shell Health, The Hague, Netherlands; Wageningen University & Research, Wageningen, Netherlands
| | | | - Sylvia E Escher
- Fraunhofer Institute for Toxicology and Experimental Medicine - ITEM, Hannover, Germany.
|
11
|
Slater K, Williams JA, Karwath A, Fanning H, Ball S, Schofield PN, Hoehndorf R, Gkoutos GV. Multi-faceted semantic clustering with text-derived phenotypes. Comput Biol Med 2021; 138:104904. [PMID: 34600327 PMCID: PMC8573608 DOI: 10.1016/j.compbiomed.2021.104904] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/23/2021] [Revised: 09/22/2021] [Accepted: 09/23/2021] [Indexed: 02/03/2023]
Abstract
Identification of ontology concepts in clinical narrative text enables the creation of phenotype profiles that can be associated with clinical entities, such as patients or drugs. Constructing patient phenotype profiles using formal ontologies enables their analysis via semantic similarity, in turn enabling the use of background knowledge in clustering or classification analyses. However, traditional semantic similarity approaches collapse complex relationships between patient phenotypes into a single similarity score for each pair of patients. Moreover, single scores may be based only on matching terms with the greatest information content (IC), ignoring other dimensions of patient similarity. This process necessarily leads to a loss of information in the resulting representation of patient similarity, and is especially apparent when using very large text-derived and highly multi-morbid phenotype profiles. Moreover, it renders finding a biological explanation for similarity very difficult: the black box problem. In this article, we explore the generation of multiple semantic similarity scores for patients based on different facets of their phenotypic manifestation, which we define through different sub-graphs in the Human Phenotype Ontology. We further present a new methodology for deriving sets of qualitative class descriptions for groups of entities described by ontology terms. Leveraging this strategy to obtain meaningful explanations for our semantic clusters alongside other evaluation techniques, we show that semantic clustering with ontology-derived facets enables the representation, and thus identification, of clinically relevant phenotype relationships not easily recoverable using overall clustering alone. In this way, we demonstrate the potential of faceted semantic clustering for gaining a deeper and more nuanced understanding of text-derived patient phenotypes.
Affiliation(s)
- Karin Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK) Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK.
| | - John A Williams
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Andreas Karwath
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK) Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Hilary Fanning
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Simon Ball
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Paul N Schofield
- Dept of Physiology, Development, and Neuroscience, University of Cambridge, UK
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Saudi Arabia
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK) Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
|
12
|
Karwath A, Bunting KV, Gill SK, Tica O, Pendleton S, Aziz F, Barsky AD, Chernbumroong S, Duan J, Mobley AR, Cardoso VR, Slater K, Williams JA, Bruce EJ, Wang X, Flather MD, Coats AJS, Gkoutos GV, Kotecha D. Redefining β-blocker response in heart failure patients with sinus rhythm and atrial fibrillation: a machine learning cluster analysis. Lancet 2021; 398:1427-1435. [PMID: 34474011 PMCID: PMC8542730 DOI: 10.1016/s0140-6736(21)01638-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 05/28/2021] [Revised: 07/01/2021] [Accepted: 07/07/2021] [Indexed: 01/19/2023]
Abstract
BACKGROUND Mortality remains unacceptably high in patients with heart failure and reduced left ventricular ejection fraction (LVEF) despite advances in therapeutics. We hypothesised that a novel artificial intelligence approach could better assess multiple and higher-dimension interactions of comorbidities, and define clusters of β-blocker efficacy in patients with sinus rhythm and atrial fibrillation. METHODS Neural network-based variational autoencoders and hierarchical clustering were applied to pooled individual patient data from nine double-blind, randomised, placebo-controlled trials of β blockers. All-cause mortality during median 1·3 years of follow-up was assessed by intention to treat, stratified by electrocardiographic heart rhythm. The number of clusters and dimensions was determined objectively, with results validated using a leave-one-trial-out approach. This study was prospectively registered with ClinicalTrials.gov (NCT00832442) and the PROSPERO database of systematic reviews (CRD42014010012). FINDINGS 15 659 patients with heart failure and LVEF of less than 50% were included, with median age 65 years (IQR 56-72) and LVEF 27% (IQR 21-33). 3708 (24%) patients were women. In sinus rhythm (n=12 822), most clusters demonstrated a consistent overall mortality benefit from β blockers, with odds ratios (ORs) ranging from 0·54 to 0·74. One cluster in sinus rhythm of older patients with less severe symptoms showed no significant efficacy (OR 0·86, 95% CI 0·67-1·10; p=0·22). In atrial fibrillation (n=2837), four of five clusters were consistent with the overall neutral effect of β blockers versus placebo (OR 0·92, 0·77-1·10; p=0·37). One cluster of younger atrial fibrillation patients at lower mortality risk but similar LVEF to average had a statistically significant reduction in mortality with β blockers (OR 0·57, 0·35-0·93; p=0·023). 
The robustness and consistency of clustering was confirmed for all models (p<0·0001 vs random), and cluster membership was externally validated across the nine independent trials. INTERPRETATION An artificial intelligence-based clustering approach was able to distinguish prognostic response from β blockers in patients with heart failure and reduced LVEF. This included patients in sinus rhythm with suboptimal efficacy, as well as a cluster of patients with atrial fibrillation where β blockers did reduce mortality. FUNDING Medical Research Council, UK, and EU/EFPIA Innovative Medicines Initiative BigData@Heart.
Affiliation(s)
- Andreas Karwath
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | - Karina V Bunting
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | - Simrat K Gill
- Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK
| | - Otilia Tica
- Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | - Samantha Pendleton
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
| | - Furqan Aziz
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | - Andrey D Barsky
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | | | - Jinming Duan
- Computer Sciences, University of Birmingham, Birmingham, UK
| | - Alastair R Mobley
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | - Victor Roth Cardoso
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | - Karin Slater
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | - John A Williams
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | - Emma-Jane Bruce
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | - Xiaoxia Wang
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK
| | | | | | - Georgios V Gkoutos
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK.
| | - Dipak Kotecha
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK; Health Data Research UK Midlands Site, Birmingham, UK.
|
13
|
Chapman M, Mumtaz S, Rasmussen LV, Karwath A, Gkoutos GV, Gao C, Thayer D, Pacheco JA, Parkinson H, Richesson RL, Jefferson E, Denaxas S, Curcin V. Desiderata for the development of next-generation electronic health record phenotype libraries. Gigascience 2021; 10:giab059. [PMID: 34508578 PMCID: PMC8434766 DOI: 10.1093/gigascience/giab059] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/24/2021] [Revised: 07/15/2021] [Accepted: 08/18/2021] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. METHODS A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. RESULTS We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. CONCLUSIONS There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
Affiliation(s)
- Martin Chapman
- Department of Population Health Sciences, King's College London, London, SE1 1UL, UK
| | - Shahzad Mumtaz
- Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
| | - Luke V Rasmussen
- Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Andreas Karwath
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
| | - Georgios V Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
| | - Chuang Gao
- Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
| | - Dan Thayer
- SAIL Databank, Swansea University, Swansea, SA2 8PP, UK
| | - Jennifer A Pacheco
- Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
| | - Rachel L Richesson
- Department of Learning Health Sciences, University of Michigan Medical School, MI 48109, USA
| | - Emily Jefferson
- Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
| | - Spiros Denaxas
- Institute of Health Informatics, University College London, London, NW1 2DA, UK
| | - Vasa Curcin
- Department of Population Health Sciences, King's College London, London, SE1 1UL, UK
|
14
|
Pendleton SC, Slater K, Karwath A, Gilbert RM, Davis N, Pesudovs K, Liu X, Denniston AK, Gkoutos GV, Braithwaite T. Development and application of the ocular immune-mediated inflammatory diseases ontology enhanced with synonyms from online patient support forum conversation. Comput Biol Med 2021; 135:104542. [PMID: 34139439 PMCID: PMC8404035 DOI: 10.1016/j.compbiomed.2021.104542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 05/27/2021] [Accepted: 05/30/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND Unstructured text created by patients represents a rich, but relatively inaccessible resource for advancing patient-centred care. This study aimed to develop an ontology for ocular immune-mediated inflammatory diseases (OcIMIDo), as a tool to facilitate data extraction and analysis, illustrating its application to online patient support forum data. METHODS We developed OcIMIDo using clinical guidelines, domain expertise, and cross-references to classes from other biomedical ontologies. We developed an approach to add patient-preferred synonyms text-mined from oliviasvision.org online forum, using statistical ranking. We validated the approach with split-sampling and comparison to manual extraction. Using OcIMIDo, we then explored the frequency of OcIMIDo classes and synonyms, and their potential association with natural language sentiment expressed in each online forum post. FINDINGS OcIMIDo (version 1.2) includes 661 classes, describing anatomy, clinical phenotype, disease activity status, complications, investigations, interventions and functional impacts. It contains 1661 relationships and axioms, 2851 annotations, including 1131 database cross-references, and 187 patient-preferred synonyms. To illustrate OcIMIDo's potential applications, we explored 9031 forum posts, revealing frequent mention of different clinical phenotypes, treatments, and complications. Language sentiment analysis of each post was generally positive (median 0.12, IQR 0.01-0.24). In multivariable logistic regression, the odds of a post expressing negative sentiment were significantly associated with first posts as compared to replies (OR 3.3, 95% CI 2.8 to 3.9, p < 0.001). CONCLUSION We report the development and validation of a new ontology for inflammatory eye diseases, which includes patient-preferred synonyms, and can be used to explore unstructured patient or physician-reported text data, with many potential applications.
Affiliation(s)
- Samantha C Pendleton
- Institute of Cancer and Genomic Sciences, University of Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, UK.
| | - Karin Slater
- Institute of Cancer and Genomic Sciences, University of Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, UK
| | - Andreas Karwath
- Institute of Cancer and Genomic Sciences, University of Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, UK; Health Data Research, UK
| | - Rose M Gilbert
- Moorfields Eye Hospital NHS Foundation Trust, London, UK; Institute of Ophthalmology, University College London, UK
| | - Nicola Davis
- Olivia's Vision, Southampton Buildings, London, UK
| | - Konrad Pesudovs
- School of Optometry and Vision Science, University of New South Wales, Australia
| | - Xiaoxuan Liu
- University Hospitals Birmingham NHS Foundation Trust, UK; Institute of Inflammation and Ageing, University of Birmingham, UK
| | - Alastair K Denniston
- University Hospitals Birmingham NHS Foundation Trust, UK; Health Data Research, UK; Institute of Inflammation and Ageing, University of Birmingham, UK
| | - Georgios V Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, UK; Health Data Research, UK
| | - Tasanee Braithwaite
- University Hospitals Birmingham NHS Foundation Trust, UK; Institute of Applied Health Research, University of Birmingham, UK; The Medical Eye Unit, St Thomas' Hospital NHS Foundation Trust, London, UK
|
15
|
Slater K, Karwath A, Williams JA, Russell S, Makepeace S, Carberry A, Hoehndorf R, Gkoutos GV. Towards similarity-based differential diagnostics for common diseases. Comput Biol Med 2021; 133:104360. [PMID: 33836447 PMCID: PMC8204262 DOI: 10.1016/j.compbiomed.2021.104360] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/26/2021] [Revised: 03/22/2021] [Accepted: 03/24/2021] [Indexed: 11/30/2022]
Abstract
Ontology-based phenotype profiles have been utilised for the purpose of differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. Particularly, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied for differential diagnosis of common diseases, or generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. We also consider a combined approach, in which literature-derived phenotypes are extended with the content of text-derived phenotypes we mined from 500 patients. The results reveal a powerful approach, showing that in one setting, uncurated text phenotypes can be used for differential diagnosis of common diseases, making use of information both inside and outside the setting. While the methods themselves should be explored for further optimisation, they could be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.
Affiliation(s)
- Karin Slater
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Andreas Karwath
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- John A Williams
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Sophie Russell
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Silver Makepeace
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Alexander Carberry
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Saudi Arabia
- Georgios V Gkoutos
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK) Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
16
Bunting KV, Gill SK, Sitch A, Mehta S, O'Connor K, Lip GY, Kirchhof P, Strauss VY, Rahimi K, Camm AJ, Stanbury M, Griffith M, Townend JN, Gkoutos GV, Karwath A, Steeds RP, Kotecha D. Improving the diagnosis of heart failure in patients with atrial fibrillation. Heart 2021; 107:902-908. [PMID: 33692093 PMCID: PMC8142420 DOI: 10.1136/heartjnl-2020-318557] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/30/2020] [Revised: 01/21/2021] [Accepted: 01/25/2021] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVE To improve the echocardiographic assessment of heart failure in patients with atrial fibrillation (AF) by comparing conventional averaging of consecutive beats with an index-beat approach, whereby measurements are taken after two cycles with similar R-R interval. METHODS Transthoracic echocardiography was performed using a standardised and blinded protocol in patients enrolled in the RATE-AF (RAte control Therapy Evaluation in permanent Atrial Fibrillation) randomised trial. We compared reproducibility of the index-beat and conventional consecutive-beat methods to calculate left ventricular ejection fraction (LVEF), global longitudinal strain (GLS) and E/e' (mitral E wave max/average diastolic tissue Doppler velocity), and assessed intraoperator/interoperator variability, time efficiency and validity against natriuretic peptides. RESULTS 160 patients were included, 46% of whom were women, with a median age of 75 years (IQR 69-82) and a median heart rate of 100 beats per minute (IQR 86-112). The index-beat had the lowest within-beat coefficient of variation for LVEF (32%, vs 51% for 5 consecutive beats and 53% for 10 consecutive beats), GLS (26%, vs 43% and 42%) and E/e' (25%, vs 41% and 41%). Intraoperator (n=50) and interoperator (n=18) reproducibility were both superior for index-beats and this method was quicker to perform (p<0.001): 35.4 s to measure E/e' (95% CI 33.1 to 37.8) compared with 44.7 s for 5-beat (95% CI 41.8 to 47.5) and 98.1 s for 10-beat (95% CI 91.7 to 104.4) analyses. Using a single index-beat did not compromise the association of LVEF, GLS or E/e' with natriuretic peptide levels. CONCLUSIONS Compared with averaging of multiple beats in patients with AF, the index-beat approach improves reproducibility and saves time without a negative impact on validity, potentially improving the diagnosis and classification of heart failure in patients with AF.
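The reproducibility comparison in this abstract rests on the coefficient of variation. As a minimal sketch, assuming the usual definition (sample standard deviation divided by the mean, expressed as a percentage), it could be computed as below. The abstract does not give the exact within-beat formula, and the readings are invented illustrative values, not data from the RATE-AF trial.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical LVEF readings (%) for one patient; not data from the study.
consecutive_beats = [38.0, 52.0, 45.0, 60.0, 41.0]
index_beats = [47.0, 50.0, 46.0, 49.0, 48.0]

cv_consecutive = coefficient_of_variation(consecutive_beats)
cv_index = coefficient_of_variation(index_beats)
```

A lower value means the repeated measurements agree more closely relative to their average, which is the sense in which the index-beat method is reported as more reproducible.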
Affiliation(s)
- Karina V Bunting
  - Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK
  - Cardiology Department, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Simrat K Gill
  - Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK
  - Cardiology Department, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Alice Sitch
  - NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, UK
  - Test Evaluation Research Group, Institute of Applied Health Research, University of Birmingham, Birmingham, UK
- Samir Mehta
  - University of Birmingham Clinical Trials Unit, Birmingham, UK
- Kieran O'Connor
  - Cardiology Department, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Gregory Yh Lip
  - Thrombosis Research Unit, Aalborg University, Aalborg, Denmark
  - Liverpool Centre for Cardiovascular Science, University of Liverpool and Liverpool Heart and Chest Hospital NHS Foundation Trust, Liverpool, UK
- Paulus Kirchhof
  - Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK
  - University Heart and Vascular Center, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Kazem Rahimi
  - Deep Medicine, Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, UK
  - NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
- A John Camm
  - Cardiology Clinical Academic Group - Molecular & Clinical Sciences Institute, St George's University of London, London, UK
- Michael Griffith
  - Cardiology Department, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Jonathan N Townend
  - Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK
  - Cardiology Department, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Georgios V Gkoutos
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
  - Health Data Research (HDR)-UK Midlands, Birmingham, UK
- Andreas Karwath
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Richard P Steeds
  - Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK
  - Cardiology Department, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Dipak Kotecha
  - Institute of Cardiovascular Sciences, University of Birmingham, Birmingham, UK
  - Cardiology Department, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Health Data Research (HDR)-UK Midlands, Birmingham, UK
17
Wu H, Zhang H, Karwath A, Ibrahim Z, Shi T, Zhang X, Wang K, Sun J, Dhaliwal K, Bean D, Cardoso VR, Li K, Teo JT, Banerjee A, Gao-Smith F, Whitehouse T, Veenith T, Gkoutos GV, Wu X, Dobson R, Guthrie B. Ensemble learning for poor prognosis predictions: A case study on SARS-CoV-2. J Am Med Inform Assoc 2021; 28:791-800. [PMID: 33185672 PMCID: PMC7717299 DOI: 10.1093/jamia/ocaa295] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/07/2020] [Accepted: 11/11/2020] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE Risk prediction models are widely used to inform evidence-based clinical decision making. However, few models developed from single cohorts can perform consistently well at population level where diverse prognoses exist (such as the SARS-CoV-2 [severe acute respiratory syndrome coronavirus 2] pandemic). This study aims at tackling this challenge by synergizing prediction models from the literature using ensemble learning. MATERIALS AND METHODS In this study, we selected and reimplemented 7 prediction models for COVID-19 (coronavirus disease 2019) that were derived from diverse cohorts and used different implementation techniques. A novel ensemble learning framework was proposed to synergize them for realizing personalized predictions for individual patients. Four diverse international cohorts (2 from the United Kingdom and 2 from China; N = 5394) were used to validate all 8 models on discrimination, calibration, and clinical usefulness. RESULTS Results showed that individual prediction models could perform well on some cohorts while poorly on others. Conversely, the ensemble model achieved the best performances consistently on all metrics quantifying discrimination, calibration, and clinical usefulness. Performance disparities were observed in cohorts from the 2 countries: all models achieved better performances on the China cohorts. DISCUSSION When individual models were learned from complementary cohorts, the synergized model had the potential to achieve better performances than any individual model. Results indicate that blood parameters and physiological measurements might have better predictive powers when collected early, which remains to be confirmed by further studies. CONCLUSIONS Combining a diverse set of individual prediction models, the ensemble method can synergize a robust and well-performing model by choosing the most competent ones for individual patients.
Affiliation(s)
- Honghan Wu
  - Institute of Health Informatics, University College London, London, United Kingdom
  - Health Data Research UK, University College London, London, United Kingdom
- Huayu Zhang
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Andreas Karwath
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
  - Health Data Research UK, University of Birmingham, Birmingham, United Kingdom
- Zina Ibrahim
  - Health Data Research UK, University College London, London, United Kingdom
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Ting Shi
  - Centre for Global Health, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Xin Zhang
  - Department of Pulmonary and Critical Care Medicine, People’s Liberation Army Joint Logistic Support Force 920th Hospital, Kunming, China
- Kun Wang
  - Department of Pulmonary and Critical Care Medicine, Shanghai East Hospital, Tongji University, Shanghai, China
- Jiaxing Sun
  - Department of Pulmonary and Critical Care Medicine, Shanghai East Hospital, Tongji University, Shanghai, China
- Kevin Dhaliwal
  - Centre for Inflammation Research, Queens Medical Research Institute, University of Edinburgh, Edinburgh, United Kingdom
- Daniel Bean
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Victor Roth Cardoso
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
  - Health Data Research UK, University of Birmingham, Birmingham, United Kingdom
- Kezhi Li
  - Institute of Health Informatics, University College London, London, United Kingdom
- James T Teo
  - Department of Stroke and Neurology, King’s College Hospital NHS Foundation Trust, London, United Kingdom
- Amitava Banerjee
  - Institute of Health Informatics, University College London, London, United Kingdom
- Fang Gao-Smith
  - Department of Intensive Care Medicine, Queen Elizabeth Hospital Birmingham, Birmingham, United Kingdom
  - Birmingham Acute Care Research, University of Birmingham, Birmingham, United Kingdom
- Tony Whitehouse
  - Department of Intensive Care Medicine, Queen Elizabeth Hospital Birmingham, Birmingham, United Kingdom
  - Birmingham Acute Care Research, University of Birmingham, Birmingham, United Kingdom
- Tonny Veenith
  - Department of Intensive Care Medicine, Queen Elizabeth Hospital Birmingham, Birmingham, United Kingdom
  - Birmingham Acute Care Research, University of Birmingham, Birmingham, United Kingdom
- Georgios V Gkoutos
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
  - Health Data Research UK, University of Birmingham, Birmingham, United Kingdom
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- Xiaodong Wu
  - Department of Pulmonary and Critical Care Medicine, Shanghai East Hospital, Tongji University, Shanghai, China
  - Department of Pulmonary and Critical Care Medicine, Taikang Tongji Hospital, Wuhan, China
- Richard Dobson
  - Institute of Health Informatics, University College London, London, United Kingdom
  - Health Data Research UK, University College London, London, United Kingdom
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Bruce Guthrie
  - Centre for Population Health Sciences, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
18
Carr E, Bendayan R, Bean D, Stammers M, Wang W, Zhang H, Searle T, Kraljevic Z, Shek A, Phan HTT, Muruet W, Gupta RK, Shinton AJ, Wyatt M, Shi T, Zhang X, Pickles A, Stahl D, Zakeri R, Noursadeghi M, O'Gallagher K, Rogers M, Folarin A, Karwath A, Wickstrøm KE, Köhn-Luque A, Slater L, Cardoso VR, Bourdeaux C, Holten AR, Ball S, McWilliams C, Roguski L, Borca F, Batchelor J, Amundsen EK, Wu X, Gkoutos GV, Sun J, Pinto A, Guthrie B, Breen C, Douiri A, Wu H, Curcin V, Teo JT, Shah AM, Dobson RJB. Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study. BMC Med 2021; 19:23. [PMID: 33472631 PMCID: PMC7817348 DOI: 10.1186/s12916-020-01893-3] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 09/29/2020] [Accepted: 12/16/2020] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND The National Early Warning Score (NEWS2) is currently recommended in the UK for the risk stratification of COVID-19 patients, but little is known about its ability to detect severe cases. We aimed to evaluate NEWS2 for the prediction of severe COVID-19 outcome and identify and validate a set of blood and physiological parameters routinely collected at hospital admission to improve upon the use of NEWS2 alone for medium-term risk stratification. METHODS Training cohorts comprised 1276 patients admitted to King's College Hospital National Health Service (NHS) Foundation Trust with COVID-19 disease from 1 March to 30 April 2020. External validation cohorts included 6237 patients from five UK NHS Trusts (Guy's and St Thomas' Hospitals, University Hospitals Southampton, University Hospitals Bristol and Weston NHS Foundation Trust, University College London Hospitals, University Hospitals Birmingham), one hospital in Norway (Oslo University Hospital), and two hospitals in Wuhan, China (Wuhan Sixth Hospital and Taikang Tongji Hospital). The outcome was severe COVID-19 disease (transfer to intensive care unit (ICU) or death) at 14 days after hospital admission. Age, physiological measures, blood biomarkers, sex, ethnicity, and comorbidities (hypertension, diabetes, cardiovascular, respiratory and kidney diseases) measured at hospital admission were considered in the models. RESULTS A baseline model of 'NEWS2 + age' had poor-to-moderate discrimination for severe COVID-19 infection at 14 days (area under receiver operating characteristic curve (AUC) in training cohort = 0.700, 95% confidence interval (CI) 0.680, 0.722; Brier score = 0.192, 95% CI 0.186, 0.197). 
A supplemented model adding eight routinely collected blood and physiological parameters (supplemental oxygen flow rate, urea, age, oxygen saturation, C-reactive protein, estimated glomerular filtration rate, neutrophil count, neutrophil/lymphocyte ratio) improved discrimination (AUC = 0.735; 95% CI 0.715, 0.757), and these improvements were replicated across seven UK and non-UK sites. However, there was evidence of miscalibration with the model tending to underestimate risks in most sites. CONCLUSIONS NEWS2 score had poor-to-moderate discrimination for medium-term COVID-19 outcome which raises questions about its use as a screening tool at hospital admission. Risk stratification was improved by including readily available blood and physiological parameters measured at hospital admission, but there was evidence of miscalibration in external sites. This highlights the need for a better understanding of the use of early warning scores for COVID.
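The two metrics quoted in this abstract, AUC for discrimination and the Brier score for overall accuracy of predicted probabilities, can be sketched from their standard definitions. The labels and predicted risks below are invented for illustration and have no connection to the study cohorts; the function names are likewise hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive case is ranked above a random negative one.

    Ties count as half a win (the Mann-Whitney formulation of the ROC area).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented outcomes (1 = severe disease) and predicted risks for four patients.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

model_auc = auc(y_true, y_prob)
model_brier = brier_score(y_true, y_prob)
```

The AUC depends only on how patients are ranked, so a model can discriminate reasonably well and still be miscalibrated, which is the pattern the abstract reports for the external sites.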
Affiliation(s)
- Ewan Carr
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
- Rebecca Bendayan
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
  - NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- Daniel Bean
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
  - Health Data Research UK London, University College London, London, UK
- Matt Stammers
  - Clinical Informatics Research Unit, University of Southampton, Coxford Rd., Southampton, SO16 5AF, UK
  - NIHR Biomedical Research Centre at University Hospital Southampton NHS Trust, Coxford Road, Southampton, UK
  - UHS Digital, University Hospital Southampton, Tremona Road, Southampton, SO16 6YD, UK
- Wenjuan Wang
  - School of Population Health and Environmental Sciences, King's College London, London, UK
- Huayu Zhang
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Thomas Searle
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
  - NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- Zeljko Kraljevic
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
- Anthony Shek
  - Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Hang T T Phan
  - Clinical Informatics Research Unit, University of Southampton, Coxford Rd., Southampton, SO16 5AF, UK
  - NIHR Biomedical Research Centre at University Hospital Southampton NHS Trust, Coxford Road, Southampton, UK
- Walter Muruet
  - School of Population Health and Environmental Sciences, King's College London, London, UK
- Rishi K Gupta
  - UCL Institute for Global Health, University College London Hospitals NHS Trust, London, UK
- Anthony J Shinton
  - UHS Digital, University Hospital Southampton, Tremona Road, Southampton, SO16 6YD, UK
- Mike Wyatt
  - University Hospitals Bristol and Weston NHS Foundation Trust, Bristol, UK
- Ting Shi
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Xin Zhang
  - Department of Pulmonary and Critical Care Medicine, People's Liberation Army Joint Logistic Support Force 920th Hospital, Kunming, Yunnan, China
- Andrew Pickles
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
  - NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- Daniel Stahl
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
- Rosita Zakeri
  - King's College Hospital NHS Foundation Trust, London, UK
  - School of Cardiovascular Medicine & Sciences, King's College London British Heart Foundation Centre of Excellence, London, SE5 9NU, UK
- Mahdad Noursadeghi
  - UCL Division of Infection and Immunity, University College London Hospitals NHS Trust, London, UK
- Kevin O'Gallagher
  - King's College Hospital NHS Foundation Trust, London, UK
  - School of Cardiovascular Medicine & Sciences, King's College London British Heart Foundation Centre of Excellence, London, SE5 9NU, UK
- Matt Rogers
  - University Hospitals Bristol and Weston NHS Foundation Trust, Bristol, UK
- Amos Folarin
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
  - Health Data Research UK London, University College London, London, UK
  - Institute of Health Informatics, University College London, London, UK
  - NIHR Biomedical Research Centre at University College London Hospitals NHS Foundation Trust, London, UK
- Andreas Karwath
  - College of Medical and Dental Sciences, Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Health Data Research UK Midlands, Birmingham, UK
- Kristin E Wickstrøm
  - Department of Medical Biochemistry, Blood Cell Research Group, Oslo University Hospital, Oslo, Norway
- Alvaro Köhn-Luque
  - Oslo Centre for Biostatistics and Epidemiology, Faculty of Medicine, University of Oslo, Oslo, Norway
- Luke Slater
  - College of Medical and Dental Sciences, Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Health Data Research UK Midlands, Birmingham, UK
- Victor Roth Cardoso
  - College of Medical and Dental Sciences, Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Health Data Research UK Midlands, Birmingham, UK
- Aleksander Rygh Holten
  - Department of Acute Medicine, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Simon Ball
  - Health Data Research UK Midlands, Birmingham, UK
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Chris McWilliams
  - Department of Engineering Mathematics, University of Bristol, Bristol, UK
- Lukasz Roguski
  - Health Data Research UK London, University College London, London, UK
  - Institute of Health Informatics, University College London, London, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Florina Borca
  - Clinical Informatics Research Unit, University of Southampton, Coxford Rd., Southampton, SO16 5AF, UK
  - NIHR Biomedical Research Centre at University Hospital Southampton NHS Trust, Coxford Road, Southampton, UK
  - UHS Digital, University Hospital Southampton, Tremona Road, Southampton, SO16 6YD, UK
- James Batchelor
  - Clinical Informatics Research Unit, University of Southampton, Coxford Rd., Southampton, SO16 5AF, UK
- Erik Koldberg Amundsen
  - Department of Medical Biochemistry, Blood Cell Research Group, Oslo University Hospital, Oslo, Norway
- Xiaodong Wu
  - Department of Pulmonary and Critical Care Medicine, Shanghai East Hospital, Tongji University, Shanghai, China
  - Department of Pulmonary and Critical Care Medicine, Taikang Tongji Hospital, Wuhan, China
- Georgios V Gkoutos
  - College of Medical and Dental Sciences, Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Health Data Research UK Midlands, Birmingham, UK
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Jiaxing Sun
  - Department of Pulmonary and Critical Care Medicine, Shanghai East Hospital, Tongji University, Shanghai, China
- Ashwin Pinto
  - UHS Digital, University Hospital Southampton, Tremona Road, Southampton, SO16 6YD, UK
- Bruce Guthrie
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Cormac Breen
  - School of Population Health and Environmental Sciences, King's College London, London, UK
- Abdel Douiri
  - School of Population Health and Environmental Sciences, King's College London, London, UK
- Honghan Wu
  - Health Data Research UK London, University College London, London, UK
  - Institute of Health Informatics, University College London, London, UK
- Vasa Curcin
  - School of Population Health and Environmental Sciences, King's College London, London, UK
- James T Teo
  - Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  - King's College Hospital NHS Foundation Trust, London, UK
- Ajay M Shah
  - King's College Hospital NHS Foundation Trust, London, UK
  - School of Cardiovascular Medicine & Sciences, King's College London British Heart Foundation Centre of Excellence, London, SE5 9NU, UK
- Richard J B Dobson
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, 16 De Crespigny Park, London, SE5 8AF, UK
  - NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
  - Health Data Research UK London, University College London, London, UK
  - Institute of Health Informatics, University College London, London, UK
  - NIHR Biomedical Research Centre at University College London Hospitals NHS Foundation Trust, London, UK
19
Escher S, Mangelsdorf I, Hoffmann-Doerr S, Partosch F, Karwath A, Schroeder K, Zapf A, Batke M. Time extrapolation in regulatory risk assessment: The impact of study differences on the extrapolation factors. Regul Toxicol Pharmacol 2020; 112:104584. [DOI: 10.1016/j.yrtph.2020.104584] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/23/2019] [Revised: 01/13/2020] [Accepted: 01/15/2020] [Indexed: 10/25/2022]
20
Althubaiti S, Karwath A, Dallol A, Noor A, Alkhayyat SS, Alwassia R, Mineta K, Gojobori T, Beggs AD, Schofield PN, Gkoutos GV, Hoehndorf R. Ontology-based prediction of cancer driver genes. Sci Rep 2019; 9:17405. [PMID: 31757986 PMCID: PMC6874647 DOI: 10.1038/s41598-019-53454-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/15/2019] [Accepted: 10/29/2019] [Indexed: 01/05/2023] Open
Abstract
Identifying and distinguishing cancer driver genes among thousands of candidate mutations remains a major challenge. Accurate identification of driver genes and driver mutations is critical for advancing cancer research and personalizing treatment based on accurate stratification of patients. Due to inter-tumor genetic heterogeneity, many driver mutations within a gene occur at low frequencies, which makes it challenging to distinguish them from non-driver mutations. We have developed a novel method for identifying cancer driver genes. Our approach utilizes multiple complementary types of information, specifically cellular phenotypes, cellular locations, functions, and whole body physiological phenotypes as features. We demonstrate that our method can accurately identify known cancer driver genes and distinguish between their roles in different types of cancer. In addition to confirming known driver genes, we identify several novel candidate driver genes. We demonstrate the utility of our method by validating its predictions in nasopharyngeal cancer and colorectal cancer using whole exome and whole genome sequencing.
Affiliation(s)
- Sara Althubaiti
  - Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Andreas Karwath
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, B15 2TT, Birmingham, United Kingdom
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, B15 2TT, Birmingham, United Kingdom
- Ashraf Dallol
  - Centre of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Adeeb Noor
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 80221, Saudi Arabia
- Rolina Alwassia
  - Radiation Oncology Unit, King Abdulaziz University Hospital, Jeddah, Saudi Arabia
- Katsuhiko Mineta
  - Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Takashi Gojobori
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
  - Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Andrew D Beggs
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, B15 2TT, Birmingham, United Kingdom
- Paul N Schofield
  - Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, CB2 3EG, Cambridge, United Kingdom
- Georgios V Gkoutos
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, B15 2TT, Birmingham, United Kingdom
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, B15 2TT, Birmingham, United Kingdom
  - NIHR Experimental Cancer Medicine Centre, B15 2TT, Birmingham, UK
  - NIHR Surgical Reconstruction and Microbiology Research Centre, B15 2TT, Birmingham, UK
  - MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
|
21
|
Geilke M, Karwath A, Frank E, Kramer S. Online estimation of discrete, continuous, and conditional joint densities using classifier chains. Data Min Knowl Discov 2017. [DOI: 10.1007/s10618-017-0546-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]
|
22
|
Karwath A, Hubrich M, Kramer S. Convolutional Neural Networks for the Identification of Regions of Interest in PET Scans: A Study of Representation Learning for Diagnosing Alzheimer’s Disease. Artif Intell Med 2017. [DOI: 10.1007/978-3-319-59758-4_36] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
|
23
|
Abstract
Background Sound statistical validation is important to evaluate and compare the overall performance of (Q)SAR models. However, classical validation does not help the user to better understand the properties of the model or the underlying data. Even though a number of visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow the investigation of model validation results are still lacking. Results We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. The approach applies the 3D viewer CheS-Mapper, an open-source application for the exploration of small molecules in virtual 3D space. The present work describes the new functionalities in CheS-Mapper 2.0 that facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. The approach is generic: it is model-independent and can handle physico-chemical and structural input features as well as quantitative and qualitative endpoints. Conclusions Visual validation with CheS-Mapper enables analyzing (Q)SAR information in the data and indicates how this information is employed by the (Q)SAR model. It reveals whether the endpoint is modeled too specifically or too generically and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org. Graphical abstract Comparing actual and predicted activity values with CheS-Mapper.
|
24
|
Gütlein M, Helma C, Karwath A, Kramer S. Erratum: A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR. Mol Inform 2013. [DOI: 10.1002/minf.201380841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
|
25
|
Gütlein M, Helma C, Karwath A, Kramer S. A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR. Mol Inform 2013; 32:516-28. [PMID: 27481669 DOI: 10.1002/minf.201200134] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 10/31/2012] [Accepted: 03/08/2013] [Indexed: 01/13/2023]
Abstract
(Q)SAR model validation is essential to ensure the quality of inferred models and to indicate future model predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities in order to accept the (Q)SAR model and to approve its use in real-world scenarios as an alternative testing method. However, the question of how to validate a (Q)SAR model, in particular whether to employ variants of cross-validation or external test set validation, is still under discussion. In this paper, we empirically compare k-fold cross-validation with external test set validation. To this end, we introduce a workflow that realistically simulates the common problem setting of building predictive models for relatively small datasets. The workflow makes it possible to apply the built and validated models to large amounts of unseen data and to compare the performance of the different validation approaches. The experimental results indicate that cross-validation produces better-performing (Q)SAR models than external test set validation and reduces the variance of the results, while at the same time underestimating the performance on unseen compounds. The experimental results reported in this paper suggest that, contrary to the current conception in the community, cross-validation may play a significant role in evaluating the predictivity of (Q)SAR models.
Affiliation(s)
- Martin Gütlein
- Institute for Physics, Albert-Ludwigs-Universität Freiburg, Hermann Herder Str. 3, D-79104 Freiburg.
| | | | - Andreas Karwath
- Information Systems, Institut für Informatik, Johannes Gutenberg Universität Mainz, Staudingerweg 9, D-55128 Mainz
| | - Stefan Kramer
- Information Systems, Institut für Informatik, Johannes Gutenberg Universität Mainz, Staudingerweg 9, D-55128 Mainz
|
26
|
Abstract
Analyzing chemical datasets is a challenging task for scientific researchers in the field of chemoinformatics. It is important, yet difficult, to understand the relationship between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects. In this respect, visualization tools can help to better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and consequently arranges them in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which features to employ in the process. The tool can use and calculate different kinds of features, such as structural fragments and quantitative chemical descriptors. These features can be highlighted within CheS-Mapper, which helps the chemist to better understand patterns and regularities and to relate the observations to established scientific knowledge. As a final function, the tool can also be used to select and export specific subsets of a given dataset for further analysis.
Affiliation(s)
- Martin Gütlein
- Department of Computer Science, Albert-Ludwigs-Universität Freiburg, Freiburg im Breisgau, Germany
| | - Andreas Karwath
- Department of Computer Science, Albert-Ludwigs-Universität Freiburg, Freiburg im Breisgau, Germany
| | - Stefan Kramer
- Institute for Computer Science, Johannes Gutenberg-Universität Mainz, Mainz, Germany
|
27
|
|
28
|
Hardy B, Douglas N, Helma C, Rautenberg M, Jeliazkova N, Jeliazkov V, Nikolova I, Benigni R, Tcheremenskaia O, Kramer S, Girschick T, Buchwald F, Wicker J, Karwath A, Gütlein M, Maunz A, Sarimveis H, Melagraki G, Afantitis A, Sopasakis P, Gallagher D, Poroikov V, Filimonov D, Zakharov A, Lagunin A, Gloriozova T, Novikov S, Skvortsova N, Druzhilovsky D, Chawla S, Ghosh I, Ray S, Patel H, Escher S. Collaborative development of predictive toxicology applications. J Cheminform 2010; 2:7. [PMID: 20807436 PMCID: PMC2941473 DOI: 10.1186/1758-2946-2-7] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 04/30/2010] [Accepted: 08/31/2010] [Indexed: 12/02/2022] Open
Abstract
OpenTox provides an interoperable, standards-based Framework for the support of predictive toxicology data management, algorithms, modelling, validation and reporting. It is relevant to satisfying the chemical safety assessment requirements of the REACH legislation as it supports access to experimental data, (Quantitative) Structure-Activity Relationship models, and toxicological information through an integrating platform that adheres to regulatory requirements and OECD validation principles. Initial research defined the essential components of the Framework including the approach to data access, schema and management, use of controlled vocabularies and ontologies, architecture, web service and communications protocols, and selection and integration of algorithms for predictive modelling. OpenTox provides end-user oriented tools to non-computational specialists, risk assessors, and toxicological experts in addition to Application Programming Interfaces (APIs) for developers of new applications. OpenTox actively supports public standards for data representation, interfaces, vocabularies and ontologies, Open Source approaches to core platform components, and community-based collaboration approaches, so as to progress system interoperability goals. The OpenTox Framework includes APIs and services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, and reporting which may be combined into multiple applications satisfying a variety of different user needs. OpenTox applications are based on a set of distributed, interoperable OpenTox API-compliant REST web services.
The OpenTox approach to ontology allows for efficient mapping of complementary data coming from different datasets into a unifying structure having a shared terminology and representation. Two initial OpenTox applications are presented as an illustration of the potential impact of OpenTox for high-quality and consistent structure-activity relationship modelling of REACH-relevant endpoints: ToxPredict which predicts and reports on toxicities for endpoints for an input chemical structure, and ToxCreate which builds and validates a predictive toxicity model based on an input toxicology dataset. Because of the extensible nature of the standardised Framework design, barriers of interoperability between applications and content are removed, as the user may combine data, models and validation from multiple sources in a dependable and time-effective way.
Affiliation(s)
- Barry Hardy
- Douglas Connect, Baermeggenweg 14, 4314 Zeiningen, Switzerland
| | - Nicki Douglas
- Douglas Connect, Baermeggenweg 14, 4314 Zeiningen, Switzerland
| | - Christoph Helma
- In silico Toxicology, Altkircher Str. 4 CH-4052 Basel, Switzerland
| | - Micha Rautenberg
- In silico Toxicology, Altkircher Str. 4 CH-4052 Basel, Switzerland
| | | | | | | | - Romualdo Benigni
- Istituto Superiore di Sanità, Environment and Health Department, Istituto Superiore di Sanita', Viale Regina Elena 299, Rome 00161, Italy
| | - Olga Tcheremenskaia
- Istituto Superiore di Sanità, Environment and Health Department, Istituto Superiore di Sanita', Viale Regina Elena 299, Rome 00161, Italy
| | - Stefan Kramer
- Technical University of Munich, Technische Universität München, Arcisstr. 21, 80333 München, Germany
| | - Tobias Girschick
- Technical University of Munich, Technische Universität München, Arcisstr. 21, 80333 München, Germany
| | - Fabian Buchwald
- Technical University of Munich, Technische Universität München, Arcisstr. 21, 80333 München, Germany
| | - Joerg Wicker
- Technical University of Munich, Technische Universität München, Arcisstr. 21, 80333 München, Germany
| | - Andreas Karwath
- Albert-Ludwigs University Freiburg, 79110 Freiburg i.Br., Germany
| | - Martin Gütlein
- Albert-Ludwigs University Freiburg, 79110 Freiburg i.Br., Germany
| | - Andreas Maunz
- Albert-Ludwigs University Freiburg, 79110 Freiburg i.Br., Germany
| | - Haralambos Sarimveis
- National Technical University of Athens, School of Chemical Engineering, Heroon Polytechneiou 9, 15780, Zographou, Athens, Greece
| | - Georgia Melagraki
- National Technical University of Athens, School of Chemical Engineering, Heroon Polytechneiou 9, 15780, Zographou, Athens, Greece
| | - Antreas Afantitis
- National Technical University of Athens, School of Chemical Engineering, Heroon Polytechneiou 9, 15780, Zographou, Athens, Greece
| | - Pantelis Sopasakis
- National Technical University of Athens, School of Chemical Engineering, Heroon Polytechneiou 9, 15780, Zographou, Athens, Greece
| | | | - Vladimir Poroikov
- Institute of Biomedical Chemistry of Russian Academy of Sciences, 119121 Moscow, Russia
| | - Dmitry Filimonov
- Institute of Biomedical Chemistry of Russian Academy of Sciences, 119121 Moscow, Russia
| | - Alexey Zakharov
- Institute of Biomedical Chemistry of Russian Academy of Sciences, 119121 Moscow, Russia
| | - Alexey Lagunin
- Institute of Biomedical Chemistry of Russian Academy of Sciences, 119121 Moscow, Russia
| | - Tatyana Gloriozova
- Institute of Biomedical Chemistry of Russian Academy of Sciences, 119121 Moscow, Russia
| | - Sergey Novikov
- Institute of Biomedical Chemistry of Russian Academy of Sciences, 119121 Moscow, Russia
| | - Natalia Skvortsova
- Institute of Biomedical Chemistry of Russian Academy of Sciences, 119121 Moscow, Russia
| | - Dmitry Druzhilovsky
- Institute of Biomedical Chemistry of Russian Academy of Sciences, 119121 Moscow, Russia
| | - Sunil Chawla
- Seascape Learning, 271 Double Story, New Rajinder Ngr., New Delhi 110060, India
| | - Indira Ghosh
- Jawaharlal Nehru University, New Mehrauli Road, New Delhi 110067, India
| | - Surajit Ray
- Jawaharlal Nehru University, New Mehrauli Road, New Delhi 110067, India
| | - Hitesh Patel
- Jawaharlal Nehru University, New Mehrauli Road, New Delhi 110067, India
| | - Sylvia Escher
- Fraunhofer Institute for Toxicology & Experimental Medicine, Nikolai-Fuchs-Str. 1, 30625 Hannover, Germany
|
29
|
Abstract
Most approaches to structure-activity-relationship (SAR) prediction proceed in two steps. In the first step, a typically large set of fingerprints, or fragments of interest, is constructed (either by hand or by some recent data mining techniques). In the second step, machine learning techniques are applied to obtain a predictive model. The result is often a model that is highly accurate but also hard to interpret. In this paper, we demonstrate the capabilities of a novel SAR algorithm, SMIREP, which tightly integrates the fragment and model generation steps and which yields simple models in the form of a small set of IF-THEN rules. These rules contain SMILES fragments, which are easy for the computational chemist to understand. SMIREP combines ideas from the well-known IREP rule learner with a novel fragmentation algorithm for SMILES strings. SMIREP has been evaluated on three problems: the prediction of binding activities for the estrogen receptor (Environmental Protection Agency's (EPA's) Distributed Structure-Searchable Toxicity (DSSTox) National Center for Toxicological Research estrogen receptor (NCTRER) Database), the prediction of mutagenicity using the carcinogenic potency database (CPDB), and the prediction of biodegradability on a subset of the Environmental Fate Database (EFDB). In these applications, SMIREP has the advantage of producing easily interpretable rules while having predictive accuracies that are comparable to those of alternative state-of-the-art techniques.
Affiliation(s)
- Andreas Karwath
- Institut für Informatik, Albert-Ludwigs Universität Freiburg, Georges-Köhler-Allee 079, D-79110 Freiburg, Germany.
|
30
|
Clare A, Karwath A, Ougham H, King RD. Functional bioinformatics for Arabidopsis thaliana. Bioinformatics 2006. [DOI: 10.1093/bioinformatics/btl169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
|
31
|
Abstract
MOTIVATION Arabidopsis thaliana has the best-understood plant genome, yet approximately one-third of its genes still have no functional annotation at all from either MIPS or TAIR. We have applied our Data Mining Prediction (DMP) method to the problem of predicting the functional classes of these protein sequences. This method is based on using a hybrid machine-learning/data-mining method to identify patterns in the bioinformatic data about sequences that are predictive of function. We use data about sequence, predicted secondary structure, predicted structural domain, InterPro patterns, sequence similarity profile and expression data. RESULTS We predicted the functional class of a high percentage of the Arabidopsis genes with currently unknown function. These predictions are interpretable and have good test accuracies. We describe in detail seven of the rules produced.
Affiliation(s)
- A Clare
- Department of Computer Science, University of Wales Aberystwyth SY23 3DB, UK.
|
32
|
Karwath A, King RD. Homology induction: the use of machine learning to improve sequence similarity searches. BMC Bioinformatics 2002; 3:11. [PMID: 11972320 PMCID: PMC107726 DOI: 10.1186/1471-2105-3-11] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Received: 11/27/2001] [Accepted: 04/23/2002] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The inference of homology between proteins is a key problem in molecular biology. The current best approaches only identify approximately 50% of homologies (with a false positive rate set at 1/1000). RESULTS We present Homology Induction (HI), a new approach to inferring homology. HI uses machine learning to bootstrap from standard sequence similarity search methods. First a standard method is run, then HI learns rules which are true for sequences of high similarity to the target (assumed homologues) and not true for general sequences; these rules are then used to discriminate sequences in the twilight zone. To learn the rules, HI describes the sequences in a novel way based on a bioinformatic knowledge base and the machine learning method of inductive logic programming. To evaluate HI we used the PDB40D benchmark, which lists sequences of known homology but low sequence similarity. We compared the HI methodology with PSI-BLAST alone and found HI performed significantly better. In addition, Receiver Operating Characteristic (ROC) curve analysis showed that these improvements were robust for all reasonable error costs. The predictive homology rules learnt by HI can be interpreted biologically to provide insight into conserved features of homologous protein families. CONCLUSIONS HI is a new technique for the detection of remote protein homology, a central bioinformatic problem. HI with PSI-BLAST is shown to outperform PSI-BLAST for all error costs. It is expected that similar improvements would be obtained using HI with any sequence similarity method.
Affiliation(s)
- Andreas Karwath
- Department of Computer Sciences, University of Wales, Aberystwyth, SY23 3DB, UK
| | - Ross D King
- Department of Computer Sciences, University of Wales, Aberystwyth, SY23 3DB, UK
|
33
|
Abstract
MOTIVATION Data Mining Prediction (DMP) is a novel approach to predicting protein functional class from sequence. DMP works even in the absence of a homologous protein of known function. We investigate the utility of different ways of representing protein sequence in DMP (residue frequencies, phylogeny, predicted structure) using the Escherichia coli genome as a model. RESULTS Using the different representations, DMP learnt prediction rules that were more accurate than default at every level of function using every type of representation. The most effective way to represent sequence was using phylogeny (75% accuracy and 13% coverage of unassigned ORFs at the most general level of function; 69% accuracy and 7% coverage at the most detailed). We tested different methods for combining predictions from the different types of representation. These improved both the accuracy and coverage of predictions, e.g. 40% of all unassigned ORFs could be predicted at an estimated accuracy of 60% and 5% of unassigned ORFs could be predicted at an estimated accuracy of 86%.
Affiliation(s)
- R D King
- Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion SY23 3DB, Wales, UK.
|
34
|
King RD, Karwath A, Clare A, Dehaspe L. Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast 2000; 17. [PMID: 11119305 PMCID: PMC2448385 DOI: 10.1002/1097-0061(200012)17:4<283::aid-yea52>3.0.co;2-f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
Abstract
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli.
Affiliation(s)
- Ross D. King
- Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion SY23 3DB, UK
| | - Andreas Karwath
- Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion SY23 3DB, UK
| | - Amanda Clare
- Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion SY23 3DB, UK
| | - Luc Dehaspe
- PharmaDM, Celestijnenlaan 200A, Leuven B-3001, Belgium
|
35
|
King RD, Karwath A, Clare A, Dehaspe L. Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast 2000; 17:283-93. [PMID: 11119305 PMCID: PMC2448385 DOI: 10.1002/1097-0061(200012)17:4<283::aid-yea52>3.0.co;2-f] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022] Open
Abstract
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli.
Affiliation(s)
- R D King
- Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion SY23 3DB, UK
|