1
Agius R, Riis-Jensen AC, Wimmer B, da Cunha-Bang C, Murray DD, Poulsen CB, Bertelsen MB, Schwartz B, Lundgren JD, Langberg H, Niemann CU. Deployment and validation of the CLL treatment infection model adjoined to an EHR system. NPJ Digit Med 2024; 7:147. [PMID: 38839920 PMCID: PMC11153589 DOI: 10.1038/s41746-024-01132-6] [Received: 11/07/2023] [Accepted: 05/08/2024]
Abstract
Research algorithms are seldom externally validated or integrated into clinical practice, so the challenges of deploying them remain largely unknown. Such efforts must address data harmonization, algorithm performance under unforeseen missingness, automation and monitoring of predictions, and legal frameworks. We here describe the deployment of a high-dimensional, data-driven decision support model into an electronic health record (EHR) and derive practical guidelines, informed by this deployment, that include the necessary processes, stakeholders and design requirements for a successful deployment. Specifically, we describe our deployment of the chronic lymphocytic leukemia (CLL) treatment infection model (CLL-TIM) as a stand-alone platform adjoined to an EPIC-based Danish EHR, presenting personalized predictions in a clinical context. CLL-TIM is an 84-variable data-driven prognostic model that uses 7 years of medical records to predict the 2-year risk of a composite outcome of infection and/or treatment after CLL diagnosis. As an independent validation cohort for this deployment, we used a retrospective population-based cohort of patients diagnosed with CLL from 2018 onwards (n = 1480). Upon deployment, key CLL-TIM variables showed unexpectedly high levels of missingness. High dimensionality combined with the handling of missingness, together with predictive confidence, were critical design elements that enabled trustworthy predictions; they should therefore be priorities for prognostic models seeking deployment in new EHRs. Our deployment setup, including automation and monitoring within an EHR in a way that meets the Medical Device Regulations, may serve as step-by-step guidelines for others aiming to design and deploy research algorithms in clinical practice.
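The unexpected missingness reported above is the kind of problem a routine per-variable check can surface early. The sketch below is a hypothetical illustration, not the CLL-TIM deployment code: it assumes each patient record arrives as a dictionary keyed by variable name, counts an absent key or a None value as missing, and the function names, placeholder variable names and the 0.5 alert threshold are all invented for illustration.

```python
from typing import Dict, List, Mapping, Optional

Record = Mapping[str, Optional[float]]

def missingness_report(records: List[Record], variables: List[str]) -> Dict[str, float]:
    """Fraction of records in which each expected variable is absent or None."""
    if not records:
        raise ValueError("no records to summarise")
    return {
        var: sum(1 for r in records if r.get(var) is None) / len(records)
        for var in variables
    }

def flag_high_missingness(report: Mapping[str, float], threshold: float = 0.5) -> List[str]:
    """Variables whose missing fraction exceeds the alert threshold (hypothetical default)."""
    return sorted(var for var, frac in report.items() if frac > threshold)

# Hypothetical extract: "lab_a", "lab_b" and "lab_c" are placeholder variable names.
records = [
    {"lab_a": 8.1, "lab_b": None},
    {"lab_a": 7.9},
    {"lab_a": None, "lab_b": 6.0},
    {"lab_a": 8.4, "lab_b": None},
]
report = missingness_report(records, ["lab_a", "lab_b", "lab_c"])
print(report)                         # {'lab_a': 0.25, 'lab_b': 0.75, 'lab_c': 1.0}
print(flag_high_missingness(report))  # ['lab_b', 'lab_c']
```

Running a check like this on each new batch of extracted records, before predictions are generated, is one plausible way to catch a variable that the model expects but the new EHR rarely populates.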
Affiliation(s)
- Rudi Agius
- Department of Hematology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Bettina Wimmer
- SP Sundhedsdata, The Data Unit, Capital Region of Denmark, Copenhagen, Denmark
- Caspar da Cunha-Bang
- Department of Hematology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Daniel Dawson Murray
- Center of Excellence for Health, Immunity, and Infections (CHIP), Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Berit Schwartz
- Rigshospitalets Innovationscenter, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Jens Dilling Lundgren
- Center of Excellence for Health, Immunity, and Infections (CHIP), Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Henning Langberg
- Rigshospitalets Innovationscenter, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Carsten Utoft Niemann
- Department of Hematology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
2
Saravana Kumar K, Ramasubramanian S. A clinical decision support system for heart disease prediction with ensemble two-fold classification framework. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-221165]
Abstract
Cardiovascular disease (CVD) is a severe public health concern globally. Early and accurate CVD diagnosis is difficult but necessary to prevent further damage and protect patients' lives. Machine learning (ML)-based clinical decision support systems (CDSS) have the potential to help healthcare providers make accurate CVD diagnoses and treatment decisions. Clinical data usually contain missing values (MVs); hence, the choice of imputation technique has become a critical consideration when applying ML to real-world medical datasets. Furthermore, simply removing instances with MVs discards essential data and can produce incorrect results. To overcome these issues, this paper proposes an efficient and reliable CDSS with an Ensemble Two-Fold Classification (ETC) framework for classifying heart diseases. The effectiveness of the proposed ETC framework with different supervised ML algorithms is evaluated using four distinct imputation methods for handling MVs on the standard University of California, Irvine (UCI) benchmark dataset. Experimental results show that the proposed ETC framework with the k-nearest neighbors (k-NN) imputation method achieves a higher classification accuracy of 0.9999 and a lower error rate of 0.0989 than the other imputation methods and classifiers, with similar execution times.
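The abstract credits k-NN imputation with the best results. As a minimal sketch of that general technique (not the authors' ETC implementation), the code below fills each missing value with the mean of that feature over the k most similar rows. It assumes purely numeric features with None marking a missing value, and measures similarity with a Euclidean distance over jointly observed features, rescaled to the full feature count in the same spirit as scikit-learn's `nan_euclidean_distances`. The function names and k = 2 are hypothetical choices.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]

def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows, rescaled to all features."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap: treat the rows as maximally dissimilar
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Replace each None with the mean of that feature over the k nearest donor rows."""
    imputed = []
    for i, row in enumerate(rows):
        filled = []
        for j, value in enumerate(row):
            if value is not None:
                filled.append(value)
                continue
            # Donors: every other row in which feature j is observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            if not nearest:
                raise ValueError(f"feature {j} has no observed values to impute from")
            filled.append(sum(nearest) / len(nearest))
        imputed.append(filled)
    return imputed

rows = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],   # missing second feature
    [0.9, 2.2, 2.9],
    [8.0, 9.0, 10.0],   # distant row, should not be used as a donor
]
print(knn_impute(rows, k=2)[1])
```

Here the missing value in the second row is filled from its two nearest neighbours (2.0 and 2.2) rather than from the distant fourth row, giving roughly 2.1. Unlike deleting the incomplete row, this keeps the instance available for classification, which is the motivation the abstract gives for imputation.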
Affiliation(s)
- K. Saravana Kumar
- IT Department, UCE (BIT Campus), Anna University, Trichy, Tamil Nadu, India
- S. Ramasubramanian
- Maths Department, UCE (BIT Campus), Anna University, Trichy, Tamil Nadu, India
3
Prabhakar SK, Rajaguru H, Kim C, Won DO. A Fusion-Based Technique With Hybrid Swarm Algorithm and Deep Learning for Biosignal Classification. Front Hum Neurosci 2022; 16:895761. [PMID: 35721347 PMCID: PMC9203681 DOI: 10.3389/fnhum.2022.895761] [Received: 03/14/2022] [Accepted: 05/02/2022]
Abstract
Electroencephalography (EEG) signals carry vital information about the electrical activity of the brain. EEG measures the rhythmic, spontaneous electrical activity of brain neurons from the scalp surface. EEG signal analysis is one of the most important topics in neuroscience and neural engineering, and it also supports commercial applications. EEG studies that incorporate machine learning have given good results in uncovering information useful for neural classification tasks. In this study, a Fusion Hybrid Model (FHM) with Singular Value Decomposition (SVD)-based Estimation of Robust Parameters is proposed for efficient feature extraction from biosignals and for understanding the essential information they carry about brain function. The essential features, in terms of parameter components, are extracted using the developed hybrid model, and a specialized hybrid swarm technique, the Hybrid Differential Particle Artificial Bee (HDPAB) algorithm, is proposed for feature selection. For EEG to be practical across a wide range of applications while relying less on trained professionals, robust classification of these signals is necessary. Therefore, the signals are first classified using the proposed Zero Inflated Poisson Mixture Regression Model (ZIPMRM) and then with a deep learning method, and the results are compared with other standard machine learning techniques. The proposed methodology is validated on a few standard biosignal datasets, achieving a classification accuracy of 98.79% on the epilepsy dataset and 98.35% on the schizophrenia dataset.
Affiliation(s)
- Sunil Kumar Prabhakar
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, South Korea
- Harikumar Rajaguru
- Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, India
- Chulho Kim
- Department of Neurology, Chuncheon Sacred Heart Hospital, Chuncheon, South Korea
- Dong-Ok Won
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, South Korea
- *Correspondence: Dong-Ok Won
4
Yu TH, Su BH, Battalora LC, Liu S, Tseng YJ. Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power. Brief Bioinform 2022; 23:bbab377. [PMID: 34530437 PMCID: PMC8769704 DOI: 10.1093/bib/bbab377] [Received: 06/17/2021] [Revised: 07/30/2021] [Accepted: 08/23/2021]
Abstract
The trade-off between the predictive power and the interpretability of machine learning (ML) and deep learning (DL) models has been a growing concern in central nervous system-related quantitative structure-activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive models fail to provide structural insights because of their black-box nature. This lack of interpretability, and the difficulty of deriving simple rules from such models, is a challenge for CNS-QSAR modeling. To address these issues, we develop a protocol that combines the power of ML and DL to generate a set of simple, easily interpreted rules with high predictive power. A data set of 940 marketed drugs (315 CNS-active, 625 CNS-inactive) was modeled with support vector machine and graph convolutional network algorithms. Individual ML/DL models were also constructed for comparison. The performance of these models was evaluated on an additional external dataset of 117 marketed drugs (42 CNS-active, 75 CNS-inactive). Fingerprint-split validation was adopted to ensure model stringency and generalizability. The resulting hybrid ensemble model outperformed its constituent traditional QSAR models, with an accuracy of 0.96 and an F1 score of 0.95. Using the interpretability provided by this protocol, our model yields a set of simple physicochemical rules, based on six substructural features, for determining whether a compound can be a CNS drug. These rules showed higher classification ability than classical guidelines, with higher specificity and more mechanistic insight than rules based on blood-brain barrier permeability alone. This hybrid protocol could potentially be used for predicting other drug properties.
Affiliation(s)
- Tzu-Hui Yu
- National Taiwan University in Bio-Industry Communication and Development, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
- Bo-Han Su
- Department of Computer Science and Information Engineering of National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
- Sin Liu
- Graduate Institute of Biomedical Electronics and Bioinformatics of National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
- Yufeng Jane Tseng
- Graduate Institute of Biomedical Electronics and Bioinformatics, Department of Computer Science and Information Engineering and School of Pharmacy at National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
5
Machine learning can identify newly diagnosed patients with CLL at high risk of infection. Nat Commun 2020; 11:363. [PMID: 31953409 PMCID: PMC6969150 DOI: 10.1038/s41467-019-14225-8] [Received: 06/04/2019] [Accepted: 12/11/2019]
Abstract
Infections have become the major cause of morbidity and mortality among patients with chronic lymphocytic leukemia (CLL) due to immune dysfunction and cytotoxic CLL treatment. Yet predictive models for infection are lacking. In this work, we develop the CLL Treatment-Infection Model (CLL-TIM), which identifies patients at risk of infection or CLL treatment within 2 years of diagnosis, as validated on both internal and external cohorts. CLL-TIM is an ensemble algorithm composed of 28 machine learning algorithms and built on data from 4,149 patients with CLL. The model can handle heterogeneous data, including the high rates of missing data expected in the real-world setting, and achieves a precision of 72% and a recall of 75%. To address concerns regarding the use of complex machine learning algorithms in the clinic, CLL-TIM provides explainable predictions for each patient with CLL through uncertainty estimates and personalized risk factors. Chronic lymphocytic leukemia is an indolent disease, and many patients succumb to infection rather than to the direct effects of the disease. Here, the authors use medical records and machine learning to predict which patients may be at risk of infection, which may enable a change in the course of their treatment.
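The abstract reports precision and recall but no single summary score. The sketch below shows how the two metrics are defined from confusion-matrix counts and how the F1 score combines them as their harmonic mean; the helper names are hypothetical, and the F1 value printed for 0.72 and 0.75 is simple arithmetic on the reported numbers, not a figure the abstract itself states.

```python
from typing import Tuple

def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from confusion-matrix counts.

    Assumes at least one predicted positive (tp + fp > 0) and one actual positive (tp + fn > 0).
    """
    precision = tp / (tp + fp)  # fraction of flagged patients who had the outcome
    recall = tp / (tp + fn)     # fraction of patients with the outcome who were flagged
    return precision, recall, f1_from(precision, recall)

# F1 implied by the reported precision (0.72) and recall (0.75).
print(round(f1_from(0.72, 0.75), 3))  # 0.735
```

Because the harmonic mean is pulled toward the smaller of the two values, F1 stays close to both only when precision and recall are balanced, as they roughly are here.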
6
Elshatoury H, Avots E, Anbarjafari G. Volumetric Histogram-Based Alzheimer's Disease Detection Using Support Vector Machine. J Alzheimers Dis 2019; 72:515-524. [PMID: 31609690 DOI: 10.3233/jad-190704]
Abstract
In this research work, machine learning techniques are used to classify magnetic resonance imaging brain scans of people with Alzheimer's disease. The task is binary classification between Alzheimer's disease and cognitively normal subjects. Supervised learning algorithms were used to train classifiers, and their accuracies are compared. The data come from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Histograms are computed for all slices of all images, and the best-performing slices were selected for further examination. Majority voting and weighted voting are then applied to combine the results; the best accuracy, 69.5%, is obtained with majority voting.
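The two combination rules named above can give different answers on the same predictions. The sketch below is a generic illustration, not the authors' code: it assumes one classifier per selected slice, each emitting a label, and uses made-up per-slice weights (for example, each slice's validation accuracy) for the weighted rule. All labels, weights and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

def majority_vote(labels: Sequence[str]) -> str:
    """Label predicted by the most classifiers; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels: Sequence[str], weights: Sequence[float]) -> str:
    """Label with the largest summed weight across classifiers."""
    if len(labels) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    scores: Dict[str, float] = {}
    for label, weight in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.__getitem__)

# Hypothetical per-slice predictions for one scan ("AD" = Alzheimer's disease,
# "CN" = cognitively normal) and hypothetical per-slice weights.
slice_predictions = ["AD", "CN", "CN", "AD", "CN"]
slice_weights = [0.9, 0.55, 0.6, 0.8, 0.5]

print(majority_vote(slice_predictions))                 # CN (3 votes to 2)
print(weighted_vote(slice_predictions, slice_weights))  # AD (about 1.7 vs 1.65)
```

Majority voting treats every slice classifier equally, while weighted voting lets a few highly trusted slices outvote several weaker ones; which rule wins in practice depends on how well the weights reflect each classifier's true reliability.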
Affiliation(s)
- Heba Elshatoury
- iCV Research Lab, Institute of Technology, University of Tartu, Tartu, Estonia
- Egils Avots
- iCV Research Lab, Institute of Technology, University of Tartu, Tartu, Estonia
- Gholamreza Anbarjafari
- iCV Research Lab, Institute of Technology, University of Tartu, Tartu, Estonia
- Department of Electrical and Electronic Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
7
Pesaranghader A, Viktor H, Paquet E. Reservoir of diverse adaptive learners and stacking fast hoeffding drift detection methods for evolving data streams. Mach Learn 2018. [DOI: 10.1007/s10994-018-5719-z]