1. Saleem MA, Javeed A, Akarathanawat W, Chutinet A, Suwanwela NC, Kaewplung P, Chaitusaney S, Deelertpaiboon S, Srisiri W, Benjapolakul W. An intelligent learning system based on electronic health records for unbiased stroke prediction. Sci Rep 2024; 14:23052. [PMID: 39367027; PMCID: PMC11452373; DOI: 10.1038/s41598-024-73570-x] [Received: 02/01/2024] [Accepted: 09/18/2024] [Indexed: 10/06/2024]
Abstract
Stroke has a negative impact on people's lives and is one of the leading causes of death and disability worldwide. Early detection of symptoms can significantly help predict stroke and promote a healthy lifestyle. Researchers have developed several methods to predict strokes using machine learning (ML) techniques. However, these systems suffer from two main problems. The first is that the ML models are biased because of the uneven class distribution in the dataset. Recent research has not adequately addressed this problem, and no preventive measures have been taken. We used the Synthetic Minority Oversampling Technique (SMOTE) to remove this bias and balance the training of the proposed ML model. The second problem is the low classification accuracy of ML models. We proposed a learning system that combines an autoencoder with a linear discriminant analysis (LDA) model to increase the accuracy of stroke prediction. The autoencoder extracts relevant features from the feature space, and the extracted subset is then fed into the LDA model for stroke classification. The hyperparameters of the LDA model are found using a grid search strategy. Because the conventional accuracy metric does not truly reflect the performance of ML models, we employed several evaluation metrics to validate the proposed model: accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The experimental results show that the proposed model achieves a sensitivity of 98.51% and a specificity of 97.56%, with an accuracy of 99.24% and a balanced accuracy of 98.00%.
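The abstract names SMOTE and reports sensitivity, specificity, and balanced accuracy without giving formulas. The dependency-free Python below is a minimal sketch of both ideas, not the authors' implementation: `smote` (a hypothetical helper) creates each synthetic point by interpolating between a random minority-class sample and one of its k nearest minority neighbours, and `binary_metrics` (also hypothetical) derives the metrics from a confusion matrix, taking balanced accuracy as the mean of sensitivity and specificity. Real projects would more typically use the `SMOTE` class from the imbalanced-learn package.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling (hypothetical helper, minimal sketch).

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and balanced accuracy (labels 0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points never reach the evaluation data.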
Affiliation(s)
- Muhammad Asim Saleem
  - Center of Excellence in Artificial Intelligence, Machine Learning and Smart Grid Technology, Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
- Ashir Javeed
  - Aging Research Center, Karolinska Institutet, 171 65, Stockholm, Sweden
- Wasan Akarathanawat
  - Division of Neurology, Department of Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, 10330, Thailand
  - Chulalongkorn Stroke Center, King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, 10330, Thailand
  - Chula Neuroscience Center, King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, 10330, Thailand
- Aurauma Chutinet
  - Division of Neurology, Department of Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, 10330, Thailand
  - Chulalongkorn Stroke Center, King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, 10330, Thailand
  - Chula Neuroscience Center, King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, 10330, Thailand
- Nijasri Charnnarong Suwanwela
  - Division of Neurology, Department of Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, 10330, Thailand
  - Chulalongkorn Stroke Center, King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, 10330, Thailand
  - Chula Neuroscience Center, King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, 10330, Thailand
- Pasu Kaewplung
  - Center of Excellence in Artificial Intelligence, Machine Learning and Smart Grid Technology, Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
- Surachai Chaitusaney
  - Center of Excellence in Artificial Intelligence, Machine Learning and Smart Grid Technology, Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
- Sunchai Deelertpaiboon
  - Center of Excellence in Artificial Intelligence, Machine Learning and Smart Grid Technology, Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
- Wattanasak Srisiri
  - Center of Excellence in Artificial Intelligence, Machine Learning and Smart Grid Technology, Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
- Watit Benjapolakul
  - Center of Excellence in Artificial Intelligence, Machine Learning and Smart Grid Technology, Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
2. Cai YQ, Gong DX, Tang LY, Cai Y, Li HJ, Jing TC, Gong M, Hu W, Zhang ZW, Zhang X, Zhang GW. Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions. J Med Internet Res 2024; 26:e47645. [PMID: 38869157; PMCID: PMC11316160; DOI: 10.2196/47645] [Received: 03/29/2023] [Revised: 10/30/2023] [Accepted: 06/12/2024] [Indexed: 06/14/2024]
Abstract
In recent years, artificial intelligence (AI) has developed explosively and has been widely applied in health care. As a typical AI technology, machine learning models show great potential for predicting cardiovascular diseases by leveraging large amounts of medical data for training and optimization, and they are expected to play a crucial role in reducing the incidence and mortality of cardiovascular diseases. Although the field has become a research hotspot, there are still many pitfalls to which researchers need to pay close attention. These pitfalls may affect the predictive performance, credibility, reliability, and reproducibility of the studied models, ultimately reducing the value of the research and affecting the prospects for clinical application. Identifying and avoiding these pitfalls is therefore a crucial task before conducting the research, yet no comprehensive summary of this topic currently exists. This viewpoint analyzes the existing problems in terms of data quality, data set characteristics, model design, statistical methods, and clinical implications, and provides possible solutions, such as gathering objective data, improving training, repeating measurements, increasing sample size, preventing overfitting using statistical methods, using specific AI algorithms to address targeted issues, standardizing outcomes and evaluation criteria, and enhancing fairness and replicability. The goal is to offer reference and assistance to researchers, algorithm developers, policy makers, and clinical practitioners.
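The viewpoint lists preventing overfitting with statistical methods among its solutions but does not prescribe a procedure. One widely used option is k-fold cross-validation; the stratified variant keeps the class ratio roughly constant across folds, which matters when the outcome (for example, a cardiovascular event) is rare. The pure-Python generator below is an illustrative sketch only; `stratified_k_fold` is a hypothetical name, and scikit-learn's `StratifiedKFold` would be the usual choice in practice.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class ratios.

    Hypothetical helper: indices of each class are shuffled and then dealt
    round-robin across the k folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal this class evenly over the folds

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Each sample appears in exactly one test fold, so every observation is used once for validation and k - 1 times for training.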
Affiliation(s)
- Yu-Qing Cai
  - The First Hospital of China Medical University, Shenyang, China
- Da-Xin Gong
  - Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
- Li-Ying Tang
  - The First Hospital of China Medical University, Shenyang, China
- Yue Cai
  - The First Hospital of China Medical University, Shenyang, China
- Hui-Jun Li
  - Shenyang Medical & Film Science and Technology Co, Ltd, Shenyang, China
- Tian-Ci Jing
  - Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
- Wei Hu
  - Bayi Orthopedic Hospital, Chengdu, China
- Zhen-Wei Zhang
  - China Rongtong Medical & Healthcare Co, Ltd, Chengdu, China
- Xingang Zhang
  - Department of Cardiology, The First Hospital of China Medical University, Shenyang, China
- Guang-Wei Zhang
  - Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
3. Islam U, Mehmood G, Al-Atawi AA, Khan F, Alwageed HS, Cascone L. NeuroHealth guardian: A novel hybrid approach for precision brain stroke prediction and healthcare analytics. J Neurosci Methods 2024; 409:110210. [PMID: 38968974; DOI: 10.1016/j.jneumeth.2024.110210] [Received: 01/30/2024] [Revised: 06/14/2024] [Accepted: 06/28/2024] [Indexed: 07/07/2024]
Abstract
Stroke is a severe illness that requires early detection and intervention to prevent the condition from worsening. This research addresses the stroke prediction problem, which can be divided into sub-problems such as estimating an individual's predisposition to develop stroke. To this end, a dataset of multiple health features, such as age, gender, hypertension, and glucose level, plays a central role. We propose an approach that integrates several machine learning techniques, namely Logistic Regression, Naive Bayes, K-Nearest Neighbors, and Support Vector Machine (SVM), into an ensemble model called Neuro-Health Guardian, which is intended to make stroke prediction more accurate. The study covers each stage of the workflow: data preparation, data visualization, model selection, training, testing, ensembling, evaluation, and prediction. The models are evaluated by error rate (derived from accuracy), precision, recall, F1 score, and confusion matrices. The results show that the ensemble model outperforms the individual algorithms in predicting stroke. Accuracy, precision, recall, and F1 score are reported for all models to give a clear comparison of their performance. In short, the study shows that the ensemble model is a good predictor of stroke. Precise forecasting of stroke predisposition can support early intervention, thereby preventing stroke-related deaths and limiting the disability burden of stroke.
The findings of this study can inform the development of predictive models for stroke prevention and treatment.
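The abstract does not state how the four classifiers are combined. Assuming simple hard (majority) voting over binary predictions, which is only one of several possible ensembling schemes, the combination step and the precision/recall/F1 evaluation can be sketched as follows. With four voters a 2-2 tie is possible; this sketch breaks ties toward the positive (stroke) class, an assumption that favours recall. `hard_vote` and `precision_recall_f1` are hypothetical names.

```python
from collections import Counter


def hard_vote(model_predictions):
    """Majority vote over binary (0/1) predictions from several models.

    model_predictions: one prediction list per model, all the same length.
    Assumption: ties are broken in favour of the positive class (1).
    """
    n = len(model_predictions[0])
    ensemble = []
    for i in range(n):
        votes = Counter(preds[i] for preds in model_predictions)
        ensemble.append(1 if votes[1] >= votes[0] else 0)
    return ensemble


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class; 0.0 when undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with per-model predictions `lr = [1, 0, 1, 0]`, `nb = [1, 1, 0, 1]`, `knn = [0, 0, 1, 0]`, and `svm = [1, 0, 1, 1]`, `hard_vote([lr, nb, knn, svm])` returns `[1, 0, 1, 1]`; the last position is a 2-2 tie resolved toward the positive class.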
Affiliation(s)
- Umar Islam
  - Department of Computer Science, IQRA National University, Swat Campus, Pakistan
- Gulzar Mehmood
  - Department of Computer Science, IQRA National University, Swat Campus, Pakistan
- Abdullah A Al-Atawi
  - Department of Computer Science, Applied College, University of Tabuk, Tabuk 47512, Saudi Arabia
- Faheem Khan
  - Department of Computer Engineering, Gachon University, Seongnam-si 13120, South Korea
- Lucia Cascone
  - Department of Computer Science, University of Salerno, Fisciano, Italy
4. Kanning JP, van Os HJA, Rakers M, Wermer MJH, Geerlings MI, Ruigrok YM. Prediction of aneurysmal subarachnoid hemorrhage in comparison with other stroke types using routine care data. PLoS One 2024; 19:e0303868. [PMID: 38820263; PMCID: PMC11142441; DOI: 10.1371/journal.pone.0303868] [Received: 11/10/2023] [Accepted: 05/01/2024] [Indexed: 06/02/2024]
Abstract
Aneurysmal subarachnoid hemorrhage (aSAH) can be prevented by early detection and treatment of intracranial aneurysms in high-risk individuals. We investigated whether individuals at high risk of aSAH in the general population can be identified by developing an aSAH prediction model with electronic health record (EHR) data. To assess the aSAH model's relative performance, we additionally developed prediction models for acute ischemic stroke (AIS) and intracerebral hemorrhage (ICH) and compared the discriminative performance of the models. We included individuals aged ≥35 years without a history of stroke from a Dutch routine care database (years 2007-2020) and defined the outcomes aSAH, AIS, and ICH using International Classification of Diseases (ICD) codes. Potential predictors included sociodemographic data, diagnoses, medications, and blood measurements. We cross-validated a Cox proportional hazards model with an elastic net penalty on derivation cohorts and reported the c-statistic and 10-year calibration on validation cohorts. We examined 1,040,855 individuals (mean age 54.6 years, 50.9% women) for a total of 10,173,170 person-years (median 11 years). In total, 17,465 stroke events occurred during follow-up: 723 aSAH, 14,659 AIS, and 2,083 ICH. The aSAH model's c-statistic was 0.61 (95% CI 0.57-0.65), lower than that of the AIS (0.77, 95% CI 0.77-0.78) and ICH models (0.77, 95% CI 0.75-0.78). All models were well calibrated. The aSAH model identified 19 predictors, of which the 10 strongest included age, female sex, population density, socioeconomic status, oral contraceptive use, gastroenterological complaints, obstructive airway medication, epilepsy, childbirth complications, and smoking. Discriminative performance was moderate for the aSAH prediction model and good for the AIS and ICH models. We conclude that it is currently not feasible to accurately identify individuals at increased risk for aSAH using EHR data.
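The c-statistic reported above measures how well predicted risk ranks patients by time to event. The sketch below computes Harrell's concordance index for right-censored data in plain Python. It is illustrative rather than the authors' code: it ignores tied event times and the 10-year horizon for brevity, and `harrell_c` is a hypothetical name.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before times[j]. The pair is concordant when
    the model gives i the higher risk score; tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why 0.61 for aSAH reads as moderate and 0.77 for AIS and ICH as good.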
Affiliation(s)
- Jos P. Kanning
  - UMC Utrecht Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Hendrikus J. A. van Os
  - Department of Neurology, Leiden University Medical Center, Leiden, The Netherlands
  - Department of Public Health & Primary Care and National eHealth Living Lab, Leiden University Medical Center, Leiden, The Netherlands
- Margot Rakers
  - Department of Public Health & Primary Care and National eHealth Living Lab, Leiden University Medical Center, Leiden, The Netherlands
- Marieke J. H. Wermer
  - Department of Neurology, Leiden University Medical Center, Leiden, The Netherlands
  - Department of Neurology, University Medical Center Groningen, Groningen, The Netherlands
- Mirjam I. Geerlings
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Department of General Practice, Amsterdam UMC, Location University of Amsterdam, Amsterdam, The Netherlands
  - Amsterdam Public Health, Aging & Later life, and Personalized Medicine, Amsterdam, The Netherlands
  - Amsterdam Neuroscience, Neurodegeneration, and Mood, Anxiety, Psychosis, Stress, and Sleep, Amsterdam, The Netherlands
- Ynte M. Ruigrok
  - UMC Utrecht Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
5. Zhang Y, Yu M, Tong C, Zhao Y, Han J. CA-UNet Segmentation Makes a Good Ischemic Stroke Risk Prediction. Interdiscip Sci 2024; 16:58-72. [PMID: 37626263; DOI: 10.1007/s12539-023-00583-x] [Received: 04/10/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 08/27/2023]
Abstract
Stroke remains the world's second leading cause of death and the third leading cause of death and disability combined. For ischemic stroke, early detection and treatment are key to prevention. However, because of privacy protection restrictions and labeling difficulties, only a few studies have addressed intelligent automatic diagnosis of stroke or ischemic stroke, and their results are unsatisfactory. We therefore collected data and propose a 3D carotid Computed Tomography Angiography (CTA) image segmentation model, CA-UNet, for fully automated extraction of the carotid arteries. We explored the number of down-sampling steps suitable for carotid segmentation and designed a multi-scale loss function to counter the loss of detailed features during down-sampling. Moreover, based on CA-UNet, we propose an ischemic stroke risk prediction model that predicts patients' risk from their 3D CTA images, electronic medical records, and medical history. We validated the efficacy of our segmentation and prediction models through comparison tests. Our method can provide reliable diagnoses and results that benefit patients and medical professionals.
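The abstract does not specify the form of the multi-scale loss. One common way to supervise a U-Net-style decoder at several resolutions, given here purely as an assumed illustration and not as the CA-UNet loss, is a weighted sum of per-scale Dice losses:

```latex
\mathcal{L}_{\mathrm{ms}} = \sum_{s=1}^{S} w_s \, \mathcal{L}_{\mathrm{Dice}}\!\left(\hat{Y}^{(s)}, Y^{(s)}\right),
\qquad
\mathcal{L}_{\mathrm{Dice}}(\hat{Y}, Y) = 1 - \frac{2\sum_{v} \hat{y}_{v}\, y_{v} + \epsilon}{\sum_{v} \hat{y}_{v} + \sum_{v} y_{v} + \epsilon}
```

Here s indexes the S decoder scales, the predicted mask at scale s is compared with the ground-truth mask resampled to the same resolution, w_s is a scale weight, v indexes voxels, and epsilon is a small smoothing constant. Supervising the coarse scales as well as the full-resolution output is one way to keep fine vessel detail from being lost during down-sampling.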
Affiliation(s)
- Yuqi Zhang
  - School of Computer Science and Engineering, Beihang University, Beijing, China
  - State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Mengbo Yu
  - School of Computer Science and Engineering, Beihang University, Beijing, China
  - State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Chao Tong
  - School of Computer Science and Engineering, Beihang University, Beijing, China
  - State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Yanqing Zhao
  - Department of Interventional Radiology and Vascular Surgery, Peking University Third Hospital, Beijing, China
- Jintao Han
  - Department of Interventional Radiology and Vascular Surgery, Peking University Third Hospital, Beijing, China
6. Pungitore S, Subbian V. Assessment of Prediction Tasks and Time Window Selection in Temporal Modeling of Electronic Health Record Data: a Systematic Review. J Healthc Inform Res 2023; 7:313-331. [PMID: 37637723; PMCID: PMC10449760; DOI: 10.1007/s41666-023-00143-4] [Received: 07/08/2022] [Revised: 04/12/2023] [Accepted: 07/28/2023] [Indexed: 08/29/2023]
Abstract
Temporal electronic health record (EHR) data are often preferred for clinical prediction tasks because they offer more complete representations of a patient's pathophysiology than static data. A challenge when working with temporal EHR data is problem formulation, which includes defining the time windows of interest and the prediction task. Our objective was to conduct a systematic review that assessed the definition and reporting of concepts relevant to temporal clinical prediction tasks. We searched the PubMed® and IEEE Xplore® databases for studies published from January 1, 2010, onward that applied machine learning models to EHR data for patient outcome prediction. Publications applying time-series methods were selected for further review. We identified 92 studies and summarized them by clinical context and by how the prediction problem was defined and reported. For the time windows of interest, 12 studies did not discuss window lengths, 57 used a single set of window lengths, and 23 evaluated the relationship between window length and model performance. We also found that 72 studies reported the prediction task appropriately. However, evaluating prediction problem formulation for temporal EHR data was complicated by heterogeneity in how these concepts were assessed and reported. Even among studies modeling similar clinical outcomes, there were variations in the terminology used to describe the prediction problem, the rationale for window lengths, and the determination of the outcome of interest. As temporal modeling using EHR data expands, minimal reporting standards should include time-series-specific concerns to promote rigor and reproducibility in future studies and to facilitate model implementation in clinical settings.
Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-023-00143-4.
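The review notes that terminology for time windows varies between studies. Under one common (assumed) convention, features are drawn from an observation window that ends at an index date, an optional gap follows, and the outcome label is taken from a subsequent prediction window. The sketch below encodes that convention for a single patient; `build_example` and its parameters are hypothetical and not taken from the review.

```python
from datetime import date, timedelta


def build_example(events, index_date, obs_days, gap_days, pred_days, outcome_code):
    """Turn one patient's timestamped events into (features, label).

    events: list of (date, code) tuples.
    Observation window: [index_date - obs_days, index_date)
    Prediction window:  [index_date + gap_days, index_date + gap_days + pred_days)
    Features are the codes seen in the observation window; the label is 1 if
    outcome_code occurs in the prediction window, else 0.
    """
    obs_start = index_date - timedelta(days=obs_days)
    pred_start = index_date + timedelta(days=gap_days)
    pred_end = pred_start + timedelta(days=pred_days)

    features = [code for d, code in events if obs_start <= d < index_date]
    label = int(any(code == outcome_code and pred_start <= d < pred_end
                    for d, code in events))
    return features, label


events = [(date(2020, 1, 5), "HTN"), (date(2020, 3, 1), "DM"), (date(2020, 6, 10), "STROKE")]
features, label = build_example(events, date(2020, 4, 1), obs_days=180,
                                gap_days=30, pred_days=90, outcome_code="STROKE")
```

Reporting all three lengths (observation, gap, prediction) explicitly is what makes results comparable; in this example, widening the gap to 75 days moves the stroke event out of the prediction window and flips the label to 0.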
Affiliation(s)
- Sarah Pungitore
  - Program in Applied Mathematics, Department of Mathematics, 617 N Santa Rita Ave, Tucson, AZ 85721 USA
- Vignesh Subbian
  - Department of Biomedical Engineering, The University of Arizona, Tucson, AZ 85721-0020 USA
  - Department of Systems and Industrial Engineering, The University of Arizona, Tucson, AZ 85721-0020 USA
7. Karpov OE, Pitsik EN, Kurkin SA, Maksimenko VA, Gusev AV, Shusharina NN, Hramov AE. Analysis of Publication Activity and Research Trends in the Field of AI Medical Applications: Network Approach. Int J Environ Res Public Health 2023; 20:5335. [PMID: 37047950; PMCID: PMC10094658; DOI: 10.3390/ijerph20075335] [Received: 02/14/2023] [Revised: 03/17/2023] [Accepted: 03/22/2023] [Indexed: 06/19/2023]
Abstract
Artificial intelligence (AI) has revolutionized numerous industries, including medicine. In recent years, the integration of AI into medical practice has shown great promise in enhancing the accuracy and efficiency of diagnosing diseases, predicting patient outcomes, and personalizing treatment plans. This paper explores AI-based medical research using a network approach and analyzes existing trends based on PubMed. Our findings are based on PubMed search queries and on the number of papers returned by the different queries. Our goal is to explore how AI-based methods are used in healthcare research and which approaches and techniques are the most popular, and to discuss the potential reasons behind the obtained results. By analyzing a co-occurrence network constructed with the VOSviewer software, we detected the main clusters of interest in AI-based healthcare research. We then analyzed publication activity in detail across various categories of medical AI research, including research on different AI-based methods applied to different types of medical data. We analyzed the results of query processing in the PubMed database over the past 5 years, obtained via a specifically designed strategy for generating search queries from carefully selected keywords in different categories of interest. We provide a comprehensive analysis of existing applications of AI-based methods to medical data of different modalities, including the context of various medical fields and the specific diseases that pose the greatest danger to the human population.
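Keyword co-occurrence networks like the one the authors built in VOSviewer rest on a simple count: two keywords are linked whenever they appear in the same publication, and the link weight is the number of such publications. The sketch below shows only that counting step plus a weighted degree per keyword; it does not reproduce VOSviewer's normalization, layout, or clustering, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(papers_keywords, min_weight=1):
    """Count how many papers contain each pair of keywords.

    papers_keywords: one list of keywords per paper. Keywords are lower-cased
    and de-duplicated per paper so a pair is counted at most once per paper.
    """
    counts = Counter()
    for keywords in papers_keywords:
        unique = sorted(set(k.lower() for k in keywords))
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return {pair: w for pair, w in counts.items() if w >= min_weight}


def link_strength(edges):
    """Weighted degree: sum of link weights attached to each keyword."""
    strength = Counter()
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


papers = [["Deep learning", "Stroke", "MRI"], ["deep learning", "stroke"], ["Stroke", "EHR"]]
edges = cooccurrence_edges(papers)
```

Raising `min_weight` prunes rare pairs, which is the usual way to keep such a network readable before clustering.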
Affiliation(s)
- Oleg E. Karpov
  - National Medical and Surgical Center Named after N. I. Pirogov, Ministry of Healthcare of the Russian Federation, 105203 Moscow, Russia
- Elena N. Pitsik
  - Baltic Center for Neurotechnology and Artificial Intelligence, Immanuel Kant Baltic Federal University, 236041 Kaliningrad, Russia
- Semen A. Kurkin
  - Baltic Center for Neurotechnology and Artificial Intelligence, Immanuel Kant Baltic Federal University, 236041 Kaliningrad, Russia
- Vladimir A. Maksimenko
  - Baltic Center for Neurotechnology and Artificial Intelligence, Immanuel Kant Baltic Federal University, 236041 Kaliningrad, Russia
- Alexander V. Gusev
  - K-Skai LLC, 185031 Petrozavodsk, Russia
  - Federal Research Institute for Health Organization and Informatics, 127254 Moscow, Russia
- Natali N. Shusharina
  - Baltic Center for Neurotechnology and Artificial Intelligence, Immanuel Kant Baltic Federal University, 236041 Kaliningrad, Russia
- Alexander E. Hramov
  - Baltic Center for Neurotechnology and Artificial Intelligence, Immanuel Kant Baltic Federal University, 236041 Kaliningrad, Russia
8. Chen M, Tan X, Padman R. A Machine Learning Approach to Support Urgent Stroke Triage Using Administrative Data and Social Determinants of Health at Hospital Presentation: Retrospective Study. J Med Internet Res 2023; 25:e36477. [PMID: 36716097; PMCID: PMC9926350; DOI: 10.2196/36477] [Received: 02/14/2022] [Revised: 07/17/2022] [Accepted: 12/18/2022] [Indexed: 01/31/2023]
Abstract
BACKGROUND: The key to effective stroke management is timely diagnosis and triage. Machine learning (ML) methods developed to assist in detecting stroke have focused on interpreting detailed clinical data such as clinical notes and diagnostic imaging results. However, such information may not be readily available when patients are initially triaged, particularly in rural and underserved communities.
OBJECTIVE: This study aimed to develop an ML stroke prediction algorithm based on data widely available at the time of patients' hospital presentations and to assess the added value of social determinants of health (SDoH) in stroke prediction.
METHODS: We conducted a retrospective study of the emergency department and hospitalization records from 2012 to 2014 from all the acute care hospitals in the state of Florida, merged with the SDoH data from the American Community Survey. A case-control design was adopted to construct stroke and stroke mimic cohorts. We compared the algorithm performance and feature importance measures of the ML models (ie, gradient boosting machine and random forest) with those of the logistic regression model based on 3 sets of predictors. To provide insights into the prediction and ultimately assist care providers in decision-making, we used TreeSHAP for tree-based ML models to explain the stroke prediction.
RESULTS: Our analysis included 143,203 hospital visits of unique patients, and the principal diagnosis at discharge confirmed that 73% (n=104,662) of these patients had a stroke. The approach proposed in this study has high sensitivity and is particularly effective at reducing the misdiagnosis of dangerous stroke chameleons (false-negative rate <4%). ML classifiers consistently outperformed the benchmark logistic regression in all 3 input combinations. We found significant consistency across the models in the features that explain their performance. The most important features are age, the number of chronic conditions on admission, and primary payer (eg, Medicare or private insurance). Although both the individual- and community-level SDoH features helped improve the predictive performance of the models, the inclusion of the individual-level SDoH features led to a much larger improvement (area under the receiver operating characteristic curve increased from 0.694 to 0.823) than the inclusion of the community-level SDoH features (area under the receiver operating characteristic curve increased from 0.823 to 0.829).
CONCLUSIONS: Using data widely available at the time of patients' hospital presentations, we developed a stroke prediction model with high sensitivity and reasonable specificity. The prediction algorithm uses variables that are routinely collected by providers and payers and might be useful in underresourced hospitals with limited availability of sensitive diagnostic tools or incomplete data-gathering capabilities.
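The model comparisons above are expressed as areas under the receiver operating characteristic curve. To make that number concrete, the sketch below computes AUC directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting as one half (the Mann-Whitney formulation). It is a quadratic-time illustration, not the study's evaluation code, and `roc_auc` is a hypothetical name.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative).

    Ties count as 0.5. Runs in O(P * N) time, which is fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` gives 0.75: three of the four positive-negative pairs are ranked correctly. Read this way, the reported rise from 0.694 to 0.823 means that a stroke visit outranks a mimic visit noticeably more often once individual-level SDoH features are added.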
Affiliation(s)
- Min Chen
  - Department of Information Systems & Business Analytics, College of Business, Florida International University, Miami, FL, United States
- Xuan Tan
  - Department of Information Systems and Analytics, Leavey School of Business, Santa Clara University, Santa Clara, CA, United States
- Rema Padman
  - The H John Heinz III College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA, United States
9. Lv P, Yang J, Wang J, Guo Y, Tang Q, Magnier B, Lin J, Zhou J. Ischemic stroke prediction of patients with carotid atherosclerotic stenosis via multi-modality fused network. Front Neurosci 2023; 17:1118376. [PMID: 36908778; PMCID: PMC9998529; DOI: 10.3389/fnins.2023.1118376] [Received: 12/07/2022] [Accepted: 02/06/2023] [Indexed: 03/14/2023]
Abstract
Carotid atherosclerotic stenosis is an important cause of ischemic cerebrovascular disease. This study aimed to predict whether new patients with carotid atherosclerotic stenosis have clinical symptoms by learning from the presence or absence of symptoms in previously studied patients. First, a deep neural network prediction model is constructed from multi-modality brain MRI data; it is trained with the multi-modality features extracted by the network as inputs and the diagnosis as the output. Then, a machine learning-based classification algorithm using clinical features is developed for comparison and evaluation. The experimental results showed that the deep learning model using imaging data better predicted the clinical symptom classification of patients. As part of preventive medicine, this study could help patients with carotid atherosclerotic stenosis prepare for stroke prevention based on the prediction results.
Affiliation(s)
- Peng Lv
  - Department of Radiology, Zhongshan Hospital, Fudan University and Shanghai Institute of Medical Imaging, Shanghai, China
- Jing Yang
  - School of Medicine, Xiamen University, Xiamen, China
- Jiacheng Wang
  - Department of Computer Science at School of Informatics, Xiamen University, Xiamen, China
- Yi Guo
  - Department of Radiology, Zhongshan Hospital Xiamen, Fudan University, Xiamen, China
  - Xiamen Municipal Clinical Research Center for Medical Imaging, Xiamen, China
- Qiying Tang
  - Department of Radiology, Zhongshan Hospital Xiamen, Fudan University, Xiamen, China
  - Xiamen Municipal Clinical Research Center for Medical Imaging, Xiamen, China
- Baptiste Magnier
  - Euromov Digital Health in Motion, Univ Montpellier, IMT Mines Ales, Ales, France
- Jiang Lin
  - Department of Radiology, Zhongshan Hospital, Fudan University and Shanghai Institute of Medical Imaging, Shanghai, China
- Jianjun Zhou
  - Department of Radiology, Zhongshan Hospital Xiamen, Fudan University, Xiamen, China
  - Xiamen Municipal Clinical Research Center for Medical Imaging, Xiamen, China
10. Statistical modeling of health space based on metabolic stress and oxidative stress scores. BMC Public Health 2022; 22:1701. [PMID: 36076235; PMCID: PMC9454208; DOI: 10.1186/s12889-022-14081-0] [Received: 05/25/2022] [Accepted: 08/23/2022] [Indexed: 11/12/2022]
Abstract
Background: Health space (HS) is a statistical way of visualizing an individual's health status in multi-dimensional space. In this study, we propose a novel HS in two-dimensional space based on metabolic stress and oxidative stress scores.
Methods: These scores were derived from three statistical models: a logistic regression model, a logistic mixed-effects model, and a proportional odds model. HSs were developed using Korea National Health and Nutrition Examination Survey data with 32,140 samples. To evaluate and compare the performance of the HSs, we also developed the Health Space Index (HSI), a quantitative performance measure based on the approximate 95% confidence ellipses of HS.
Results: Through simulation studies, we confirmed that the HS from the proportional odds model showed the highest power in discriminating the health status of individuals (subjects). Further validation studies were conducted using two independent cohort datasets: a health examination dataset from the Ewha-Boramae cohort with 862 samples and a population-based cohort from the Korea Association Resource project with 3,199 samples.
Conclusions: These validation studies using two independent datasets successfully demonstrated the usefulness of the proposed HS.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12889-022-14081-0.
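The abstract bases the HSI on approximate 95% confidence ellipses but does not give their construction. Assuming the two scores are roughly bivariate normal, a 95% ellipse can be built from the sample mean and covariance, with semi-axes set by the eigenvalues of the covariance matrix and the 95% chi-square quantile with 2 degrees of freedom (equal to -2 ln 0.05, about 5.991). The sketch below is illustrative only; `ellipse_95` and `inside_ellipse` are hypothetical names, and the paper's actual HSI computation may differ.

```python
import math

# 95% quantile of the chi-square distribution with 2 df: -2 ln(0.05), about 5.991
CHI2_2DF_95 = -2.0 * math.log(0.05)


def ellipse_95(points):
    """Mean, covariance (sxx, sxy, syy) and semi-axes of an approximate 95% ellipse."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc
    semi_axes = (math.sqrt(l1 * CHI2_2DF_95), math.sqrt(max(l2, 0.0) * CHI2_2DF_95))
    return (mx, my), (sxx, sxy, syy), semi_axes


def inside_ellipse(point, mean, cov):
    """True if the squared Mahalanobis distance is within the 95% cutoff."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 <= CHI2_2DF_95
```

Comparing how much the ellipses of different health-status groups overlap is one natural way such ellipses can feed into a discrimination measure.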
11
Fitzsimmons L, Dewan M, Dexheimer JW. Diversity in Machine Learning: A Systematic Review of Text-Based Diagnostic Applications. Appl Clin Inform 2022; 13:569-582. [PMID: 35613914 DOI: 10.1055/s-0042-1749119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
OBJECTIVE
As the storage of clinical data has transitioned to electronic formats, medical informatics has become increasingly relevant in providing diagnostic aid. The purpose of this review is to evaluate machine learning models that use text data for diagnosis and to assess the diversity of the included study populations.
METHODS
We conducted a systematic literature review of three public databases. Two authors reviewed every abstract for inclusion. Articles were included if they used or developed machine learning algorithms to aid in diagnosis. Articles focusing on imaging informatics were excluded.
RESULTS
Of 2,260 identified papers, we included 78. Among the machine learning models used, neural networks were relied upon most frequently (44.9%). Studies had a median population of 661.5 patients, and diseases and disorders of 10 different body systems were studied. Of the 35.9% (N = 28) of papers that included race data, 57.1% (N = 16) had majority-White study populations, 14.3% majority-Asian, and 7.1% majority-Black. In 75% (N = 21) of these papers, White was the largest racial group represented. Of all included papers, 43.6% (N = 34) reported the sex ratio of the patient population.
DISCUSSION
With the power to build robust algorithms supported by massive quantities of clinical data, machine learning is shaping the future of diagnostics. Limitations of the underlying data create potential biases, especially if patient demographics are unknown or not included in training.
CONCLUSION
As clinical reliance on machine learning accelerates, both recording demographic information and using diverse training sets should be emphasized. Extrapolating algorithms to demographics beyond the original study population leaves large gaps for potential biases.
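The reported percentages use two denominators: the 78 included papers and the 28 papers that reported race data. A quick arithmetic check in Python, using only counts stated in the abstract (the counts behind 14.3% and 7.1% are inferred here, not reported):

```python
INCLUDED_PAPERS = 78   # papers included in the review
WITH_RACE_DATA = 28    # papers that reported race data

def pct(part, whole):
    """Percentage rounded to one decimal place, matching the abstract's reporting."""
    return round(100.0 * part / whole, 1)

race_reported = pct(WITH_RACE_DATA, INCLUDED_PAPERS)  # share of all included papers
majority_white = pct(16, WITH_RACE_DATA)              # denominator is the 28, not the 78
white_largest = pct(21, WITH_RACE_DATA)
sex_ratio_reported = pct(34, INCLUDED_PAPERS)

# Inferred (not stated) counts: the whole numbers of the 28 papers matching 14.3% and 7.1%.
majority_asian_n = round(0.143 * WITH_RACE_DATA)
majority_black_n = round(0.071 * WITH_RACE_DATA)

print(race_reported, majority_white, white_largest, sex_ratio_reported)
print(majority_asian_n, majority_black_n)
```

Dividing 16 by 78 instead would give 20.5%, which is why the denominator for the race figures matters.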
Affiliation(s)
- Lane Fitzsimmons
- College of Agriculture and Life Science, Cornell University, Ithaca, New York, United States
- Maya Dewan
- Division of Critical Care Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States
- Judith W Dexheimer
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States; Division of Emergency Medicine; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, United States
12
Huang W, Ying TW, Chin WLC, Baskaran L, Marcus OEH, Yeo KK, Kiong NS. Application of ensemble machine learning algorithms on lifestyle factors and wearables for cardiovascular risk prediction. Sci Rep 2022; 12:1033. [PMID: 35058500 PMCID: PMC8776753 DOI: 10.1038/s41598-021-04649-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/06/2021] [Accepted: 12/15/2021] [Indexed: 11/10/2022]
Abstract
This study examined novel data sources for cardiovascular risk prediction, including a detailed lifestyle questionnaire and continuous blood pressure monitoring, using ensemble machine learning algorithms (MLAs). The conventional reference risk score used for comparison was the Framingham Risk Score (FRS). The outcome variables were low or high risk, defined by a calcium score of 0 or a calcium score of 100 and above, respectively. Ensemble MLAs were built from naive Bayes, random forest, and support vector classifiers for the low-risk category and from generalized linear regression, support vector regressor, and stochastic gradient descent regressor for the high-risk category. MLAs were trained on 600 Southeast Asians aged 21 to 69 years who were free of cardiovascular disease. All MLAs outperformed the FRS for both the low- and high-risk categories. The MLA based on the lifestyle questionnaire alone achieved AUCs of 0.715 (95% CI 0.681, 0.750) and 0.710 (95% CI 0.653, 0.766) for low and high risk, respectively. Combining all groups of risk factors (lifestyle survey questionnaires, clinical blood tests, 24-h ambulatory blood pressure and heart rate monitoring) with feature selection further improved prediction of the low and high CVD risk groups, to 0.791 (95% CI 0.759, 0.822) and 0.790 (95% CI 0.745, 0.836), respectively. Besides conventional predictors, self-reported physical activity, average daily heart rate, awake blood pressure variability, and percentage of time in diastolic hypertension were important contributors to CVD risk classification.
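The AUCs reported here (for example 0.715 and 0.791) summarize how well a model ranks high-risk above low-risk participants. A minimal Python sketch of that interpretation, assuming binary labels and one continuous risk score per person: ROC AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). The function name and the scores are hypothetical and are not study data.

```python
def auc_pairwise(pos_scores, neg_scores):
    """ROC AUC as P(random positive outscores random negative), ties counted as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores for participants in each outcome group.
high_risk = [0.9, 0.8, 0.4]
low_risk = [0.1, 0.85, 0.3, 0.4]
print(auc_pairwise(high_risk, low_risk))
```

The double loop costs O(n·m) comparisons; for large cohorts a rank-sum computation gives the same value more efficiently.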
Affiliation(s)
- Weiting Huang
- National Heart Centre Singapore, 5 Hospital Drive, Singapore, 169609, Singapore.
- Tan Wei Ying
- Institute of Data Science, National University of Singapore, Singapore, Singapore
- Lohendran Baskaran
- National Heart Centre Singapore, 5 Hospital Drive, Singapore, 169609, Singapore
- Khung Keong Yeo
- National Heart Centre Singapore, 5 Hospital Drive, Singapore, 169609, Singapore
- Ng See Kiong
- Institute of Data Science, National University of Singapore, Singapore, Singapore
13
Chen J, Chen Y, Li J, Wang J, Lin Z, Nandi AK. Stroke Risk Prediction with Hybrid Deep Transfer Learning Framework. IEEE J Biomed Health Inform 2021; 26:411-422. [PMID: 34115602 DOI: 10.1109/jbhi.2021.3088750] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
Stroke has become a leading cause of death and long-term disability in the world, and there is no effective treatment. Although deep learning-based approaches have the potential to outperform existing stroke risk prediction models, they rely on large, well-labeled data. Because of strict privacy-protection policies in health-care systems, stroke data are usually distributed among different hospitals in small pieces. In addition, the positive and negative instances in such data are extremely imbalanced. Transfer learning addresses the small-data issue by exploiting knowledge from a correlated domain, especially when multiple sources are available. In this work, we propose a novel Hybrid Deep Transfer Learning-based Stroke Risk Prediction (HDTL-SRP) scheme that exploits the knowledge structure of multiple correlated sources (i.e., external stroke data and chronic-disease data such as hypertension and diabetes). The proposed framework has been extensively tested in synthetic and real-world scenarios, and it outperforms state-of-the-art stroke risk prediction models. It also shows potential for real-world deployment across multiple hospitals aided by 5G/B5G infrastructure.
14
Mining incomplete clinical data for the early assessment of Kawasaki disease based on feature clustering and convolutional neural networks. Artif Intell Med 2020; 105:101859. [DOI: 10.1016/j.artmed.2020.101859] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/22/2019] [Revised: 02/26/2020] [Accepted: 04/03/2020] [Indexed: 12/20/2022]
15
Cheon S, Kim J, Lim J. The Use of Deep Learning to Predict Stroke Patient Mortality. Int J Environ Res Public Health 2019; 16:E1876. [PMID: 31141892 PMCID: PMC6603534 DOI: 10.3390/ijerph16111876] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 04/30/2019] [Revised: 05/23/2019] [Accepted: 05/24/2019] [Indexed: 12/21/2022]
Abstract
The increase in stroke incidence with the aging of the Korean population will rapidly impose an economic burden on society. Timely treatment can improve stroke prognosis. Awareness of stroke warning signs and appropriate actions in the event of a stroke improve outcomes. Medical service use and health behavior data are easier to collect than medical imaging data. Here, we used a deep neural network to detect stroke using medical service use and health behavior data; we identified 15,099 patients with stroke. Principal component analysis (PCA) with quantile scaling was used to extract relevant background features from medical records, and we used these features to predict stroke. We compared our method (a scaled PCA/deep neural network [DNN] approach) with five other machine learning methods. The area under the curve (AUC) of our method was 83.48%; hence, it can be used by both patients and doctors to prescreen for possible stroke.
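The abstract names PCA with quantile scaling but gives no settings. A minimal Python sketch under explicit assumptions: "quantile scaling" is taken to mean mapping each feature to its empirical quantile in [0, 1] (similar in spirit to a uniform-output quantile transform), and only the first principal component is computed, by power iteration on the covariance matrix. The function names, feature names, and values are hypothetical, and the DNN stage is not shown.

```python
import math

def quantile_scale(column):
    """Map each value to its empirical quantile in [0, 1]; tied values share their average rank."""
    n = len(column)
    if n < 2:
        return [0.0] * n
    out = []
    for v in column:
        less = sum(1 for u in column if u < v)
        equal = sum(1 for u in column if u == v)
        avg_rank = less + (equal - 1) / 2.0  # 0-based average rank of v
        out.append(avg_rank / (n - 1))
    return out

def first_principal_component(rows, iters=200):
    """Leading eigenvector of the sample covariance matrix, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: no direction of variance to find
        v = [x / norm for x in w]
    return v, centered

# Hypothetical features: the second is a monotone (exponential) transform of the first.
age = [40, 55, 63, 71]
visits = [10, 100, 1000, 10000]
scaled_cols = [quantile_scale(age), quantile_scale(visits)]
rows = [list(r) for r in zip(*scaled_cols)]
pc1, centered = first_principal_component(rows)
scores = [sum(a * b for a, b in zip(r, pc1)) for r in centered]  # projections onto PC1
print(scaled_cols[0], scaled_cols[1])
print(pc1, scores)
```

Because quantile scaling depends only on ranks, the two example features become identical after scaling, so the first component weights them equally despite their very different raw ranges.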
Affiliation(s)
- Songhee Cheon
- Department of Physical Therapy, Youngsan University, Yangsan 626-790, Korea.
- Jungyoon Kim
- Department of Computer Science, Kent State University, Kent, OH 44242, USA.
- Jihye Lim
- Department of Healthcare Management, Youngsan University, Yangsan 626-790, Korea.