1
Lim H, Park Y, Hong JH, Yoo KB, Seo KD. Use of machine learning techniques for identifying ischemic stroke instead of the rule-based methods: a nationwide population-based study. Eur J Med Res 2024; 29:6. [PMID: 38173022 PMCID: PMC10763197 DOI: 10.1186/s40001-023-01594-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND Many studies have evaluated stroke using claims data; most of these studies have defined ischemic stroke using an operational definition following the rule-based method. Rule-based methods tend to overestimate the number of patients with ischemic stroke. OBJECTIVES We aimed to identify an appropriate algorithm for identifying stroke by applying machine learning (ML) techniques to claims data. METHODS We obtained the data from the Korean National Health Insurance Service database, which is linked to the Ilsan Hospital database (n = 30,897). The performance of the prediction models (extreme gradient boosting [XGBoost] and gated recurrent unit [GRU]) was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the calibration curve. RESULTS In total, 30,897 patients were enrolled in this study, 3145 of whom (10.18%) had ischemic stroke. XGBoost, a tree-based ML technique, achieved an AUROC of 94.46% and an AUPRC of 92.80%. GRU showed the highest accuracy (99.81%), precision (99.92%), and recall (99.69%). CONCLUSIONS We proposed recurrent neural network-based deep learning techniques to improve stroke phenotyping. These can be expected to produce faster and more accurate results than the rule-based methods.
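The accuracy, precision, and recall figures quoted in this abstract are all derived from a confusion matrix. The following is a minimal sketch of those definitions in Python; the `ConfusionCounts` class and the example counts are hypothetical illustrations and are not taken from the study. Only the prevalence calculation uses numbers reported in the abstract (3145 of 30,897 patients).

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical container for binary-classification outcome counts."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: ConfusionCounts) -> float:
    # Share of all cases that were classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    # Share of predicted positives that are truly positive.
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Share of true positives that the model found (sensitivity).
    return c.tp / (c.tp + c.fn)


# Prevalence from the figures reported in the abstract.
prevalence = 3145 / 30897

# Illustrative counts only (an assumption, not the study's confusion matrix).
counts = ConfusionCounts(tp=90, fp=10, fn=30, tn=870)

print(f"prevalence: {prevalence:.2%}")
print(f"accuracy:  {accuracy(counts):.2f}")
print(f"precision: {precision(counts):.2f}")
print(f"recall:    {recall(counts):.2f}")
```

With a positive class of roughly 10%, accuracy alone can look high even for a weak model, which is one reason to report precision, recall, and AUPRC alongside it.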
Affiliation(s)
- Hyunsun Lim
- Department of Research and Analysis, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea
- Youngmin Park
- Department of Family Medicine, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea
- Jung Hwa Hong
- Department of Research and Analysis, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea
- Ki-Bong Yoo
- Division of Health Administration, Yonsei University, Wonju, Republic of Korea
- Kwon-Duk Seo
- Department of Neurology, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea
- Department of Neurology, Graduate School of Medicine, Kangwon National University, Chuncheon, Republic of Korea
2
Li Q, Chi L, Zhao W, Wu L, Jiao C, Zheng X, Zhang K, Li X. Machine learning prediction of motor function in chronic stroke patients: a systematic review and meta-analysis. Front Neurol 2023; 14:1039794. [PMID: 37388543 PMCID: PMC10299899 DOI: 10.3389/fneur.2023.1039794] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/08/2022] [Accepted: 05/25/2023] [Indexed: 07/01/2023]
Abstract
Background Recent studies have reported that machine learning (ML), with a relatively strong capacity for processing non-linear data and adaptive ability, could improve the accuracy and efficiency of prediction. The article summarizes the published studies on ML models that predict motor function 3-6 months post-stroke. Methods A systematic literature search was conducted in PubMed, Embase, Cochrane and Web of Science as of April 3, 2023 for studies on ML prediction of motor function in stroke patients. The quality of the literature was assessed using the Prediction model Risk Of Bias Assessment Tool (PROBAST). A random-effects model was preferred for meta-analysis using R 4.2.0 because of the different variables and parameters. Results A total of 44 studies were included in this meta-analysis, involving 72,368 patients and 136 models. Models were categorized into subgroups according to the predicted outcome Modified Rankin Scale cut-off value and whether they were constructed based on radiomics. C-statistics, sensitivity, and specificity were calculated. The random-effects model showed that the C-statistics of all models were 0.81 (95% CI: 0.79; 0.83) in the training set and 0.82 (95% CI: 0.80; 0.85) in the validation set. According to different Modified Rankin Scale cut-off values, C-statistics of ML models predicting Modified Rankin Scale > 2 (the most widely used cut-off) in stroke patients were 0.81 (95% CI: 0.78; 0.84) in the training set, and 0.84 (95% CI: 0.81; 0.87) in the validation set. C-statistics of radiomics-based ML models in the training set and validation set were 0.81 (95% CI: 0.78; 0.84) and 0.87 (95% CI: 0.83; 0.90), respectively. Conclusion ML can be used as an assessment tool for predicting motor function in patients 3-6 months post-stroke. Additionally, the study found that ML models with radiomics as a predictive variable also had good predictive capabilities.
This systematic review provides valuable guidance for the future optimization of ML prediction systems that predict poor motor outcomes in stroke patients. Systematic review registration https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022335260, identifier: CRD42022335260.
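To illustrate how a random-effects model pools per-study estimates such as C-statistics, here is a minimal sketch of the DerSimonian-Laird estimator in Python. This is an assumption for illustration: the abstract states only that a random-effects model was fitted in R and does not say which estimator or transformation was used. The function name and the input values are hypothetical, and the pooling is done on the raw scale for simplicity, whereas C-statistics are often pooled on the logit scale in practice.

```python
import math


def dersimonian_laird(estimates, std_errors, z=1.96):
    """Pool study estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled, ci_low, ci_high, tau2), where tau2 is the estimated
    between-study variance.
    """
    # Inverse-variance (fixed-effect) weights.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    se_pooled = math.sqrt(1.0 / sum_w_star)
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


# Hypothetical C-statistics and standard errors (not data from the review).
c_stats = [0.78, 0.84, 0.80, 0.87]
ses = [0.02, 0.03, 0.025, 0.04]

pooled, low, high, tau2 = dersimonian_laird(c_stats, ses)
print(f"pooled C-statistic: {pooled:.3f} (95% CI: {low:.3f}; {high:.3f})")
```

When the studies agree closely, `tau2` shrinks to zero and the result coincides with a fixed-effect (inverse-variance) pooled estimate.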
Affiliation(s)
- Qinglin Li
- Second Clinical Medical School, Heilongjiang University of Chinese Medicine, Harbin, Heilongjiang, China
- Lei Chi
- Department of Acupuncture, The Second Affiliated Hospital of Heilongjiang University of Chinese Medicine, Harbin, Heilongjiang, China
- Weiying Zhao
- Second Clinical Medical School, Heilongjiang University of Chinese Medicine, Harbin, Heilongjiang, China
- Lei Wu
- Department of Acupuncture, The Third Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China
- Chuanxu Jiao
- Department of Neurorehabilitation, Taizhou Enze Medical Center Luqiao Hospital, Taizhou, Zhejiang, China
- Xue Zheng
- Second Clinical Medical School, Heilongjiang University of Chinese Medicine, Harbin, Heilongjiang, China
- Kaiyue Zhang
- Second Clinical Medical School, Heilongjiang University of Chinese Medicine, Harbin, Heilongjiang, China
- Xiaoning Li
- Department of Acupuncture, The Second Affiliated Hospital of Heilongjiang University of Chinese Medicine, Harbin, Heilongjiang, China
3
Surianarayanan C, Lawrence JJ, Chelliah PR, Prakash E, Hewage C. Convergence of Artificial Intelligence and Neuroscience towards the Diagnosis of Neurological Disorders-A Scoping Review. Sensors (Basel) 2023; 23:3062. [PMID: 36991773 PMCID: PMC10053494 DOI: 10.3390/s23063062] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/25/2023] [Revised: 03/09/2023] [Accepted: 03/09/2023] [Indexed: 06/19/2023]
Abstract
Artificial intelligence (AI) is a field of computer science that deals with the simulation of human intelligence using machines so that such machines gain problem-solving and decision-making capabilities similar to those of the human brain. Neuroscience is the scientific study of the structure and cognitive functions of the brain. Neuroscience and AI are mutually interrelated. These two fields help each other in their advancements. The theory of neuroscience has brought many distinct improvements into the AI field. The biological neural network has led to the realization of complex deep neural network architectures that are used to develop versatile applications, such as text processing, speech recognition, object detection, etc. Additionally, neuroscience helps to validate the existing AI-based models. Reinforcement learning in humans and animals has inspired computer scientists to develop algorithms for reinforcement learning in artificial systems, which enables those systems to learn complex strategies without explicit instruction. Such learning helps in building complex applications, like robot-based surgery, autonomous vehicles, gaming applications, etc. In turn, with its ability to intelligently analyze complex data and extract hidden patterns, AI is a natural choice for analyzing neuroscience data that are very complex. Large-scale AI-based simulations help neuroscientists test their hypotheses. Through an interface with the brain, an AI-based system can extract the brain signals and commands that are generated according to the signals. These commands are fed into devices, such as a robotic arm, which helps in the movement of paralyzed muscles or other human parts. AI has several use cases in analyzing neuroimaging data and reducing the workload of radiologists. The study of neuroscience helps in the early detection and diagnosis of neurological disorders.
In the same way, AI can effectively be applied to the prediction and detection of neurological disorders. Thus, in this paper, a scoping review has been carried out on the mutual relationship between AI and neuroscience, emphasizing the convergence between AI and neuroscience in order to detect and predict various neurological disorders.
Affiliation(s)
- Edmond Prakash
- Research Center for Creative Arts, University for the Creative Arts (UCA), Farnham GU9 7DS, UK
- Chaminda Hewage
- Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff CF5 2YB, UK
4
Eysenbach G, Tan X, Padman R. A Machine Learning Approach to Support Urgent Stroke Triage Using Administrative Data and Social Determinants of Health at Hospital Presentation: Retrospective Study. J Med Internet Res 2023; 25:e36477. [PMID: 36716097 PMCID: PMC9926350 DOI: 10.2196/36477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 07/17/2022] [Accepted: 12/18/2022] [Indexed: 01/31/2023]
Abstract
BACKGROUND The key to effective stroke management is timely diagnosis and triage. Machine learning (ML) methods developed to assist in detecting stroke have focused on interpreting detailed clinical data such as clinical notes and diagnostic imaging results. However, such information may not be readily available when patients are initially triaged, particularly in rural and underserved communities. OBJECTIVE This study aimed to develop an ML stroke prediction algorithm based on data widely available at the time of patients' hospital presentations and assess the added value of social determinants of health (SDoH) in stroke prediction. METHODS We conducted a retrospective study of the emergency department and hospitalization records from 2012 to 2014 from all the acute care hospitals in the state of Florida, merged with the SDoH data from the American Community Survey. A case-control design was adopted to construct stroke and stroke mimic cohorts. We compared the algorithm performance and feature importance measures of the ML models (ie, gradient boosting machine and random forest) with those of the logistic regression model based on 3 sets of predictors. To provide insights into the prediction and ultimately assist care providers in decision-making, we used TreeSHAP for tree-based ML models to explain the stroke prediction. RESULTS Our analysis included 143,203 hospital visits of unique patients, and it was confirmed based on the principal diagnosis at discharge that 73% (n=104,662) of these patients had a stroke. The approach proposed in this study has high sensitivity and is particularly effective at reducing the misdiagnosis of dangerous stroke chameleons (false-negative rate <4%). ML classifiers consistently outperformed the benchmark logistic regression in all 3 input combinations. We found significant consistency across the models in the features that explain their performance. 
The most important features are age, the number of chronic conditions on admission, and primary payer (eg, Medicare or private insurance). Although both the individual- and community-level SDoH features helped improve the predictive performance of the models, the inclusion of the individual-level SDoH features led to a much larger improvement (area under the receiver operating characteristic curve increased from 0.694 to 0.823) than the inclusion of the community-level SDoH features (area under the receiver operating characteristic curve increased from 0.823 to 0.829). CONCLUSIONS Using data widely available at the time of patients' hospital presentations, we developed a stroke prediction model with high sensitivity and reasonable specificity. The prediction algorithm uses variables that are routinely collected by providers and payers and might be useful in underresourced hospitals with limited availability of sensitive diagnostic tools or incomplete data-gathering capabilities.
Affiliation(s)
- Xuan Tan
- Department of Information Systems and Analytics, Leavey School of Business, Santa Clara University, Santa Clara, CA, United States
- Rema Padman
- The H John Heinz III College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA, United States
5
Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023; 30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022]
Abstract
OBJECTIVE Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. MATERIALS AND METHODS We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. RESULTS Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. DISCUSSION Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. CONCLUSION Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Affiliation(s)
- Siyue Yang
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Ellen Stephenson
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Karen Tu
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Jessica Gronsbell
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
6
Lee T, Jeon ET, Jung JM, Lee M. Deep-Learning-Based Stroke Screening Using Skeleton Data from Neurological Examination Videos. J Pers Med 2022; 12:1691. [PMID: 36294830 PMCID: PMC9604814 DOI: 10.3390/jpm12101691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 09/25/2022] [Accepted: 10/07/2022] [Indexed: 11/19/2022]
Abstract
According to the Korea Institute for Health and Social Affairs, in 2017, the elderly, aged 65 or older, had an average of 2.7 chronic diseases per person. The concern for the medical welfare of the elderly is increasing due to a low birth rate, an aging population, and the lack of medical personnel. The demand for services that take user age, cognitive capacity, and difficulty into account is rising. As a result, there is an increased demand for smart healthcare systems that can lower hospital admissions and offer patients individualized care. This has motivated us to develop an AI system that can easily screen and manage neurological diseases through videos. As neurological diseases can be diagnosed by visual analysis to some extent, in this study, we set out to estimate the possibility of a person having a neurological disease from videos. Among neurological diseases, we focus on stroke because it is a common condition in the elderly population and results in high mortality and morbidity worldwide. The proposed method consists of three steps: (1) transforming neurological examination videos into landmark data, (2) converting the landmark data into recurrence plots, and (3) estimating the possibility of a stroke using deep neural networks. Major features, such as the hand, face, pupil, and body movements of a person are extracted from test videos taken under several neurological examination protocols using deep-learning-based landmark extractors. Sequences of these landmark data are then converted into recurrence plots, which can be interpreted as images. These images can be fed into convolutional neural networks to classify stroke using feature-fusion techniques. A case study of the application of a disease screening test to assess the capability of the proposed method is presented.
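Step (2) of the method described above, converting a landmark sequence into a recurrence plot, can be sketched in a few lines of Python. This is a minimal illustration of the general recurrence-plot technique, not the authors' implementation: the function name, the distance threshold, and the synthetic circular trajectory standing in for a tracked landmark are all assumptions.

```python
import math


def recurrence_plot(points, threshold):
    """Return a binary T x T matrix R where R[i][j] = 1 when the states at
    time steps i and j lie within `threshold` of each other (Euclidean)."""
    n = len(points)
    return [
        [1 if math.dist(points[i], points[j]) <= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


# Synthetic 2-D "landmark" moving on a circle with a period of 8 frames
# (hypothetical data standing in for coordinates extracted from video).
trajectory = [
    (math.cos(2 * math.pi * t / 8), math.sin(2 * math.pi * t / 8))
    for t in range(16)
]

plot = recurrence_plot(trajectory, threshold=0.1)

# Print the matrix as a small text image; periodic motion shows up as
# diagonal lines offset from the main diagonal by the period.
for row in plot:
    print("".join("#" if cell else "." for cell in row))
```

Because the resulting matrix is a two-dimensional grid, it can be treated as an image, which is what allows a convolutional network to consume it in step (3).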
Affiliation(s)
- Taeho Lee
- Department of Electrical and Electronic Engineering, Hanyang University, 55 Hanyangdaehak-ro, Sangnok-gu, Ansan 15588, Korea
- Eun-Tae Jeon
- Department of Radiology, SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul 07061, Korea
- Jin-Man Jung
- Department of Neurology, Korea University Ansan Hospital, Ansan 15355, Korea
- Zebrafish Translational Medical Research Center, Korea University, Ansan 15328, Korea
- Minsik Lee
- Department of Electrical and Electronic Engineering, Hanyang University, 55 Hanyangdaehak-ro, Sangnok-gu, Ansan 15588, Korea
- Correspondence: Tel.: +82-31-400-5173
7
Liu L, Wu DTY, Spooner SA, Ni Y. Development and Evaluation of an Automated Approach to Detect Weight Abnormalities in Pediatric Weight Charts. AMIA Annu Symp Proc 2022; 2021:783-792. [PMID: 35308946 PMCID: PMC8861738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Inaccurate body weight measures can cause critical safety events in clinical settings and hinder the use of clinical data for retrospective research. This study focused on developing a machine learning-based automated weight abnormality detector (AWAD) to analyze growth dynamics in pediatric weight charts and detect abnormal weight values. In two reference-standard-based evaluations of real-world clinical data, the machine learning models showed good capacity for detecting weight abnormalities and significantly outperformed the methods proposed in the literature (p-value<0.05). A deep learning model with bi-directional long short-term memory networks achieved the best predictive performance, with AUCs ≥0.989 across the two datasets. The positive predictive value and sensitivity achieved by the system suggested a potential reduction of more than 98% in screening effort for weight abnormality detection. Consequently, we hypothesize that the AWAD, when fully deployed, holds great potential to facilitate clinical research and healthcare delivery that rely on accurate and reliable weight measures.
Affiliation(s)
- Lei Liu
- Department of Biomedical Informatics, College of Medicine, University of Cincinnati, Cincinnati, OH
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Danny T Y Wu
- Department of Biomedical Informatics, College of Medicine, University of Cincinnati, Cincinnati, OH
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH
- S Andrew Spooner
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH
- Yizhao Ni
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH
8
Mainali S, Darsie ME, Smetana KS. Machine Learning in Action: Stroke Diagnosis and Outcome Prediction. Front Neurol 2021; 12:734345. [PMID: 34938254 PMCID: PMC8685212 DOI: 10.3389/fneur.2021.734345] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 07/01/2021] [Accepted: 10/28/2021] [Indexed: 01/01/2023]
Abstract
The application of machine learning has rapidly evolved in medicine over the past decade. In stroke, commercially available machine learning algorithms have already been incorporated into clinical application for rapid diagnosis. The creation and advancement of deep learning techniques have greatly improved clinical utilization of machine learning tools and new algorithms continue to emerge with improved accuracy in stroke diagnosis and outcome prediction. Although imaging-based feature recognition and segmentation have significantly facilitated rapid stroke diagnosis and triaging, stroke prognostication is dependent on a multitude of patient specific as well as clinical factors and hence accurate outcome prediction remains challenging. Despite its vital role in stroke diagnosis and prognostication, it is important to recognize that machine learning output is only as good as the input data and the appropriateness of algorithm applied to any specific data set. Additionally, many studies on machine learning tend to be limited by small sample size and hence concerted efforts to collate data could improve evaluation of future machine learning tools in stroke. In the present state, machine learning technology serves as a helpful and efficient tool for rapid clinical decision making while oversight from clinical experts is still required to address specific aspects not accounted for in an automated algorithm. This article provides an overview of machine learning technology and a tabulated review of pertinent machine learning studies related to stroke diagnosis and outcome prediction.
Affiliation(s)
- Shraddha Mainali
- Department of Neurology, Virginia Commonwealth University, Richmond, VA, United States
- Marin E Darsie
- Department of Emergency Medicine, University of Wisconsin Hospitals and Clinics, Madison, WI, United States
- Department of Neurological Surgery, University of Wisconsin Hospitals and Clinics, Madison, WI, United States
- Keaton S Smetana
- Department of Pharmacy, The Ohio State University Wexner Medical Center, Columbus, OH, United States
9
Predicting Mortality in Patients with Stroke Using Data Mining Techniques. Acta Informatica Pragensia 2021. [DOI: 10.18267/j.aip.163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
10
Atrial fibrillation detection in primary care during blood pressure measurements and using a smartphone cardiac monitor. Sci Rep 2021; 11:17721. [PMID: 34489508 PMCID: PMC8421380 DOI: 10.1038/s41598-021-97475-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/20/2021] [Accepted: 08/24/2021] [Indexed: 11/22/2022]
Abstract
Improved atrial fibrillation (AF) screening methods are required. We detected AF with pulse rate variability (PRV) parameters using a blood pressure device (BP+; Uscom, Sydney, Australia) and with a Kardia Mobile Cardiac Monitor (KMCM; AliveCor, Mountain View, CA). In 421 primary care patients (mean (range) age: 72 (31–99) years), we diagnosed AF (n = 133) from 12-lead electrocardiogram recordings, and performed PRV and KMCM measurements. PRV parameters detected AF with area under curve (AUC) values of up to 0.92. Using the mean of two sequential readings increased AUC to up to 0.94 and improved positive predictive value at a given sensitivity (by up to 18%). The KMCM detected AF with 83% sensitivity and 68% specificity. 89 KMCM recordings were “unclassified” or blank, and PRV detected AF in these with AUC values of up to 0.88. When non-AF arrhythmias (n = 56) were excluded, the KMCM device had increased specificity (73%) and PRV had higher discrimination performance (maximum AUC = 0.96). In decision curve analysis, all PRV parameters consistently achieved a positive net benefit across the range of clinical thresholds. In primary care, AF can be detected by PRV accurately and by KMCM, especially in the absence of non-AF arrhythmias or when combinations of measurements are used.
11
Wyrwa JM, Shirel TM, Hostetter TA, Schneider AL, Hoffmire CA, Stearns-Yoder KA, Forster JE, Odom NE, Brenner LA. Suicide After Stroke in the United States Veteran Health Administration Population. Arch Phys Med Rehabil 2021; 102:1729-1734. [PMID: 33811852 DOI: 10.1016/j.apmr.2021.03.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/26/2020] [Revised: 03/10/2021] [Accepted: 03/12/2021] [Indexed: 11/22/2022]
Abstract
OBJECTIVE To evaluate risk for suicide among veterans with a history of stroke, seeking care within the Veterans Health Administration (VHA), we analyzed existing clinical data. DESIGN This retrospective cohort study was approved and performed in accordance with the local Institutional Review Board. Veterans were identified via the VHA's Corporate Data Warehouse. Initial eligibility criteria included confirmed veteran status and at least 90 days of VHA utilization between fiscal years 2001-2015. Cox proportional hazards models were used to assess the association between history of stroke and suicide. Among those veterans who died by suicide, the association between history of stroke and method of suicide was also investigated. SETTING VHA. PARTICIPANTS Veterans with at least 90 days of VHA utilization between fiscal years 2001-2015 (N=1,647,671). Data from these 1,647,671 veterans were analyzed (1,405,762 without stroke and 241,909 with stroke). INTERVENTIONS Not applicable. MAIN OUTCOME MEASURES Suicide and method of suicide. RESULTS The fully adjusted model, which controlled for age, sex, mental health diagnoses, mild traumatic brain injury, and modified Charlson/Deyo Index (stroke-related diagnoses excluded), demonstrated a hazard ratio of 1.13 (95% confidence interval, 1.02-1.25; P=.02). The majority of suicides in both cohorts was by firearm, and a significantly larger proportion of suicides occurred by firearm in the group with stroke than the cohort without (81.2% vs 76.6%). CONCLUSIONS Findings suggest that veterans with a history of stroke are at increased risk for suicide, specifically by firearm, compared with veterans without a history of stroke. Increased efforts are needed to address the mental health needs and lethal means safety of veterans with a history of stroke, with the goal of improving function and decreasing negative psychiatric outcomes, such as suicide.
Affiliation(s)
- Jordan M Wyrwa
- Department of Physical Medicine and Rehabilitation, University of Colorado, School of Medicine, Aurora, CO
- Tyler M Shirel
- Department of Physical Medicine and Rehabilitation, University of Colorado, School of Medicine, Aurora, CO
- Trisha A Hostetter
- Veterans Affairs (VA) Rocky Mountain Mental Illness Research Education and Clinical Center (MIRECC), Aurora, CO
- Alexandra L Schneider
- Veterans Affairs (VA) Rocky Mountain Mental Illness Research Education and Clinical Center (MIRECC), Aurora, CO
- Claire A Hoffmire
- Department of Physical Medicine and Rehabilitation, University of Colorado, School of Medicine, Aurora, CO; Veterans Affairs (VA) Rocky Mountain Mental Illness Research Education and Clinical Center (MIRECC), Aurora, CO
- Kelly A Stearns-Yoder
- Department of Physical Medicine and Rehabilitation, University of Colorado, School of Medicine, Aurora, CO; Veterans Affairs (VA) Rocky Mountain Mental Illness Research Education and Clinical Center (MIRECC), Aurora, CO
- Jeri E Forster
- Department of Physical Medicine and Rehabilitation, University of Colorado, School of Medicine, Aurora, CO; Veterans Affairs (VA) Rocky Mountain Mental Illness Research Education and Clinical Center (MIRECC), Aurora, CO
- Nathan E Odom
- Department of Physical Medicine and Rehabilitation, University of Colorado, School of Medicine, Aurora, CO
- Lisa A Brenner
- Department of Physical Medicine and Rehabilitation, University of Colorado, School of Medicine, Aurora, CO; Veterans Affairs (VA) Rocky Mountain Mental Illness Research Education and Clinical Center (MIRECC), Aurora, CO
12
Zhao Y, Fu S, Bielinski SJ, Decker PA, Chamberlain AM, Roger VL, Liu H, Larson NB. Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation. J Med Internet Res 2021; 23:e22951. [PMID: 33683212 PMCID: PMC7985804 DOI: 10.2196/22951] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/27/2020] [Revised: 08/25/2020] [Accepted: 01/20/2021] [Indexed: 11/29/2022]
Abstract
Background Stroke is an important clinical outcome in cardiovascular research. However, the ascertainment of incident stroke is typically accomplished via time-consuming manual chart abstraction. Current phenotyping efforts using electronic health records for stroke focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events. Objective The aim of this study was to develop a machine learning–based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing. Methods The algorithm was trained and validated using an existing epidemiology cohort consisting of 4914 patients with atrial fibrillation (AF) with manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the stroke subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) stratified sampled from a population in Olmsted County, Minnesota (N=74,314). Results Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86% (43/50; 95% CI 0.74-0.93) and a negative predictive value of 96% (96/100). For subtype identification, we achieved an accuracy of 83% in the AF cohort and 80% in the general population sample. Conclusions We developed and validated a machine learning–based algorithm that performed well for identifying incident stroke and for determining type of stroke. 
The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions.
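The predictive values reported above follow directly from the validation counts: 43 of the 50 model-positive patients were confirmed (positive predictive value) and 96 of the 100 model-negative patients were confirmed (negative predictive value). The sketch below recomputes both. The abstract does not say how the 95% CI was obtained; a Wilson score interval is assumed here for illustration, and the helper name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%).

    Assumption: the abstract does not name its CI method; Wilson is used
    here only as an illustration.
    """
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Counts taken from the abstract's general-population validation sample.
ppv = 43 / 50    # confirmed strokes among model-positive patients
npv = 96 / 100   # confirmed non-strokes among model-negative patients
ppv_lo, ppv_hi = wilson_interval(43, 50)

print(f"PPV = {ppv:.2f} (95% CI {ppv_lo:.2f}-{ppv_hi:.2f})")
print(f"NPV = {npv:.2f}")
```

Under this assumption the interval rounds to the same two-decimal bounds as the abstract, although other interval methods could give similar values for these counts.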
Affiliation(s)
- Yiqing Zhao
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Suzette J Bielinski
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Paul A Decker
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Alanna M Chamberlain
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Veronique L Roger
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Nicholas B Larson
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
|
13
|
Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021; 9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022] Open
Abstract
Background Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research. Objective This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions. Methods A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA) guidelines. Results A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed either in primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule–based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance. Conclusions Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. 
Several strategies to assist with phenotype-based case definitions have been proposed.
Affiliation(s)
- Seungwon Lee
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Chelsea Doktorchik
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Elliot Asher Martin
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
- Adam Giles D'Souza
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
- Cathy Eastwood
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Abdel Aziz Shaheen
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Christopher Naugler
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Joon Lee
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hude Quan
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
|
14
|
Aguiar de Sousa D, Katan M. Promising Use of Automated Electronic Phenotyping: Turning Big Data Into Big Value in Stroke Research. Stroke 2020; 52:190-192. [PMID: 33297867 DOI: 10.1161/strokeaha.120.033061] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Affiliation(s)
- Diana Aguiar de Sousa
- Department of Neurosciences and Mental Health (Neurology), Hospital de Santa Maria-Centro Hospitalar Universitário Lisboa Norte (CHULN), Lisbon, Portugal (D.A.d.S.); Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal (D.A.d.S.)
- Mira Katan
- Department of Neurology, University Hospital of Zurich, Switzerland (M.K.); Neuroscience Center of Zurich, University of Zurich, Switzerland (M.K.)
|
15
|
Thangaraj PM, Kummer BR, Lorberbaum T, Elkind MSV, Tatonetti NP. Comparative analysis, applications, and interpretation of electronic health record-based stroke phenotyping methods. BioData Min 2020; 13:21. [PMID: 33372632 PMCID: PMC7720570 DOI: 10.1186/s13040-020-00230-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/26/2020] [Accepted: 11/15/2020] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND Accurate identification of acute ischemic stroke (AIS) patient cohorts is essential for a wide range of clinical investigations. Automated phenotyping methods that leverage electronic health records (EHRs) represent a fundamentally new approach to cohort identification without the current laborious and ungeneralizable generation of phenotyping algorithms. We systematically compared and evaluated the ability of machine learning algorithms and case-control combinations to phenotype acute ischemic stroke patients using data from an EHR. MATERIALS AND METHODS Using structured patient data from the EHR at a tertiary-care hospital system, we built and evaluated machine learning models to identify patients with AIS based on 75 different case-control and classifier combinations. We then estimated the prevalence of AIS patients across the EHR. Finally, we externally validated the ability of the models to detect AIS patients without AIS diagnosis codes using the UK Biobank. RESULTS Across all models, we found that the mean AUROC for detecting AIS was 0.963 ± 0.0520 and the average precision score was 0.790 ± 0.196 with minimal feature processing. Classifiers trained with cases with AIS diagnosis codes and controls with no cerebrovascular disease codes had the best average F1 score (0.832 ± 0.0383). In the external validation, we found that the top probabilities from a model-predicted AIS cohort were significantly enriched for AIS patients without AIS diagnosis codes (60-150 fold over expected). CONCLUSIONS Our findings support machine learning algorithms as a generalizable way to accurately identify AIS patients without using process-intensive manual feature curation. When a set of AIS patients is unavailable, diagnosis codes may be used to train classifier models.
Affiliation(s)
- Phyllis M Thangaraj
- Department of Biomedical Informatics, Columbia University, 622 W 168th St., PH-20, New York, NY, 10032, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Benjamin R Kummer
- Department of Neurology, Icahn School of Medicine at Mt. Sinai, New York, NY, USA
- Tal Lorberbaum
- Department of Biomedical Informatics, Columbia University, 622 W 168th St., PH-20, New York, NY, 10032, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Mitchell S V Elkind
- Department of Neurology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY, USA
- Nicholas P Tatonetti
- Department of Biomedical Informatics, Columbia University, 622 W 168th St., PH-20, New York, NY, 10032, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
|
16
|
Sung SF, Lin CY, Hu YH. EMR-Based Phenotyping of Ischemic Stroke Using Supervised Machine Learning and Text Mining Techniques. IEEE J Biomed Health Inform 2020; 24:2922-2931. [DOI: 10.1109/jbhi.2020.2976931] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
|
17
|
Zhao J, Zhang Y, Schlueter DJ, Wu P, Eric Kerchberger V, Trent Rosenbloom S, Wells QS, Feng Q, Denny JC, Wei WQ. Detecting time-evolving phenotypic topics via tensor factorization on electronic health records: Cardiovascular disease case study. J Biomed Inform 2019; 98:103270. [PMID: 31445983 PMCID: PMC6783385 DOI: 10.1016/j.jbi.2019.103270] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 04/01/2019] [Revised: 07/10/2019] [Accepted: 08/16/2019] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Discovering subphenotypes of complex diseases can help characterize disease cohorts for investigative studies aimed at developing better diagnoses and treatments. Recent advances in unsupervised machine learning on electronic health record (EHR) data have enabled researchers to discover phenotypes without input from domain experts. However, most existing studies have ignored time and modeled diseases as discrete events. Uncovering the evolution of phenotypes - how they emerge, evolve and contribute to health outcomes - is essential to define more precise phenotypes and refine the understanding of disease progression. Our objective was to assess the benefits of an unsupervised approach that incorporates time to model diseases as dynamic processes in phenotype discovery. METHODS In this study, we applied a constrained non-negative tensor-factorization approach to characterize the complexity of a cardiovascular disease (CVD) patient cohort based on longitudinal EHR data. Through tensor-factorization, we identified a set of phenotypic topics (i.e., subphenotypes) that these patients established over the 10 years prior to the diagnosis of CVD, and showed their progression patterns. For each identified subphenotype, we examined its association with the risk for adverse cardiovascular outcomes estimated by the American College of Cardiology/American Heart Association Pooled Cohort Risk Equations, a conventional CVD-risk assessment tool frequently used in clinical practice. Furthermore, we compared the subsequent myocardial infarction (MI) rates among the six most prevalent subphenotypes using survival analysis. RESULTS From a cohort of 12,380 adult CVD individuals with 1068 unique PheCodes, we successfully identified 14 subphenotypes. Through the association analysis with estimated CVD risk for each subtype, we found that some phenotypic topics, such as vitamin D deficiency and depression, and urinary infections, cannot be explained by the conventional risk factors. 
Through a survival analysis, we found markedly different risks of subsequent MI following the diagnosis of CVD among the six most prevalent topics (p < 0.0001), indicating these topics may capture clinically meaningful subphenotypes of CVD. CONCLUSION This study demonstrates the potential benefits of using tensor-decomposition to model diseases as dynamic processes from longitudinal EHR data. Our results suggest that this data-driven approach may potentially help researchers identify complex and chronic disease subphenotypes in precision medicine research.
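To make the idea of factorizing a patient-by-code-by-time tensor into non-negative "topics" concrete, the following is a minimal sketch of a generic non-negative CP (PARAFAC) factorization using multiplicative updates. It is not the authors' constrained method, whose constraints and solver are not described in the abstract; the function name `nonneg_cp`, the toy tensor sizes, and the rank are all assumptions chosen for illustration.

```python
import numpy as np


def nonneg_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Generic non-negative CP factorization via multiplicative updates.

    X has shape (patients, codes, time). Returns factor matrices A, B, C
    (one column per topic) and the reconstruction error after each sweep.
    """
    rng = np.random.default_rng(seed)
    n_pat, n_code, n_time = X.shape
    A = rng.random((n_pat, rank))    # patient loadings per topic
    B = rng.random((n_code, rank))   # code loadings per topic
    C = rng.random((n_time, rank))   # temporal profile per topic
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so factors stay >= 0.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(float(np.linalg.norm(X - X_hat)))
    return A, B, C, errors


# Toy tensor built from known non-negative factors (synthetic, not EHR data):
# 30 patients x 12 codes x 10 yearly time bins, 3 underlying topics.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((30, 3)), rng.random((12, 3)), rng.random((10, 3))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)

A, B, C, errors = nonneg_cp(X, rank=3)
print(f"error after first sweep: {errors[0]:.4f}, after last sweep: {errors[-1]:.4f}")
```

In this reading, each column of `B` groups co-occurring codes into a topic and the matching column of `C` traces how that topic's weight changes over the time bins, which is the kind of temporal pattern the abstract describes.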
Affiliation(s)
- Juan Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Yun Zhang
- Fixed Income Division, Morgan Stanley & Co LLC, New York, NY, USA
- David J Schlueter
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Patrick Wu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vern Eric Kerchberger
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Division of Allergy, Pulmonary, and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- S Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Quinn S Wells
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- QiPing Feng
- Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
|