1
Powla PP, Fakhri F, Jankowski S, Mansour A, Polley EC. Clinical Prediction Models in Neurocritical Care: An Overview of the Literature and Example Application to Prediction of Hospital Mortality in Traumatic Brain Injury. Neurocrit Care 2025; 42:32-38. [PMID: 39107660 DOI: 10.1007/s12028-024-02083-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 07/26/2024] [Indexed: 02/12/2025]
Abstract
Clinical prediction models serve as valuable instruments for assessing the risk of crucial outcomes and facilitating decision-making in clinical settings. Constructing these models requires nuanced analytical decisions and expertise informed by the current statistical literature. Access to and thorough understanding of such literature may be limited for neurocritical care physicians, which may hinder the interpretation of existing predictive models. The present emphasis is on narrowing this knowledge gap by providing neurocritical care specialists with methodological guidance for interpreting predictive models in neurocritical care. Presented are the statistical learning principles integral to constructing a model predicting hospital mortality (nonsurvival during hospitalization) in patients with moderate and severe blunt traumatic brain injury using components of the IMPACT-Core model. Discussion encompasses critical elements such as model flexibility, hyperparameter selection, data imbalance, cross-validation, model assessment (discrimination and calibration), prediction instability, and probability thresholds. The intricate interplay among these components, the data set, and the clinical context of neurocritical care is elaborated. Leveraging this comprehensive exploration of statistical learning can enhance comprehension of articles encompassing model generation, tailored clinical care, and, ultimately, better interpretation and clinical applicability of predictive models.
Affiliation(s)
- Plamena P Powla
  - Division of Neurocritical Care, Department of Neurology, University of Chicago Medical Center, 5841 S. Maryland Ave., MC 2030, Chicago, IL, 60637-1470, USA.
- Farima Fakhri
  - Division of Neurocritical Care, Department of Neurology, University of Chicago Medical Center, 5841 S. Maryland Ave., MC 2030, Chicago, IL, 60637-1470, USA
- Samantha Jankowski
  - Division of Neurocritical Care, Department of Neurology, University of Chicago Medical Center, 5841 S. Maryland Ave., MC 2030, Chicago, IL, 60637-1470, USA
- Ali Mansour
  - Division of Neurocritical Care, Department of Neurology, University of Chicago Medical Center, 5841 S. Maryland Ave., MC 2030, Chicago, IL, 60637-1470, USA
- Eric C Polley
  - Department of Public Health Sciences, University of Chicago Medical Center, Chicago, IL, USA
2
Dimitsaki S, Natsiavas P, Jaulent MC. Applying AI to Structured Real-World Data for Pharmacovigilance Purposes: Scoping Review. J Med Internet Res 2024; 26:e57824. [PMID: 39753222 PMCID: PMC11729787 DOI: 10.2196/57824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 10/03/2024] [Accepted: 10/27/2024] [Indexed: 01/14/2025] Open
Abstract
BACKGROUND Artificial intelligence (AI) applied to real-world data (RWD; eg, electronic health care records) has been identified as a potentially promising technical paradigm for the pharmacovigilance field. There are several instances of AI approaches applied to RWD; however, most studies focus on unstructured RWD (conducting natural language processing on various data sources, eg, clinical notes, social media, and blogs). Hence, it is essential to investigate how AI is currently applied to structured RWD in pharmacovigilance and how new approaches could enrich the existing methodology. OBJECTIVE This scoping review depicts the emerging use of AI on structured RWD for pharmacovigilance purposes to identify relevant trends and potential research gaps. METHODS The scoping review methodology is based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology. We queried the MEDLINE database through the PubMed search engine. Relevant scientific manuscripts published from January 2010 to January 2024 were retrieved. The included studies were "mapped" against a set of evaluation criteria, including applied AI approaches, code availability, description of the data preprocessing pipeline, clinical validation of AI models, and implementation of trustworthy AI criteria following the guidelines of the FUTURE (Fairness, Universality, Traceability, Usability, Robustness, and Explainability)-AI initiative. RESULTS The scoping review ultimately yielded 36 studies. There has been a significant increase in relevant studies after 2019. Most of the articles focused on adverse drug reaction detection procedures (23/36, 64%) for specific adverse effects. Furthermore, a substantial number of studies (34/36, 94%) used nonsymbolic AI approaches, emphasizing classification tasks. Random forest was the most popular machine learning approach identified in this review (17/36, 47%). The most common RWD sources used were electronic health care records (28/36, 78%). Typically, these data were not available in a widely acknowledged data model to facilitate interoperability, and they came from proprietary databases, limiting their availability for reproducing results. On the basis of the evaluation criteria classification, 10% (4/36) of the studies published their code in public registries, 16% (6/36) tested their AI models in clinical environments, and 36% (13/36) provided information about the data preprocessing pipeline. In addition, in terms of trustworthy AI, 89% (32/36) of the studies followed at least half of the trustworthy AI initiative guidelines. Finally, selection and confounding biases were the most common biases in the included studies. CONCLUSIONS AI, along with structured RWD, constitutes a promising line of work for drug safety and pharmacovigilance. However, in terms of AI, some approaches have not been examined extensively in this field (such as explainable AI and causal AI). Moreover, it would be helpful to have a data preprocessing protocol for RWD to support pharmacovigilance processes. Finally, because of personal data sensitivity, evaluation procedures have to be investigated further.
Affiliation(s)
- Stella Dimitsaki
  - Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé - LIMICS, Inserm, Université Sorbonne Paris-Nord, Sorbonne Université, Paris, France
- Pantelis Natsiavas
  - Centre for Research and Development Hellas, Institute of Applied Biosciences, Thessaloniki, Greece
- Marie-Christine Jaulent
  - Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé - LIMICS, Inserm, Université Sorbonne Paris-Nord, Sorbonne Université, Paris, France
3
Hu Q, Chen Y, Zou D, He Z, Xu T. Predicting adverse drug event using machine learning based on electronic health records: a systematic review and meta-analysis. Front Pharmacol 2024; 15:1497397. [PMID: 39605909 PMCID: PMC11600142 DOI: 10.3389/fphar.2024.1497397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Accepted: 10/21/2024] [Indexed: 11/29/2024] Open
Abstract
Introduction Adverse drug events (ADEs) pose a significant challenge in current clinical practice. Machine learning (ML) has been increasingly used to predict specific ADEs using electronic health record (EHR) data. This systematic review provides a comprehensive overview of the application of ML in predicting specific ADEs based on EHR data. Methods A systematic search of PubMed, Web of Science, Embase, and IEEE Xplore was conducted to identify relevant articles published from database inception to 20 May 2024. Studies that used EHR data to develop ML models for predicting specific ADEs or ADEs associated with particular drugs were included. Results A total of 59 studies met the inclusion criteria, covering 15 drugs and 15 ADEs. In total, 38 machine learning algorithms were reported, with random forest (RF) being the most frequently used, followed by support vector machine (SVM), eXtreme gradient boosting (XGBoost), decision tree (DT), and light gradient boosting machine (LightGBM). The performance of the ML models was generally strong, with an average area under the curve (AUC) of 76.68% ± 10.73, accuracy of 76.00% ± 11.26, precision of 60.13% ± 24.81, sensitivity of 62.35% ± 20.19, specificity of 75.13% ± 16.60, and an F1 score of 52.60% ± 21.10. The combined sensitivity, specificity, diagnostic odds ratio (DOR), and AUC from the summary receiver operating characteristic (SROC) curve using a random effects model were 0.65 (95% CI: 0.65-0.66), 0.89 (95% CI: 0.89-0.90), 12.11 (95% CI: 8.17-17.95), and 0.8069, respectively. The risk factors associated with different drugs and ADEs varied. Discussion Future research should focus on improving standardization, conducting multicenter studies that incorporate diverse data types, and evaluating the impact of artificial intelligence predictive models in real-world clinical settings. Systematic Review Registration https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42024565842, identifier CRD42024565842.
Affiliation(s)
- Qiaozhi Hu
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
  - West China School of Medicine, Sichuan University, Chengdu, Sichuan, China
- Yuxian Chen
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Dan Zou
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Zhiyao He
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry, Sichuan Engineering Laboratory for Plant-Sourced Drug and Sichuan Research Center for Drug Precision Industrial Technology, West China School of Pharmacy, Sichuan University, Chengdu, Sichuan, China
- Ting Xu
  - Department of Pharmacy, West China Hospital, Sichuan University, Chengdu, Sichuan, China
4
Yang Y, Liu Y, Chen Y, Luo D, Xu K, Zhang L. Artificial intelligence for predicting treatment responses in autoimmune rheumatic diseases: advancements, challenges, and future perspectives. Front Immunol 2024; 15:1477130. [PMID: 39502698 PMCID: PMC11534874 DOI: 10.3389/fimmu.2024.1477130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Accepted: 10/03/2024] [Indexed: 11/08/2024] Open
Abstract
Autoimmune rheumatic diseases (ARD) present a significant global health challenge characterized by a rising prevalence. These highly heterogeneous diseases involve complex pathophysiological mechanisms, leading to variable treatment efficacies across individuals. This variability underscores the need for personalized and precise treatment strategies. Traditionally, clinical practices have depended on empirical treatment selection, which often results in delays in effective disease management and can cause irreversible damage to multiple organs. Such delays significantly affect patient quality of life and prognosis. Artificial intelligence (AI) has recently emerged as a transformative tool in rheumatology, offering new insights and methodologies. Current research explores AI's capabilities in diagnosing diseases, stratifying risks, assessing prognoses, and predicting treatment responses in ARD. These developments in AI offer the potential for more precise and targeted treatment strategies, fostering optimism for enhanced patient outcomes. This paper critically reviews the latest AI advancements for predicting treatment responses in ARD, highlights the current state of the art, identifies ongoing challenges, and proposes directions for future research. By capitalizing on AI's capabilities, researchers and clinicians are poised to develop more personalized and effective interventions, improving care and outcomes for patients with ARD.
Affiliation(s)
- Yanli Yang
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Yang Liu
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Yu Chen
  - Department of Emergency Medicine, Xinzhou People’s Hospital, Xinzhou, China
- Di Luo
  - Department of Health Management, Guangdong Second Provincial General Hospital, Guangzhou, China
- Ke Xu
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Liyun Zhang
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
5
Liu X, Radojčić MR, Huang Z, Shi B, Li G, Chen L. Antidepressants for chronic pain management: considerations from predictive modeling and personalized medicine perspectives. FRONTIERS IN PAIN RESEARCH 2024; 5:1359024. [PMID: 38385140 PMCID: PMC10879562 DOI: 10.3389/fpain.2024.1359024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 01/26/2024] [Indexed: 02/23/2024] Open
Affiliation(s)
- Xinyue Liu
  - Department of Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Maja R. Radojčić
  - Division of Psychology and Mental Health, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
- Ziye Huang
  - Department of Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Baoyi Shi
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, United States
- Ge Li
  - Department of Public Health, Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Lingxiao Chen
  - Department of Orthopaedics, Shandong University Centre for Orthopaedics, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, China
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, China
  - Sydney Musculoskeletal Health, School of Health Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
6
Qin ZM, Liang SQ, Long JX, Deng JM, Wei X, Yang ML, Tang SJ, Li HL. Importance of GWAS Risk Loci and Clinical Data in Predicting Asthma Using Machine-learning Approaches. Comb Chem High Throughput Screen 2024; 27:400-407. [PMID: 37278039 DOI: 10.2174/1386207326666230602161939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 04/17/2023] [Accepted: 05/04/2023] [Indexed: 06/07/2023]
Abstract
INTRODUCTION To understand the risk factors of asthma, we combined genome-wide association study (GWAS) risk loci and clinical data in predicting asthma using machine-learning approaches. METHODS A case-control study with 123 asthmatics and 100 controls was conducted in the Zhuang population in Guangxi. GWAS risk loci were detected using polymerase chain reaction, and clinical data were collected. Machine-learning approaches were used to identify the major factors that contribute to asthma. RESULTS A total of 14 GWAS risk loci with clinical data were analyzed using 10 repetitions of 10-fold cross-validation for all machine-learning models. Using GWAS risk loci alone or clinical data alone, the best models achieved area under the curve (AUC) values of 64.3% and 71.4%, respectively. Combining GWAS risk loci and clinical data, XGBoost produced the best model with an AUC of 79.7%, indicating that the combination of genetics and clinical data can enable improved performance. We then ranked the features by importance and found the top six risk factors for predicting asthma to be rs3117098, rs7775228, family history, rs2305480, rs4833095, and body mass index. CONCLUSION Asthma-prediction models based on GWAS risk loci and clinical data can accurately predict asthma, and thus provide insights into the disease pathogenesis.
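Note on the validation scheme: the abstract above reports 10 rounds of 10-fold cross-validation on 123 cases and 100 controls. The following is a minimal sketch of how such splits could be generated, not the authors' code. The function name `repeated_stratified_kfold`, the seed, and the use of stratification by case/control label are assumptions made for illustration; the abstract does not say whether the folds were stratified.

```python
import random


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    Hypothetical helper: stratification and the seed are assumptions,
    not details taken from the abstract.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal indices round-robin so every fold keeps roughly
            # the overall case/control ratio.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


# Sample sizes taken from the abstract: 123 asthmatics (1) and 100 controls (0).
labels = [1] * 123 + [0] * 100
splits = list(repeated_stratified_kfold(labels))
```

Each of the 100 resulting splits would be used to fit a model on `train_idx` and score it on `test_idx`, with the reported AUC being a summary over all splits. Where scikit-learn is available, `sklearn.model_selection.RepeatedStratifiedKFold` provides a comparable splitter.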
Affiliation(s)
- Zan-Mei Qin
  - Department of Respiratory and Critical Care Medicine, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Si-Qiao Liang
  - Department of Respiratory and Critical Care Medicine, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Jian-Xiong Long
  - Department of Epidemiology and Health Statistics, School of Public Health of Guangxi Medical University, Nanning, Guangxi, China
- Jing-Min Deng
  - Department of Respiratory and Critical Care Medicine, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Xuan Wei
  - Department of Respiratory and Critical Care Medicine, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Mei-Ling Yang
  - Department of Respiratory and Critical Care Medicine, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Shao-Jie Tang
  - School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, Shanxi, 710121, China
  - Xi'an Key Laboratory of Advanced Controlling and Intelligent Processing (ACIP), Xi'an, Shanxi, 710121, China
- Hai-Li Li
  - Department of Respiratory and Critical Care Medicine, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
7
Su K, Yuan X, Huang Y, Yuan Q, Yang M, Sun J, Li S, Long X, Liu L, Li T, Yuan Z. Improved Prediction of Knee Osteoarthritis by the Machine Learning Model XGBoost. Indian J Orthop 2023; 57:1667-1677. [PMID: 37766962 PMCID: PMC10519887 DOI: 10.1007/s43465-023-00936-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 06/19/2023] [Indexed: 09/29/2023]
Abstract
Objectives The accurate prediction of osteoarthritis (OA) severity in patients can be helpful to make the proper decision of intervention. This study aims to build up a powerful model to assess predictive risk factors and severity of knee osteoarthritis (KOA) in the clinical scenario. Methods A total of 4796 KOA cases and 1205 features were selected by feature selections from the public OA database, Osteoarthritis Initiative (OAI). Six machine learning-based models were constructed and compared for the accuracy of OA prediction. The gradient-boosting decision tree was used to identify important prediction features in the extreme gradient boosting (XGBoost) model. The performance of models was evaluated by F1-score. Results Twenty features were determined as predictors for KOA risk and severity, including the subject characteristics, knee symptoms/risk factors and physical exam. The XGBoost model demonstrated 100% prediction accuracy for 54.7% of examined samples, and the remaining 45.3% of samples showed Kellgren and Lawrence (KL) gradings very close to the actual levels. It showed the highest prediction accuracy with an F1-score of 0.553 among the tested six models. Conclusions We demonstrate that the XGBoost is the best model for the prediction of KOA severity in the six examined models. In addition, 20 risk features were determined as the essential predictors of KOA, including the physical exam, knee symptoms/risk factors and subject characteristics, which may be useful for the identification of high-risk KOA cases and for making appropriate treatment decisions as well.
Affiliation(s)
- Kui Su
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Higher Education Mega Center, 100 Outside Ring West Road, Guangzhou, 510006 People’s Republic of China
- Xin Yuan
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Higher Education Mega Center, 100 Outside Ring West Road, Guangzhou, 510006 People’s Republic of China
- Yukai Huang
  - Department of Rheumatology and Immunology, Guangdong Second Provincial General Hospital, Guangzhou, 510317 People’s Republic of China
- Qian Yuan
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Higher Education Mega Center, 100 Outside Ring West Road, Guangzhou, 510006 People’s Republic of China
- Minghui Yang
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Higher Education Mega Center, 100 Outside Ring West Road, Guangzhou, 510006 People’s Republic of China
- Jianwu Sun
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Higher Education Mega Center, 100 Outside Ring West Road, Guangzhou, 510006 People’s Republic of China
- Shuyi Li
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Higher Education Mega Center, 100 Outside Ring West Road, Guangzhou, 510006 People’s Republic of China
- Xinyi Long
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Higher Education Mega Center, 100 Outside Ring West Road, Guangzhou, 510006 People’s Republic of China
- Lang Liu
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Higher Education Mega Center, 100 Outside Ring West Road, Guangzhou, 510006 People’s Republic of China
- Tianwang Li
  - Department of Rheumatology and Immunology, Guangdong Second Provincial General Hospital, Guangzhou, 510317 People’s Republic of China
- Zhengqiang Yuan
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Higher Education Mega Center, 100 Outside Ring West Road, Guangzhou, 510006 People’s Republic of China
8
Madrid-García A, Merino-Barbancho B, Rodríguez-González A, Fernández-Gutiérrez B, Rodríguez-Rodríguez L, Menasalvas-Ruiz E. Understanding the role and adoption of artificial intelligence techniques in rheumatology research: An in-depth review of the literature. Semin Arthritis Rheum 2023; 61:152213. [PMID: 37315379 DOI: 10.1016/j.semarthrit.2023.152213] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/21/2022] [Revised: 04/28/2023] [Accepted: 05/02/2023] [Indexed: 06/16/2023]
Abstract
The marked upward trend in the number of published studies on rheumatic and musculoskeletal diseases in which artificial intelligence plays a key role shows the interest of rheumatology researchers in using these techniques to answer their research questions. In this review, we analyse the original research articles that combine both worlds over a five-year period (2017-2021). In contrast to other published papers on the same topic, we first studied the review and recommendation articles that were published during that period, including up to October 2022, as well as the publication trends. Secondly, we review the published research articles and classify them into one of the following categories: disease identification and prediction, disease classification, patient stratification and disease subtype identification, disease progression and activity, treatment response, and predictors of outcomes. Thirdly, we provide a table with illustrative studies in which artificial intelligence techniques have played a central role in more than twenty rheumatic and musculoskeletal diseases. Finally, the findings of the research articles, in terms of disease and/or data science techniques employed, are highlighted in a discussion. Therefore, the present review aims to characterise how researchers are applying data science techniques in the rheumatology medical field. The most immediate conclusions that can be drawn from this work are: multiple and novel data science techniques have been used in a wide range of rheumatic and musculoskeletal diseases, including rare diseases; the sample size and the data type used are heterogeneous; and new technical approaches are expected to arrive in the short to medium term.
Affiliation(s)
- Alfredo Madrid-García
  - Grupo de Patología Musculoesquelética. Hospital Clínico San Carlos, Prof. Martin Lagos s/n, Madrid, 28040, Spain; Escuela Técnica Superior de Ingenieros de Telecomunicación. Universidad Politécnica de Madrid, Avenida Complutense, 30, Madrid, 28040, Spain.
- Beatriz Merino-Barbancho
  - Escuela Técnica Superior de Ingenieros de Telecomunicación. Universidad Politécnica de Madrid, Avenida Complutense, 30, Madrid, 28040, Spain
- Benjamín Fernández-Gutiérrez
  - Grupo de Patología Musculoesquelética. Hospital Clínico San Carlos, Prof. Martin Lagos s/n, Madrid, 28040, Spain
- Luis Rodríguez-Rodríguez
  - Grupo de Patología Musculoesquelética. Hospital Clínico San Carlos, Prof. Martin Lagos s/n, Madrid, 28040, Spain
- Ernestina Menasalvas-Ruiz
  - Centro de Tecnología Biomédica. Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
9
Huang Y, Pfeiffer SM, Zhang Q. Primary tumor type prediction based on US nationwide genomic profiling data in 13,522 patients. Comput Struct Biotechnol J 2023; 21:3865-3874. [PMID: 37593720 PMCID: PMC10432138 DOI: 10.1016/j.csbj.2023.07.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 07/16/2023] [Accepted: 07/25/2023] [Indexed: 08/19/2023] Open
Abstract
Timely and accurate primary tumor diagnosis is critical, and misdiagnoses and delays may cause undue health and economic burden. To predict primary tumor types based on genomics data from a de-identified US nationwide clinico-genomic database (CGDB), the XGBoost-based Clinico-Genomic Machine Learning Model (XC-GeM) was developed to predict 13 primary tumor types based on data from 12,060 patients in the CGDB, derived from routine clinical comprehensive genomic profiling (CGP) testing and chart-confirmed electronic health records (EHRs). The SHapley Additive exPlanations method was used to interpret model predictions. XC-GeM reached an outstanding area under the curve (AUC) of 0.965 and Matthews correlation coefficient (MCC) of 0.742 in the holdout validation dataset. In the independent validation cohort of 955 patients, XC-GeM reached 0.954 AUC and 0.733 MCC and made correct predictions in 77% of non-small cell lung cancer (NSCLC), 86% of colorectal cancer, and 84% of breast cancer patients. Top predictors for the overall model (e.g., tumor mutational burden (TMB), gender, and KRAS alteration), and for specific tumor types (e.g., TMB and EGFR alteration for NSCLC) were supported by published studies. XC-GeM also achieved an excellent AUC of 0.880 and positive MCC of 0.540 in 507 patients with missing primary diagnosis. XC-GeM is the first algorithm to predict primary tumor type using US nationwide data from routine CGP testing and chart-confirmed EHRs, showing promising performance. It may enhance the accuracy and efficiency of cancer diagnoses, enabling more timely treatment choices and potentially leading to better outcomes.
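Note on the MCC metric: the abstract above reports a Matthews correlation coefficient for a 13-class problem. The following is a minimal sketch of one common multiclass generalization of the MCC, computed from a confusion matrix. It is not the authors' code: the abstract does not say which multiclass variant was used, the function name `multiclass_mcc` is hypothetical, and the small matrix is made-up illustrative data rather than XC-GeM results.

```python
from math import sqrt


def multiclass_mcc(confusion):
    """Multiclass MCC from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    # Row sums: how often each class truly occurs.
    true_counts = [sum(confusion[k]) for k in range(n_classes)]
    # Column sums: how often each class is predicted.
    pred_counts = [
        sum(confusion[i][k] for i in range(n_classes)) for k in range(n_classes)
    ]
    numerator = correct * total - sum(
        t * p for t, p in zip(true_counts, pred_counts)
    )
    denominator = sqrt(
        (total**2 - sum(p * p for p in pred_counts))
        * (total**2 - sum(t * t for t in true_counts))
    )
    # Convention: return 0.0 when the denominator vanishes.
    return numerator / denominator if denominator else 0.0


# Made-up 2-class example (rows are true classes, columns are predictions).
example = [[3, 1], [2, 4]]
example_mcc = multiclass_mcc(example)
```

For two classes this expression reduces to the familiar binary MCC, which makes small hand-checkable cases like the one above convenient for sanity checks. Unlike accuracy, the coefficient stays near zero for a classifier that always predicts the majority class, which is one reason it is reported alongside AUC for imbalanced multiclass problems.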
Affiliation(s)
- Qing Zhang
  - Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, United States
10
Kalyani P, Manasa Y, Ahammad SH, Suman M, Anwer TMK, Hossain MA, Rashed ANZ. Prediction of patient's neurological recovery from cervical spinal cord injury through XGBoost learning approach. EUROPEAN SPINE JOURNAL : OFFICIAL PUBLICATION OF THE EUROPEAN SPINE SOCIETY, THE EUROPEAN SPINAL DEFORMITY SOCIETY, AND THE EUROPEAN SECTION OF THE CERVICAL SPINE RESEARCH SOCIETY 2023; 32:2140-2148. [PMID: 37060466 DOI: 10.1007/s00586-023-07712-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 03/07/2023] [Accepted: 04/06/2023] [Indexed: 04/16/2023]
Abstract
Predicting neurological outcomes accurately in patients with cervical spinal cord injury (SCI) is challenging because patient characteristics, treatment strategies, and radiological findings are heterogeneous. Machine learning (ML) algorithms are an established means of improving prediction. This research therefore employed an extreme gradient boosting (XGBoost) model, followed by other ML algorithms, to predict neurological outcomes with greater accuracy and reliability. It also generated a data-driven XGBoost model to improve the efficiency of fault detection techniques. To forecast neurological improvement, status was monitored through motor position (ASIA [American Spinal Injury Association] Impairment Scale [AIS] D and E), and prediction was performed with XGBoost, a decision tree, and logistic regression. In predicting neurological improvement in patients with cervical SCI, the proposed XGBoost approach reached an accuracy of 81.1%, compared with 80% for the decision tree and 82% for logistic regression. The AUCs for XGBoost and the decision tree were 0.867 and 0.787, respectively, whereas logistic regression showed 0.877. The authors conclude that this study advances the application of XGBoost for accurate prediction and decision-making in the pre-treatment categorization of patients with cervical SCI.
Affiliation(s)
- P Kalyani
  - Department of ECE, Vardhaman College of Engineering, Hyderabad, India
- Y Manasa
  - Department of CSE, Prasad V Potluri Siddhartha Institute of Technology, Vijayawada, India
- Sk Hasane Ahammad
  - Department of ECE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, 522302, India
- M Suman
  - Department of ECE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, 522302, India
- Twana Mohammed Kak Anwer
  - Department of Physics, College of Education, Salahaddin University-Erbil, Erbil, Kurdistan Region, 44002, Iraq
- Md Amzad Hossain
  - Institute of Theoretical Electrical Engineering, Faculty of Electrical Engineering and Information Technology, Ruhr University Bochum, 44801, Bochum, Germany.
  - Department of Electrical and Electronic Engineering, Jashore University of Science and Technology, Jashore, 7408, Bangladesh.
- Ahmed Nabih Zaki Rashed
  - Electronics and Electrical Communications Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, 32951, Egypt.
  - Department of VLSI Microelectronics, Institute of Electronics and Communication Engineering, Saveetha School of Engineering, SIMATS, Chennai, Tamilnadu, 602105, India.
11
|
Yasrebi-de Kom IAR, Dongelmans DA, de Keizer NF, Jager KJ, Schut MC, Abu-Hanna A, Klopotowska JE. Electronic health record-based prediction models for in-hospital adverse drug event diagnosis or prognosis: a systematic review. J Am Med Inform Assoc 2023; 30:978-988. [PMID: 36805926 PMCID: PMC10114128 DOI: 10.1093/jamia/ocad014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/14/2022] [Revised: 01/13/2023] [Accepted: 02/01/2023] [Indexed: 02/22/2023] Open
Abstract
OBJECTIVE We conducted a systematic review to characterize and critically appraise developed prediction models based on structured electronic health record (EHR) data for adverse drug event (ADE) diagnosis and prognosis in adult hospitalized patients. MATERIALS AND METHODS We searched the Embase and Medline databases (from January 1, 1999, to July 4, 2022) for articles utilizing structured EHR data to develop ADE prediction models for adult inpatients. For our systematic evidence synthesis and critical appraisal, we applied the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS). RESULTS Twenty-five articles were included. Studies often did not report crucial information such as patient characteristics or the method for handling missing data. In addition, studies frequently applied inappropriate methods, such as univariable screening for predictor selection. Furthermore, the majority of the studies utilized ADE labels that only described an adverse symptom while not assessing causality or utilizing a causal model. None of the models were externally validated. CONCLUSIONS Several challenges should be addressed before the models can be widely implemented, including the adherence to reporting standards and the adoption of best practice methods for model development and validation. In addition, we propose a reorientation of the ADE prediction modeling domain to include causality as a fundamental challenge that needs to be addressed in future studies, either through acquiring ADE labels via formal causality assessments or the usage of adverse event labels in combination with causal prediction modeling.
Affiliation(s)
- Izak A R Yasrebi-de Kom
- Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, The Netherlands
- Amsterdam Public Health, Amsterdam, The Netherlands
- Dave A Dongelmans
- Amsterdam Public Health, Amsterdam, The Netherlands
- Amsterdam UMC location University of Amsterdam, Department of Intensive Care Medicine, Amsterdam, The Netherlands
- Nicolette F de Keizer
- Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, The Netherlands
- Amsterdam Public Health, Amsterdam, The Netherlands
- Kitty J Jager
- Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, The Netherlands
- Amsterdam Public Health, Amsterdam, The Netherlands
- Amsterdam Cardiovascular Sciences, Pulmonary Hypertension & Thrombosis, Amsterdam, The Netherlands
- Martijn C Schut
- Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, The Netherlands
- Amsterdam Public Health, Amsterdam, The Netherlands
- Amsterdam UMC location Vrije Universiteit Amsterdam, Department of Clinical Chemistry, Amsterdam, The Netherlands
- Ameen Abu-Hanna
- Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, The Netherlands
- Amsterdam Public Health, Amsterdam, The Netherlands
- Joanna E Klopotowska
- Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, The Netherlands
- Amsterdam Public Health, Amsterdam, The Netherlands
|
12
|
Liu L, Chang J, Zhang P, Ma Q, Zhang H, Sun T, Qiao H. A joint multi-modal learning method for early-stage knee osteoarthritis disease classification. Heliyon 2023; 9:e15461. [PMID: 37123973 PMCID: PMC10130858 DOI: 10.1016/j.heliyon.2023.e15461] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/11/2022] [Revised: 04/05/2023] [Accepted: 04/10/2023] [Indexed: 05/02/2023] Open
Abstract
Osteoarthritis (OA) is a progressive and chronic disease. Identifying the early stages of OA is important for the treatment and care of patients. However, most state-of-the-art methods use only single-modal data to predict disease status and therefore usually ignore complementary information in multi-modal data. In this study, we develop an integrated multi-modal learning method (MMLM) that uses an interpretable strategy to select and fuse clinical, imaging, and demographic features to classify the grade of early-stage knee OA. MMLM applies XGBoost and ResNet50 to extract two heterogeneous feature sets from the clinical data and imaging data, respectively. We then integrate these extracted features with demographic data. To avoid the negative effects of redundant features in a direct integration of multiple features, we propose an L1-norm-based optimization method (MMLM) to regularize the inter-correlations among the multiple features. MMLM was assessed using the Osteoarthritis Initiative (OAI) data set with machine learning classifiers. Extensive experiments demonstrate that MMLM improves the performance of the classifiers. Furthermore, a visual analysis of the important features in the multimodal data verified the relations among the modalities when classifying the grade of knee OA.
|
13
|
Salem H, Huynh T, Topolski N, Mwangi B, Trivedi MH, Soares JC, Rush AJ, Selvaraj S. Temporal multi-step predictive modeling of remission in major depressive disorder using early stage treatment data; STAR*D based machine learning approach. J Affect Disord 2023; 324:286-293. [PMID: 36584711 PMCID: PMC9863277 DOI: 10.1016/j.jad.2022.12.076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 12/10/2022] [Accepted: 12/18/2022] [Indexed: 12/29/2022]
Abstract
BACKGROUND Artificial intelligence is currently being used to facilitate early disease detection, better understand disease progression, optimize medication/treatment dosages, and uncover promising novel treatments and potential outcomes. METHODS Utilizing the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) dataset, we built a machine learning model to predict depression remission rates using the same clinical data as features for each of the first three antidepressant treatment steps in STAR*D. We used only early treatment data (baseline and first follow-up) in each STAR*D step to temporally analyze predictive features of remission at the end of the step. RESULTS Our model showed significant prediction performance across the three treatment steps. At step 1, model accuracy was 66%, sensitivity 65%, specificity 67%, positive predictive value (PPV) 65.5%, and negative predictive value (NPV) 66.6%. At step 2, model accuracy was 71.3%, sensitivity 74.3%, specificity 69%, PPV 64.5%, and NPV 77.9%. At step 3, accuracy reached 84.6%, sensitivity 69%, specificity 88.8%, PPV 67%, and NPV 91.1%. Across all three steps, the early Quick Inventory of Depressive Symptomatology-Self-Report (QIDS-SR) scores were key elements in predicting the final treatment outcome. The model also identified key sociodemographic factors that predicted treatment remission at different steps. LIMITATIONS Limitations include the retrospective design, the lack of replication in an independent dataset, and the use of a complete case analysis. CONCLUSIONS This proof-of-concept study showed that, using early treatment data, multi-step temporal prediction of depressive symptom remission results in clinically useful accuracy rates. Whether these predictive models are generalizable deserves further study.
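The five measures reported in this abstract all derive from the same four confusion-matrix counts. The following is a minimal sketch of those definitions; the class name `ConfusionMatrix` and the patient counts are hypothetical illustrations and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary remission classifier (hypothetical helper)."""
    tp: int  # remitters predicted to remit
    fp: int  # non-remitters predicted to remit
    tn: int  # non-remitters predicted not to remit
    fn: int  # remitters predicted not to remit

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        # Share of true remitters that the model identifies.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of true non-remitters that the model identifies.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Share of predicted remitters who actually remit.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of predicted non-remitters who actually do not remit.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for 200 patients; NOT data from the STAR*D analysis.
cm = ConfusionMatrix(tp=65, fp=33, tn=67, fn=35)
for name in ("accuracy", "sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(cm, name)():.3f}")
```

Sensitivity and specificity depend only on the classifier, whereas PPV and NPV also shift with the proportion of remitters in the sample, which is one reason all four are usually reported together.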
Affiliation(s)
- Haitham Salem
- Department of Psychiatry and Human Behavior (DPHB), Warren Alpert School of Medicine, Brown University, Providence, RI, USA
- Tung Huynh
- Louis Faillace Department of Psychiatry and Behavioral Science, McGovern Medical School, University of Texas Health Science Center, Houston, TX, USA
- Natasha Topolski
- Louis Faillace Department of Psychiatry and Behavioral Science, McGovern Medical School, University of Texas Health Science Center, Houston, TX, USA
- Benson Mwangi
- Louis Faillace Department of Psychiatry and Behavioral Science, McGovern Medical School, University of Texas Health Science Center, Houston, TX, USA
- Madhukar H Trivedi
- Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jair C Soares
- Louis Faillace Department of Psychiatry and Behavioral Science, McGovern Medical School, University of Texas Health Science Center, Houston, TX, USA
- A John Rush
- Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC, USA; Professor Emeritus, Duke-National University of Singapore, Singapore, Singapore
- Sudhakar Selvaraj
- Louis Faillace Department of Psychiatry and Behavioral Science, McGovern Medical School, University of Texas Health Science Center, Houston, TX, USA.
|
14
|
Eysenbach G, Chao HJ, Chiang YC, Chen HY. Explainable Machine Learning Techniques To Predict Amiodarone-Induced Thyroid Dysfunction Risk: Multicenter, Retrospective Study With External Validation. J Med Internet Res 2023; 25:e43734. [PMID: 36749620 PMCID: PMC9944157 DOI: 10.2196/43734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 12/25/2022] [Accepted: 01/16/2023] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches for adverse-effect prediction without time-series consideration of features have yielded suboptimal predictions. Machine learning algorithms with multiple data sets at different time points may generate better performance in predicting adverse effects. OBJECTIVE We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning-based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds. METHODS This study developed machine learning models using multicenter, delinked electronic health records. It included patients receiving amiodarone from January 2013 to December 2017. The training set was composed of data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital were used as the external test set. The study collected stationary features at baseline and dynamic features at the first, second, third, sixth, ninth, 12th, 15th, 18th, and 21st months after amiodarone initiation. We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods, including oversampling with the borderline-synthesized minority oversampling technique, undersampling-edited nearest neighbor, and over- and undersampling hybrid methods. The model performance was compared based on accuracy, precision, recall, F1-score, geometric mean, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Feature importance was determined by the best model. The decision threshold was readjusted to identify the best cutoff value, and a Kaplan-Meier survival analysis was performed. RESULTS The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting oversampling machine learning model demonstrated the best predictive outcomes among all 16 models. The accuracy, precision, recall, F1-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After readjusting the cutoff, the best value was 0.627, and the F1-score reached 0.699. The best threshold was able to classify 286 of 2422 patients (11.8%) as high-risk subjects, among which 275 were true-positive patients in the testing set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxin, alkaline phosphatase, and low-density lipoprotein were the most important features. CONCLUSIONS Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support.
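Readjusting a decision threshold, as this abstract describes, amounts to scanning candidate cutoffs on the predicted probabilities and keeping the one that maximizes a chosen metric. The sketch below assumes the F1-score is the target metric; the function names, labels, and probabilities are hypothetical and do not reproduce the study's procedure or data.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1-score when cases with probability >= threshold are called positive."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, y_prob, candidates):
    """Return the first candidate cutoff with the highest F1-score."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Hypothetical labels (1 = adverse event) and model probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.55, 0.60, 0.65, 0.70, 0.90]
candidates = [i / 100 for i in range(1, 100)]

cutoff = best_threshold(y_true, y_prob, candidates)
print(cutoff, round(f1_at_threshold(y_true, y_prob, cutoff), 3))
```

In practice the cutoff would be chosen on training or validation data and then applied unchanged to the external test set, so that the reported performance is not tuned to the test cases.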
Affiliation(s)
- Horng-Jiun Chao
- Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
- Yi-Chun Chiang
- Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
- Hsiang-Yin Chen
- Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
|
15
|
Ahmadzadeh M, Cosco TD, Best JR, Christie GJ, DiPaola S. Predictors of the rate of cognitive decline in older adults using machine learning. PLoS One 2023; 18:e0280029. [PMID: 36867596 PMCID: PMC9983884 DOI: 10.1371/journal.pone.0280029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Accepted: 12/20/2022] [Indexed: 03/04/2023] Open
Abstract
BACKGROUND The longitudinal rates of cognitive decline among aging populations are heterogeneous. Few studies have investigated the possibility of implementing prognostic models to predict cognitive changes with the combination of categorical and continuous data from multiple domains. OBJECTIVE Implement a multivariate robust model to predict longitudinal cognitive changes over 12 years among older adults and to identify the most significant predictors of cognitive changes using machine learning techniques. METHOD In total, data of 2733 participants aged 50-85 years from the English Longitudinal Study of Ageing are included. Two categories of cognitive changes were determined including minor cognitive decliners (2361 participants, 86.4%) and major cognitive decliners (372 participants, 13.6%) over 12 years from wave 2 (2004-2005) to wave 8 (2016-2017). Machine learning methods were used to implement the predictive models and to identify the predictors of cognitive decline using 43 baseline features from seven domains including sociodemographic, social engagement, health, physical functioning, psychological, health-related behaviors, and baseline cognitive tests. RESULTS The model predicted future major cognitive decliners from those with the minor cognitive decline with a relatively high performance. The overall AUC, sensitivity, and specificity of prediction were 72.84%, 78.23%, and 67.41%, respectively. Furthermore, the top 7 ranked features with an important role in predicting major vs minor cognitive decliners included age, employment status, socioeconomic status, self-rated memory changes, immediate word recall, the feeling of loneliness, and vigorous physical activity. In contrast, the five least important baseline features consisted of smoking, instrumental activities of daily living, eye disease, life satisfaction, and cardiovascular disease. 
CONCLUSION The present study indicated the possibility of identifying individuals at high risk of future major cognitive decline as well as potential risk/protective factors of cognitive decline among older adults. The findings could assist in improving the effective interventions to delay cognitive decline among aging populations.
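The AUC reported in this abstract can be read as the probability that a randomly chosen major decliner receives a higher predicted risk than a randomly chosen minor decliner. A minimal sketch of that pairwise definition follows; the function name `auc` and the example scores are hypothetical, and the quadratic loop is meant for illustration rather than for large cohorts.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = major decliner) and predicted risks.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(round(auc(y_true, scores), 4))
```

Because the measure depends only on the ranking of scores, it is unaffected by the 86.4% versus 13.6% class split, unlike accuracy.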
Affiliation(s)
- Maryam Ahmadzadeh
- School of Interactive Arts and Technology, Simon Fraser University, Surrey, BC, Canada
- Theodore David Cosco
- Gerontology Research Center, Simon Fraser University, Vancouver, BC, Canada
- Oxford Institute of Population Ageing, University of Oxford, Oxford, United Kingdom
- John R. Best
- Gerontology Research Center, Simon Fraser University, Vancouver, BC, Canada
- Steve DiPaola
- School of Interactive Arts and Technology, Simon Fraser University, Surrey, BC, Canada
|
16
|
Jha M, Gupta R, Saxena R. A Precise Method to Detect Post-COVID-19 Pulmonary Fibrosis Through Extreme Gradient Boosting. SN COMPUTER SCIENCE 2023; 4:89. [PMID: 36532633 PMCID: PMC9746584 DOI: 10.1007/s42979-022-01526-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/25/2022] [Accepted: 11/17/2022] [Indexed: 12/15/2022]
Abstract
The association of pulmonary fibrosis with COVID-19 is now well acknowledged and has caused a significant number of deaths around the world. Because automatic disease detection has become a crucial aid for clinicians seeking fast and precise results, this study proposes an architecture based on an ensemble machine learning approach to detect COVID-19-associated pulmonary fibrosis. The paper discusses Extreme Gradient Boosting (XGBoost) and its tuned hyper-parameters to optimize the performance for the prediction of severe COVID-19 patients who developed pulmonary fibrosis after 90 days of hospital discharge. A dataset comprising Electronic Health Records (EHRs) and corresponding high-resolution computed tomography (HRCT) chest images of 1175 COVID-19 patients was considered, which involves 725 pulmonary fibrosis cases and 450 normal lung cases. The experimental results achieved an accuracy of 98%, precision of 99%, and sensitivity of 99%. The proposed model is the first in the literature to help clinicians keep a record of severe COVID-19 cases for analyzing the risk of pulmonary fibrosis through EHRs and HRCT scans, leading to less chance of life-threatening conditions.
Affiliation(s)
- Manika Jha
- Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, 201309 Noida, India
- Richa Gupta
- Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, 201309 Noida, India
- Rajiv Saxena
- Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, 201309 Noida, India
|
17
|
Shokrollahi P, Chaves JMZ, Lam JPH, Sharma A, Pal D, Bahrami N, Chaudhari AS, Loening AM. Radiology Decision Support System for Selecting Appropriate CT Imaging Titles Using Machine Learning Techniques Based on Electronic Medical Records. IEEE ACCESS 2023; 11:99222-99236. [DOI: 10.1109/access.2023.3314380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2025]
Affiliation(s)
- Peyman Shokrollahi
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
- Jonathan P. H. Lam
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
- Avishkar Sharma
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
- Akshay S. Chaudhari
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
- Andreas M. Loening
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
|
18
|
Kim HR, Sung M, Park JA, Jeong K, Kim HH, Lee S, Park YR. Analyzing adverse drug reaction using statistical and machine learning methods: A systematic review. Medicine (Baltimore) 2022; 101:e29387. [PMID: 35758373 PMCID: PMC9276413 DOI: 10.1097/md.0000000000029387] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/04/2021] [Accepted: 04/12/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Adverse drug reactions (ADRs) are unintended negative drug-induced responses. Determining the association between drugs and ADRs is crucial, and several methods have been proposed to demonstrate this association. This systematic review aimed to examine the analytical tools by considering original articles that utilized statistical and machine learning methods for detecting ADRs. METHODS A systematic literature review was conducted based on articles published between 2015 and 2020. The keywords used were statistical, machine learning, and deep learning methods for detecting ADR signals. The study was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement (PRISMA) guidelines. RESULTS We reviewed 72 articles, of which 51 and 21 addressed statistical and machine learning methods, respectively. Electronic medical record (EMR) data were exclusively analyzed using the regression method. For FDA Adverse Event Reporting System (FAERS) data, components of the disproportionality method were preferable. DrugBank was the most used database for machine learning. Other methods accounted for the highest and supervised methods accounted for the second highest. CONCLUSIONS Using the 72 main articles, this review provides guidelines on which databases are frequently utilized and which analysis methods can be connected. For statistical analysis, >90% of the cases were analyzed by disproportionate or regression analysis with each spontaneous reporting system (SRS) data or electronic medical record (EMR) data; for machine learning research, however, there was a strong tendency to analyze various data combinations. Only half of the DrugBank database was occupied, and the k-nearest neighbor method accounted for the greatest proportion.
Affiliation(s)
- Hae Reong Kim
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, South Korea
- MinDong Sung
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, South Korea
- Ji Ae Park
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, South Korea
- Kyeongseob Jeong
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, South Korea
- Ho Heon Kim
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, South Korea
- Suehyun Lee
- Department of Biomedical Informatics, Konyang University College of Medicine, Daejeon, South Korea
- Yu Rang Park
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, South Korea
|
19
|
Rahman MS, Chowdhury AH, Amrin M. Accuracy comparison of ARIMA and XGBoost forecasting models in predicting the incidence of COVID-19 in Bangladesh. PLOS GLOBAL PUBLIC HEALTH 2022; 2:e0000495. [PMID: 36962227 PMCID: PMC10021465 DOI: 10.1371/journal.pgph.0000495] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/29/2021] [Accepted: 04/27/2022] [Indexed: 04/19/2023]
Abstract
Accurate predictive time series modelling is important in public health planning and response during the emergence of a novel pandemic. Therefore, the aims of the study are three-fold: (a) to model the overall trend of COVID-19 confirmed cases and deaths in Bangladesh; (b) to generate a short-term forecast of 8 weeks of COVID-19 cases and deaths; (c) to compare the predictive accuracy of the Autoregressive Integrated Moving Average (ARIMA) and eXtreme Gradient Boosting (XGBoost) for precise modelling of non-linear features and seasonal trends of the time series. The data were collected from the onset of the epidemic in Bangladesh from the Directorate General of Health Service (DGHS) and the Institute of Epidemiology, Disease Control and Research (IEDCR). The daily confirmed cases and deaths of COVID-19 over 633 days in Bangladesh were divided into several training and test sets. The ARIMA and XGBoost models were established using the training data, the test sets were used to evaluate each model's ability to forecast, and the predictive performances were then averaged to choose the best model. The predictive accuracy of the models was assessed using the mean absolute error (MAE), mean percentage error (MPE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The findings reveal the existence of a nonlinear trend and weekly seasonality in the dataset. The average error measures of the ARIMA model for both COVID-19 confirmed cases and deaths were lower than those of the XGBoost model. Hence, in our study, the ARIMA model performed better than the XGBoost model in predicting COVID-19 confirmed cases and deaths in Bangladesh. The suggested prediction model might play a critical role in estimating the spread of a novel pandemic in Bangladesh and similar countries.
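The four error measures named in this abstract can be computed directly from paired actual and forecast values. The sketch below is an illustration under stated assumptions: the function name `forecast_errors` and the numbers are hypothetical, the error is taken as actual minus forecast (sign conventions for MPE vary between sources), and every actual value is assumed to be nonzero so the percentage measures are defined.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE, MPE (%) and MAPE (%) for paired series.

    Assumes error = actual - predicted and that no actual value is zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical daily case counts and forecasts; NOT data from the study.
actual = [100, 200, 400]
predicted = [110, 190, 400]
metrics = forecast_errors(actual, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

MPE keeps the sign of each error and so indicates systematic over- or under-forecasting, while MAE, RMSE, and MAPE measure the size of the errors; averaging each measure over several train/test splits, as the abstract describes, gives one comparable figure per model.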
|
20
|
Gupta UC, Gupta SC, Gupta SS. Clinical Overview of Arthritis with a Focus on Management Options and Preventive Lifestyle Measures for Its Control. CURRENT NUTRITION & FOOD SCIENCE 2022. [DOI: 10.2174/1573401318666220204095629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Arthritis is a spectrum of conditions that cause swelling and tenderness of one or more body joints, with key symptoms of joint pain and stiffness. Its progression is closely tied to age. Although there are a number of arthritis types, such as ankylosing spondylitis, gout, joint infections, juvenile idiopathic arthritis, reactive arthritis, and septic arthritis, the two most common types are osteoarthritis and rheumatoid arthritis. Osteoarthritis causes the smooth articulating cartilage that covers the ends of bones, where they form a joint, to break down. Rheumatoid arthritis is a disease in which the immune system attacks joints, beginning with the cartilaginous lining of the joints. The latter is considered a systemic disease, i.e., one affecting many parts of the body, with the respiratory system involved in 10 to 20% of all mortality. Osteoarthritis is one of the leading causes of disability globally. Several preventive measures to control arthritis have been suggested, such as the use of analgesics and non-steroidal anti-inflammatory drugs, moderate to vigorous physical activity and exercise, reducing sedentary hours, getting adequate sleep, and maintaining a healthy body weight. Dietary measures, including a Mediterranean diet rich in fruits and vegetables, fish oil, medicinal plants, and microbiota, are vital protective methods. The intake of vitamins such as A and C, minerals such as selenium and zinc, and polyunsaturated and n-3 fatty acids is also a significant preventive measure.
Affiliation(s)
- Umesh Chandra Gupta
- Emeritus Research Scientist, Agriculture and Agri-food Canada, Charlottetown Research and Development Centre, 440 University Avenue, Charlottetown, PE, C1A 4N6, Canada
- Subhas Chandra Gupta
- Chairman and Professor, The Department of Plastic Surgery, Loma Linda University School of Medicine, Loma Linda, California, 92354, USA
|
21
|
Syrowatka A, Song W, Amato MG, Foer D, Edrees H, Co Z, Kuznetsova M, Dulgarian S, Seger DL, Simona A, Bain PA, Purcell Jackson G, Rhee K, Bates DW. Key use cases for artificial intelligence to reduce the frequency of adverse drug events: a scoping review. Lancet Digit Health 2022; 4:e137-e148. [PMID: 34836823 DOI: 10.1016/s2589-7500(21)00229-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 03/06/2021] [Revised: 08/13/2021] [Accepted: 09/10/2021] [Indexed: 12/31/2022]
Abstract
Adverse drug events (ADEs) represent one of the most prevalent types of health-care-related harm, and there is substantial room for improvement in the way that they are currently predicted and detected. We conducted a scoping review to identify key use cases in which artificial intelligence (AI) could be leveraged to reduce the frequency of ADEs. We focused on modern machine learning techniques and natural language processing. 78 articles were included in the scoping review. Studies were heterogeneous and applied various AI techniques covering a wide range of medications and ADEs. We identified several key use cases in which AI could contribute to reducing the frequency and consequences of ADEs, through prediction to prevent ADEs and early detection to mitigate the effects. Most studies (73 [94%] of 78) assessed technical algorithm performance, and few studies evaluated the use of AI in clinical settings. Most articles (58 [74%] of 78) were published within the past 5 years, highlighting an emerging area of study. Availability of new types of data, such as genetic information, and access to unstructured clinical notes might further advance the field.
Affiliation(s)
- Ania Syrowatka
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA.
- Wenyu Song
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Mary G Amato
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Massachusetts College of Pharmacy and Health Sciences, Boston, MA, USA
- Dinah Foer
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Division of Allergy and Clinical Immunology, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Heba Edrees
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Massachusetts College of Pharmacy and Health Sciences, Boston, MA, USA
- Zoe Co
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Sevan Dulgarian
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Diane L Seger
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Aurélien Simona
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Paul A Bain
- Countway Library of Medicine, Harvard Medical School, Boston, MA, USA
- Gretchen Purcell Jackson
- IBM Watson Health, Cambridge, MA, USA; Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Kyu Rhee
- IBM Watson Health, Cambridge, MA, USA; CVS Health, Wellesley Hills, MA, USA
- David W Bates
- Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Harvard T H Chan School of Public Health, Boston, MA, USA
|
22
|
Ku Y, Kwon SB, Yoon JH, Mun SK, Chang M. Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors. Clin Exp Otorhinolaryngol 2022; 15:168-176. [PMID: 34990536 PMCID: PMC9149237 DOI: 10.21053/ceo.2021.01536] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/06/2021] [Accepted: 09/19/2021] [Indexed: 11/24/2022] Open
Abstract
Objectives Because climatic and air-pollution factors are known to influence the occurrence of respiratory diseases, we used these factors to develop machine learning models for predicting the occurrence of respiratory diseases. Methods We obtained the daily number of respiratory disease patients in Seoul. We used climatic and air-pollution factors to predict the daily number of patients treated for respiratory diseases per 10,000 inhabitants. We applied the relief-based feature selection algorithm to evaluate the importance of feature selection. We used the gradient boosting and Gaussian process regression (GPR) methods, respectively, to develop two different prediction models. We also employed the holdout cross-validation method, in which 75% of the data was used to train the model, and the remaining 25% was used to test the trained model. We determined the estimated number of respiratory disease patients by applying the developed prediction models to the test set. To evaluate the performance of each model, we calculated the coefficient of determination (R²) and the root mean square error (RMSE) between the original and estimated numbers of respiratory disease patients. We used the Shapley Additive exPlanations (SHAP) approach to interpret the estimated output of each machine learning model. Results Features with negative weights in the relief-based algorithm were excluded. When applying gradient boosting to unseen test data, R² and RMSE were 0.68 and 13.8, respectively. For GPR, the R² and RMSE were 0.67 and 13.9, respectively. SHAP analysis showed that reductions in average temperature, daylight duration, average humidity, sulfur dioxide (SO₂), total solar insolation amount, and temperature difference increased the number of respiratory disease patients, whereas increases in atmospheric pressure, carbon monoxide (CO), and particulate matter ≤2.5 μm in aerodynamic diameter (PM2.5) increased the number of respiratory disease patients.
Conclusion We successfully developed models for predicting the occurrence of respiratory diseases using climatic and air-pollution factors. These models could evolve into public warning systems.
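The holdout evaluation described in this abstract (a 75%/25% split, then R² and RMSE on the held-out portion) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the random shuffling, and the sample values are assumptions, and the abstract does not say how the rows were assigned to the two subsets.

```python
import math
import random


def holdout_split(rows, train_fraction=0.75, seed=0):
    """Shuffle rows and split them into training and test subsets.

    Random shuffling is an assumption; the abstract only states the 75/25 ratio.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not data from the study).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
train_rows, test_rows = holdout_split(range(100))
```

Both metrics are computed only on the held-out rows, which is what makes the reported values estimates of performance on unseen data.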
Affiliation(s)
- Yunseo Ku
  - Department of Biomedical Engineering, Chungnam National University College of Medicine, Daejeon, Korea
- Soon Bin Kwon
  - Department of Neurology, Columbia University, New York, NY, USA
- Jeong-Hwa Yoon
  - Institute of Health Policy and Management, Medical Research Center, Seoul National University, Seoul, Korea
- Seog-Kyun Mun
  - Department of Otorhinolaryngology-Head and Neck Surgery, Chung-Ang University College of Medicine, Seoul, Korea
- Munyoung Chang
  - Department of Otorhinolaryngology-Head and Neck Surgery, Chung-Ang University College of Medicine, Seoul, Korea
|
23
|
Zhao Y, Jia L, Jia R, Han H, Feng C, Li X, Wei Z, Wang H, Zhang H, Pan S, Wang J, Guo X, Yu Z, Li X, Wang Z, Chen W, Li J, Li T. A New Time-Window Prediction Model For Traumatic Hemorrhagic Shock Based on Interpretable Machine Learning. Shock 2022; 57:48-56. [PMID: 34905530 PMCID: PMC8663521 DOI: 10.1097/shk.0000000000001842] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/11/2021] [Accepted: 07/26/2021] [Indexed: 12/29/2022]
Abstract
Early warning prediction of traumatic hemorrhagic shock (THS) can greatly reduce patient mortality and morbidity. We aimed to develop and validate models with different stepped feature sets to predict THS in advance. From the PLA General Hospital Emergency Rescue Database and Medical Information Mart for Intensive Care III, we identified 604 and 1,614 patients, respectively. Two popular machine learning algorithms (i.e., extreme gradient boosting [XGBoost] and logistic regression) were applied. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the performance of the models. By analyzing the feature importance based on XGBoost, we found that features in vital signs (VS), routine blood (RB), and blood gas analysis (BG) were the most relevant to THS (0.292, 0.249, and 0.225, respectively). Thus, the stepped relationships existing in them were revealed. Furthermore, the three stepped feature sets (i.e., VS, VS + RB, and VS + RB + sBG) were passed to the two machine learning algorithms to predict THS in the subsequent T hours (where T = 3, 2, 1, or 0.5), respectively. Results showed that the XGBoost model performance was significantly better than that of logistic regression. The model using vital signs alone achieved good performance at the half-hour time window (AUROC = 0.935), and the performance increased when laboratory results were added, especially when the time window was 1 h (AUROC = 0.950 and 0.968, respectively). These well-performing interpretable models demonstrated acceptable generalization ability in external validation and could flexibly predict THS on a rolling basis T hours (where T = 0.5, 1) prior to clinical recognition. A prospective study is necessary to determine the clinical utility of the proposed THS prediction models.
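Since AUROC is the headline metric here, a small sketch of what it measures may help: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy labels and scores below are assumptions for illustration; they are not taken from the study, and production code would normally use a library routine instead of this quadratic-time loop.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 outcomes; scores: predicted risk for each case.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 3 positive and 3 negative cases.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
example_auc = auroc(labels, scores)  # 8 of 9 positive/negative pairs are ordered correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the values reported above are read.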
Affiliation(s)
- Yuzhuo Zhao
  - Department of Emergency, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Lijing Jia
  - Department of Emergency, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Ruiqi Jia
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Hui Han
  - Department of Emergency, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Cong Feng
  - Department of Emergency, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Xueyan Li
  - Management School, Beijing Union University, Beijing, China
- Hongxin Wang
  - Department of Emergency, Armed Police Characteristic Medical Center, Tianjin, China
- Heng Zhang
  - Department of Emergency, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Shuxiao Pan
  - College of Computer Science and Artificial Intelligence, Wenzhou University
- Jiaming Wang
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Xin Guo
  - Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Zheyuan Yu
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Xiucheng Li
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Zhaohong Wang
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Wei Chen
  - Department of Emergency, The Third Medical Center of Chinese PLA General Hospital, Beijing, China
  - Hainan Hospital of Chinese PLA General Hospital, Sanya, China
- Jing Li
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Tanshi Li
  - Department of Emergency, The First Medical Center of Chinese PLA General Hospital, Beijing, China
|
24
|
Jia L, Wei Z, Zhang H, Wang J, Jia R, Zhou M, Li X, Zhang H, Chen X, Yu Z, Wang Z, Li X, Li T, Liu X, Liu P, Chen W, Li J, He K. An interpretable machine learning model based on a quick pre-screening system enables accurate deterioration risk prediction for COVID-19. Sci Rep 2021; 11:23127. [PMID: 34848736 PMCID: PMC8633326 DOI: 10.1038/s41598-021-02370-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/18/2021] [Accepted: 11/08/2021] [Indexed: 01/04/2023] Open
Abstract
A high-performing interpretable model is proposed to predict the risk of deterioration in coronavirus disease 2019 (COVID-19) patients. The model was developed using a cohort of 3028 patients diagnosed with COVID-19 and exhibiting common clinical symptoms that were internally verified (AUC 0.8517, 95% CI 0.8433, 0.8601). A total of 15 high risk factors for deterioration and their approximate warning ranges were identified. This included prothrombin time (PT), prothrombin activity, lactate dehydrogenase, international normalized ratio, heart rate, body-mass index (BMI), D-dimer, creatine kinase, hematocrit, urine specific gravity, magnesium, globulin, activated partial thromboplastin time, lymphocyte count (L%), and platelet count. Four of these indicators (PT, heart rate, BMI, HCT) and comorbidities were selected for a streamlined combination of indicators to produce faster results. The resulting model showed good predictive performance (AUC 0.7941 95% CI 0.7926, 0.8151). A website for quick pre-screening online was also developed as part of the study.
Affiliation(s)
- Lijing Jia
  - Department of Emergency, The First Medical Center to Chinese People's Liberation Army General Hospital, Beijing, China
- Zijian Wei
  - Washington University in St. Louis, St. Louis, USA
- Heng Zhang
  - Department of Emergency, The First Medical Center to Chinese People's Liberation Army General Hospital, Beijing, China
- Jiaming Wang
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Ruiqi Jia
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Manhong Zhou
  - Department of Emergency, Affiliated Hospital of Zunyi Medical University, Zunyi, China
- Xueyan Li
  - School of Management, Beijing Union University, Beijing, China
- Hankun Zhang
  - School of E-Business and Logistics, Beijing Technology and Business University, Beijing, China
- Xuedong Chen
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Zheyuan Yu
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Zhaohong Wang
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Xiucheng Li
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Tingting Li
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Xiangge Liu
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Pei Liu
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Wei Chen
  - Department of Emergency, The Third Medical Center to Chinese People's Liberation Army General Hospital, Beijing, China
- Jing Li
  - School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Kunlun He
  - Department of Emergency, The First Medical Center to Chinese People's Liberation Army General Hospital, Beijing, China
|
25
|
Krittanawong C, Virk HUH, Kumar A, Aydar M, Wang Z, Stewart MP, Halperin JL. Machine learning and deep learning to predict mortality in patients with spontaneous coronary artery dissection. Sci Rep 2021; 11:8992. [PMID: 33903608 PMCID: PMC8076284 DOI: 10.1038/s41598-021-88172-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 09/30/2020] [Accepted: 03/23/2021] [Indexed: 12/30/2022] Open
Abstract
Machine learning (ML) and deep learning (DL) can successfully predict high prevalence events in very large databases (big data), but the value of this methodology for risk prediction in smaller cohorts with uncommon diseases and infrequent events is uncertain. The clinical course of spontaneous coronary artery dissection (SCAD) is variable, and no reliable methods are available to predict mortality. Based on the hypothesis that machine learning (ML) and deep learning (DL) techniques could enhance the identification of patients at risk, we applied a deep neural network to information available in electronic health records (EHR) to predict in-hospital mortality in patients with SCAD. We extracted patient data from the EHR of an extensive urban health system and applied several ML and DL models using candidate clinical variables potentially associated with mortality. We partitioned the data into training and evaluation sets with cross-validation. We estimated model performance based on the area under the receiver-operator characteristics curve (AUC) and balanced accuracy. As sensitivity analyses, we examined results limited to cases with complete clinical information available. We identified 375 SCAD patients of which mortality during the index hospitalization was 11.5%. The best-performing DL algorithm identified in-hospital mortality with AUC 0.98 (95% CI 0.97-0.99), compared to other ML models (P < 0.0001). For prediction of mortality using ML models in patients with SCAD, the AUC ranged from 0.50 with the random forest method (95% CI 0.41-0.58) to 0.95 with the AdaBoost model (95% CI 0.93-0.96), with intermediate performance using logistic regression, decision tree, support vector machine, K-nearest neighbors, and extreme gradient boosting methods. A deep neural network model was associated with higher predictive accuracy and discriminative power than logistic regression or ML models for identification of patients with ACS due to SCAD prone to early mortality.
Affiliation(s)
- Chayakrit Krittanawong
  - Section of Cardiology, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
  - Icahn School of Medicine at Mount Sinai, The Zena and Michael A. Wiener Cardiovascular Institute, Mount Sinai Heart, New York, NY, USA
- Hafeez Ul Hassan Virk
  - Department of Cardiovascular Diseases, Case Western Reserve University, University Hospitals Cleveland Medical Center, Cleveland, OH, USA
- Anirudh Kumar
  - Heart and Vascular Institute, Cleveland Clinic, Cleveland, OH, USA
- Mehmet Aydar
  - Department of Computer Science, Kent State University, Kent, OH, USA
- Zhen Wang
  - Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
  - Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Matthew P Stewart
  - The Institute of Applied and Computational Sciences, Harvard University, Boston, MA, USA
  - School of Engineering and Applied Sciences, Harvard University, Boston, MA, USA
- Jonathan L Halperin
  - Icahn School of Medicine at Mount Sinai, The Zena and Michael A. Wiener Cardiovascular Institute, Mount Sinai Heart, New York, NY, USA
|
26
|
Alim M, Ye GH, Guan P, Huang DS, Zhou BS, Wu W. Comparison of ARIMA model and XGBoost model for prediction of human brucellosis in mainland China: a time-series study. BMJ Open 2020; 10:e039676. [PMID: 33293308 PMCID: PMC7722837 DOI: 10.1136/bmjopen-2020-039676] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/17/2022] Open
Abstract
OBJECTIVES Human brucellosis is a public health problem endangering health and property in China. Predicting the trend and the seasonality of human brucellosis is of great significance for its prevention. In this study, a comparison between the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model was conducted to determine which was more suitable for predicting the occurrence of brucellosis in mainland China. DESIGN Time-series study. SETTING Mainland China. METHODS Data on human brucellosis in mainland China were provided by the National Health and Family Planning Commission of China. The data were divided into a training set and a test set. The training set was composed of the monthly incidence of human brucellosis in mainland China from January 2008 to June 2018, and the test set was composed of the monthly incidence from July 2018 to June 2019. The mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) were used to evaluate the effects of model fitting and prediction. RESULTS The number of human brucellosis patients in mainland China increased from 30 002 in 2008 to 40 328 in 2018. There was an increasing trend and obvious seasonal distribution in the original time series. For the training set, the MAE, RMSE and MAPE of the ARIMA(0,1,1)×(0,1,1)₁₂ model were 338.867, 450.223 and 10.323, respectively, and the MAE, RMSE and MAPE of the XGBoost model were 189.332, 262.458 and 4.475, respectively. For the test set, the MAE, RMSE and MAPE of the ARIMA(0,1,1)×(0,1,1)₁₂ model were 529.406, 586.059 and 17.676, respectively, and the MAE, RMSE and MAPE of the XGBoost model were 249.307, 280.645 and 7.643, respectively. CONCLUSIONS The performance of the XGBoost model was better than that of the ARIMA model. The XGBoost model is more suitable for predicting cases of human brucellosis in mainland China.
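The evaluation design in this abstract (a chronological split followed by MAE, RMSE, and MAPE) can be sketched as below. The function names and the toy numbers are assumptions for illustration. The only figures taken from the abstract are the period lengths: January 2008 to June 2018 is 126 months of training data, and July 2018 to June 2019 is 12 months of test data. MAPE is expressed in percent here, which is an assumption about how the reported values are scaled.

```python
import math


def chronological_split(series, n_test):
    """Keep time order: the last n_test observations form the test set."""
    return series[:-n_test], series[-n_test:]


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error; penalizes large misses more than MAE does."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed values must be nonzero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


# 138 monthly observations (placeholder indices, not the actual incidence data).
months = list(range(138))
train_months, test_months = chronological_split(months, n_test=12)

# Hypothetical observed and predicted counts for three months.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
```

Unlike a shuffled holdout, the chronological split never lets the model see data from after the period it is asked to forecast.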
Affiliation(s)
- Mirxat Alim
  - Department of Epidemiology, China Medical University, Shenyang, China
- Guo-Hua Ye
  - Department of Epidemiology, China Medical University, Shenyang, China
- Peng Guan
  - Department of Epidemiology, China Medical University, Shenyang, China
- De-Sheng Huang
  - Department of Mathematics, China Medical University, Shenyang, China
- Bao-Sen Zhou
  - Department of Epidemiology, China Medical University, Shenyang, China
- Wei Wu
  - Department of Epidemiology, China Medical University, Shenyang, China
|
27
|
Challa AP, Beam AL, Shen M, Peryea T, Lavieri RR, Lippmann ES, Aronoff DM. Machine learning on drug-specific data to predict small molecule teratogenicity. Reprod Toxicol 2020; 95:148-158. [PMID: 32428651 PMCID: PMC7577422 DOI: 10.1016/j.reprotox.2020.05.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/05/2019] [Revised: 05/04/2020] [Accepted: 05/06/2020] [Indexed: 12/23/2022]
Abstract
Pregnant women are an especially vulnerable population, given the sensitivity of a developing fetus to chemical exposures. However, prescribing behavior for the gravid patient is guided on limited human data and conflicting cases of adverse outcomes due to the exclusion of pregnant populations from randomized, controlled trials. These factors increase risk for adverse drug outcomes and reduce quality of care for pregnant populations. Herein, we propose the application of artificial intelligence to systematically predict the teratogenicity of a prescriptible small molecule from information inherent to the drug. Using unsupervised and supervised machine learning, our model probes all small molecules with known structure and teratogenicity data published in research-amenable formats to identify patterns among structural, meta-structural, and in vitro bioactivity data for each drug and its teratogenicity score. With this workflow, we discovered three chemical functionalities that predispose a drug towards increased teratogenicity and two moieties with potentially protective effects. Our models predict three clinically-relevant classes of teratogenicity with AUC = 0.8 and nearly double the predictive accuracy of a blind control for the same task, suggesting successful modeling. We also present extensive barriers to translational research that restrict data-driven studies in pregnancy and therapeutically "orphan" pregnant populations. Collectively, this work represents a first-in-kind platform for the application of computing to study and predict teratogenicity.
Affiliation(s)
- Anup P Challa
  - Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville 37203, TN, United States; Department of Biomedical Informatics, Harvard Medical School, Boston 02115, MA, United States; National Center for Advancing Translational Sciences, National Institutes of Health, Rockville 20850, MD, United States; Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville 37212, TN, United States
- Andrew L Beam
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston 02115, MA, United States; Department of Biomedical Informatics, Harvard Medical School, Boston 02115, MA, United States
- Min Shen
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville 20850, MD, United States
- Tyler Peryea
  - National Center for Advancing Translational Sciences, National Institutes of Health, Rockville 20850, MD, United States
- Robert R Lavieri
  - Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville 37203, TN, United States
- Ethan S Lippmann
  - Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville 37212, TN, United States
- David M Aronoff
  - Division of Infectious Diseases, Department of Medicine, Vanderbilt University Medical Center, Nashville 37203, TN, United States; Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville 37203, TN, United States; Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville 37203, TN, United States
|
28
|
Inoue T, Ichikawa D, Ueno T, Cheong M, Inoue T, Whetstone WD, Endo T, Nizuma K, Tominaga T. XGBoost, a Machine Learning Method, Predicts Neurological Recovery in Patients with Cervical Spinal Cord Injury. Neurotrauma Rep 2020; 1:8-16. [PMID: 34223526 PMCID: PMC8240917 DOI: 10.1089/neur.2020.0009] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 12/11/2022] Open
Abstract
The accurate prediction of neurological outcomes in patients with cervical spinal cord injury (SCI) is difficult because of heterogeneity in patient characteristics, treatment strategies, and radiographic findings. Although machine learning algorithms may increase the accuracy of outcome predictions in various fields, limited information is available on their efficacy in the management of SCI. We analyzed data from 165 patients with cervical SCI, and extracted important factors for predicting prognoses. Extreme gradient boosting (XGBoost) as a machine learning model was applied to assess the reliability of a machine learning algorithm to predict neurological outcomes compared with that of conventional methodology, such as a logistic regression or decision tree. We used regularly obtainable data as predictors, such as demographics, magnetic resonance variables, and treatment strategies. Predictive tools, including XGBoost, a logistic regression, and a decision tree, were applied to predict neurological improvements in the functional motor status (ASIA [American Spinal Injury Association] Impairment Scale [AIS] D and E) 6 months after injury. We evaluated predictive performance, including accuracy and the area under the receiver operating characteristic curve (AUC). Regarding predictions of neurological improvements in patients with cervical SCI, XGBoost had the highest accuracy (81.1%), followed by the logistic regression (80.6%) and the decision tree (78.8%). Regarding AUC, the logistic regression showed 0.877, followed by XGBoost (0.867) and the decision tree (0.753). XGBoost reliably predicted neurological alterations in patients with cervical SCI. The utilization of predictive machine learning algorithms may enhance personalized management choices through pre-treatment categorization of patients.
Affiliation(s)
- Tomoo Inoue
  - Department of Neurosurgery, National Health Organization Sendai Medical Center, Sendai, Miyagi, Japan
- Maxwell Cheong
  - Department of Radiology, Stanford University School of Medicine, Palo Alto, California, USA
- Takashi Inoue
  - Department of Neurosurgery, National Health Organization Sendai Medical Center, Sendai, Miyagi, Japan
- William D. Whetstone
  - Department of Emergency Medicine, University of California, San Francisco, San Francisco, California, USA
- Toshiki Endo
  - Department of Neurosurgery, National Health Organization Sendai Medical Center, Sendai, Miyagi, Japan
  - Department of Neurosurgery, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Kuniyasu Nizuma
  - Department of Neurosurgery, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
  - Department of Neurosurgical Engineering and Translational Neuroscience, Graduate School of Biomedical Engineering, Tohoku University, Sendai, Miyagi, Japan
  - Department of Neurosurgical Engineering and Translational Neuroscience, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Teiji Tominaga
  - Department of Neurosurgery, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
|
29
|
Yan C, Duan G, Pan Y, Wu FX, Wang J. DDIGIP: predicting drug-drug interactions based on Gaussian interaction profile kernels. BMC Bioinformatics 2019; 20:538. [PMID: 31874609 PMCID: PMC6929542 DOI: 10.1186/s12859-019-3093-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/31/2019] [Accepted: 09/10/2019] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND A drug-drug interaction (DDI) is defined as a drug effect modified by another drug, which is very common in treating complex diseases such as cancer. Many studies have shown that a DDI can increase or decrease the drug effect. However, adverse DDIs may result in severe morbidity and even mortality of patients, and can also cause some drugs to be withdrawn from the market. As multi-drug treatment becomes more and more common, identifying potential DDIs has become a key issue in drug development and disease treatment. However, traditional biological experimental methods, including in vitro and in vivo, are very time-consuming and expensive for validating new DDIs. With the development of high-throughput sequencing technology, many pharmaceutical studies and various bioinformatics data provide unprecedented opportunities to study DDIs. RESULT In this study, we propose a method to predict new DDIs, namely DDIGIP, which is based on the Gaussian Interaction Profile (GIP) kernel on the drug-drug interaction profiles and the Regularized Least Squares (RLS) classifier. In addition, we also use the k-nearest neighbors (KNN) to calculate the initial relational score in the presence of new drugs via the chemical, biological, and phenotypic data of drugs. We compare the prediction performance of DDIGIP with other competing methods via 5-fold cross validation, 10-fold cross validation and de novo drug validation. CONCLUSION In 5-fold cross validation and 10-fold cross validation, the DDIGIP method achieves areas under the ROC curve (AUC) of 0.9600 and 0.9636, which are better than the state-of-the-art method (L1 Classifier ensemble method) values of 0.9570 and 0.9599. Furthermore, for new drugs, the AUC value of DDIGIP in de novo drug validation reaches 0.9262, which also outperforms the other state-of-the-art method (Weighted average ensemble method) value of 0.9073. Case studies and these results demonstrate that DDIGIP is an effective method to predict DDIs while being beneficial to drug development and disease treatment.
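To make the GIP kernel mentioned in this abstract more concrete, the sketch below computes a Gaussian kernel over binary interaction profiles, where each drug's profile is its row in a known-interaction matrix. It follows the commonly used GIP formulation, in which the bandwidth is normalized by the mean squared norm of the profiles; whether DDIGIP uses exactly this normalization or this default `gamma_prime` is an assumption, since the abstract gives no parameters. The function name and the toy matrix are hypothetical, and the RLS classification and KNN steps are not shown.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between all pairs of rows.

    K[i][j] = exp(-gamma * ||y_i - y_j||^2), with
    gamma = gamma_prime / mean_i(||y_i||^2)  (assumed normalization).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Hypothetical 3-drug interaction matrix: drug 0 interacts with drugs 1 and 2.
interactions = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
K = gip_kernel(interactions)
```

Drugs 1 and 2 have identical profiles in this toy matrix, so their kernel similarity is 1, while drug 0 differs from both and receives a lower value.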
Affiliation(s)
- Cheng Yan
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
  - School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000 China
- Guihua Duan
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
- Yi Pan
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302 USA
- Fang-Xiang Wu
  - Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9 Canada
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
|
30
|
Predicting Hepatitis B Virus Infection Based on Health Examination Data of Community Population. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 16:ijerph16234842. [PMID: 31810204 PMCID: PMC6926879 DOI: 10.3390/ijerph16234842] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/01/2019] [Revised: 11/26/2019] [Accepted: 11/27/2019] [Indexed: 01/14/2023]
Abstract
Despite a decline in the prevalence of hepatitis B in China, the disease burden remains high. Large populations unaware of infection risk often fail to meet the ideal treatment window, resulting in poor prognosis. The purpose of this study was to develop and evaluate models identifying high-risk populations who should be tested for hepatitis B surface antigen. Data came from a large community-based health screening, including 97,173 individuals, with an average age of 54.94. A total of 33 indicators were collected as model predictors, including demographic characteristics, routine blood indicators, and liver function. Borderline-Synthetic minority oversampling technique (SMOTE) was conducted to preprocess the data and then four predictive models, namely, the extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), and logistic regression (LR) algorithms, were developed. The positive rate of hepatitis B surface antigen (HBsAg) was 8.27%. The area under the receiver operating characteristic curves for XGBoost, RF, DT, and LR models were 0.779, 0.752, 0.619, and 0.742, respectively. The Borderline-SMOTE XGBoost combined model outperformed the other models, which correctly predicted 13,637/19,435 cases (sensitivity 70.8%, specificity 70.1%), and the variable importance plot of XGBoost model indicated that age was of high importance. The prediction model can be used to accurately identify populations at high risk of hepatitis B infection that should adopt timely appropriate medical treatment measures.
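The relationship between the three figures quoted in this abstract (cases correctly predicted, sensitivity, specificity) can be illustrated with a short sketch. The function name and the confusion-matrix counts are hypothetical; the abstract does not report the individual cell counts, so only the overall ratio 13,637/19,435 is taken from it.

```python
def classification_rates(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # share of true positives detected
        "specificity": tn / (tn + fp),            # share of true negatives detected
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for illustration only (not the study's confusion matrix).
rates = classification_rates(tp=70, fn=30, tn=630, fp=270)

# Overall share of correctly predicted cases, using the counts in the abstract.
reported_accuracy = 13637 / 19435  # about 0.702
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so it must lie between them; the reported share of about 70.2% does fall between the stated specificity (70.1%) and sensitivity (70.8%).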
|