1
Tran SD, Lin J, Galvez C, Rasmussen LV, Pacheco J, Perottino GM, Rahbari KJ, Miller CD, John JD, Theros J, Vogel K, Dinh PV, Malik S, Ramzan U, Tegtmeyer K, Mohindra N, Johnson JL, Luo Y, Kho A, Sosman J, Walunas TL. Rapid identification of inflammatory arthritis and associated adverse events following immune checkpoint therapy: a machine learning approach. Front Immunol 2024; 15:1331959. [PMID: 38558818 PMCID: PMC10978703 DOI: 10.3389/fimmu.2024.1331959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 02/26/2024] [Indexed: 04/04/2024]
Abstract
Introduction: Immune checkpoint inhibitor-induced inflammatory arthritis (ICI-IA) poses a major clinical challenge to ICI therapy for cancer: 13% of cases halt ICI therapy, and ICI-IA is difficult to identify for timely referral to a rheumatologist. The objective of this study was to rapidly identify ICI-IA patients in clinical data and assess associated immune-related adverse events (irAEs) and risk factors.
Methods: We conducted a retrospective study of the electronic health records (EHRs) of 89 patients who developed ICI-IA out of 2451 cancer patients who received ICI therapy at Northwestern University between March 2011 and January 2021. Logistic regression and random forest machine learning models were trained on all EHR diagnoses, labs, medications, and procedures to identify ICI-IA patients and EHR codes indicating ICI-IA. Multivariate logistic regression was then used to test associations between ICI-IA and cancer type, ICI regimen, and comorbid irAEs.
Results: Logistic regression and random forest models identified ICI-IA patients with accuracies of 0.79 and 0.80, respectively. Key EHR features from the random forest model included ICI-IA relevant features (joint pain, steroid prescription, rheumatoid factor tests) and features suggesting comorbid irAEs (thyroid function tests, pruritus, triamcinolone prescription). Compared to 871 adjudicated ICI patients who did not develop arthritis, ICI-IA patients had higher odds of developing cutaneous (odds ratio [OR]=2.66; 95% confidence interval [CI] 1.63-4.35), endocrine (OR=2.09; 95% CI 1.15-3.80), or gastrointestinal (OR=2.88; 95% CI 1.76-4.72) irAEs after adjusting for demographics, cancer type, and ICI regimen. Melanoma (OR=1.99; 95% CI 1.08-3.65) and renal cell carcinoma (OR=2.03; 95% CI 1.06-3.84) patients were more likely to develop ICI-IA than lung cancer patients. Patients on nivolumab+ipilimumab were more likely to develop ICI-IA than patients on pembrolizumab (OR=1.86; 95% CI 1.01-3.43).
Discussion: Our machine learning models rapidly identified patients with ICI-IA in EHR data and elucidated clinical features indicative of comorbid irAEs. Patients with ICI-IA were significantly more likely to also develop cutaneous, endocrine, and gastrointestinal irAEs during their clinical course than ICI therapy patients without ICI-IA.
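As a reading aid for the odds ratios reported in this abstract, the sketch below checks that each interval is roughly symmetric around its OR on the log scale. It assumes the intervals are Wald-type intervals computed on the log-odds scale, which the abstract does not state; the function names `or_from_ci` and `se_from_ci` are hypothetical helpers, not part of the study.

```python
import math

def or_from_ci(lower, upper):
    """Point estimate implied by a log-symmetric CI: geometric mean of the bounds."""
    return math.sqrt(lower * upper)

def se_from_ci(lower, upper, z=1.959964):
    """Standard error of log(OR) implied by a 95% Wald interval (assumed form)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# (OR, lower, upper) exactly as reported in the abstract above.
reported = {
    "cutaneous": (2.66, 1.63, 4.35),
    "endocrine": (2.09, 1.15, 3.80),
    "gastrointestinal": (2.88, 1.76, 4.72),
}

for name, (odds_ratio, lo, hi) in reported.items():
    implied = or_from_ci(lo, hi)
    print(f"{name}: reported OR={odds_ratio}, implied OR={implied:.2f}, "
          f"SE(log OR)={se_from_ci(lo, hi):.3f}")
```

For all three irAE categories the implied OR agrees with the reported value to within rounding, which is consistent with (but does not prove) the assumed interval form.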
Affiliation(s)
- Steven D. Tran
  - Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Jean Lin
  - Department of Medicine, Division of Rheumatology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Carlos Galvez
  - Hematology and Oncology, University of Illinois Health, Chicago, IL, United States
- Luke V. Rasmussen
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Jennifer Pacheco
  - Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Kian J. Rahbari
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Charles D. Miller
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Jordan D. John
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Jonathan Theros
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Kelly Vogel
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Patrick V. Dinh
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Sara Malik
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Umar Ramzan
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Kyle Tegtmeyer
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Nisha Mohindra
  - Department of Medicine, Division of Oncology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
  - Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL, United States
- Jodi L. Johnson
  - Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL, United States
  - Departments of Pathology and Dermatology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Abel Kho
  - Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
  - Department of Medicine, Division of General Internal Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Jeffrey Sosman
  - Department of Medicine, Division of Oncology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
  - Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL, United States
- Theresa L. Walunas
  - Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
  - Department of Medicine, Division of General Internal Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
2
Lim H, Park Y, Hong JH, Yoo KB, Seo KD. Use of machine learning techniques for identifying ischemic stroke instead of the rule-based methods: a nationwide population-based study. Eur J Med Res 2024; 29:6. [PMID: 38173022 PMCID: PMC10763197 DOI: 10.1186/s40001-023-01594-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND: Many studies have evaluated stroke using claims data; most of these studies have defined ischemic stroke using an operational definition following the rule-based method. Rule-based methods tend to overestimate the number of patients with ischemic stroke.
OBJECTIVES: We aimed to identify an appropriate algorithm for identifying stroke by applying machine learning (ML) techniques to claims data.
METHODS: We obtained the data from the Korean National Health Insurance Service database, which is linked to the Ilsan Hospital database (n = 30,897). The performance of the prediction models (extreme gradient boosting [XGBoost] or gated recurrent unit [GRU]) was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the calibration curve.
RESULTS: In total, 30,897 patients were enrolled in this study, 3145 of whom (10.18%) had ischemic stroke. XGBoost, a tree-based ML technique, achieved an AUROC of 94.46% and an AUPRC of 92.80%. GRU showed the highest accuracy (99.81%), precision (99.92%), and recall (99.69%).
CONCLUSIONS: We proposed recurrent neural network-based deep learning techniques to improve stroke phenotyping. These techniques can be expected to produce faster and more accurate results than rule-based methods.
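The evaluation metrics named in this abstract can be illustrated with a minimal sketch. It computes AUROC as the fraction of positive/negative pairs ranked correctly, plus precision and recall at a fixed threshold. The labels and scores are made-up toy values, not study data, and the 0.5 threshold is an assumption.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall after thresholding scores into 0/1 predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example (hypothetical values): 1 = stroke, 0 = no stroke.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

print(auroc(toy_labels, toy_scores))             # 0.75
print(precision_recall(toy_labels, toy_scores))  # (1.0, 0.5)
```

AUROC is threshold-free, whereas precision and recall depend on the chosen cut-off, which is one reason a study would report both kinds of metric.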
Affiliation(s)
- Hyunsun Lim
  - Department of Research and Analysis, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea
- Youngmin Park
  - Department of Family Medicine, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea
- Jung Hwa Hong
  - Department of Research and Analysis, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea
- Ki-Bong Yoo
  - Division of Health Administration, Yonsei University, Wonju, Republic of Korea
- Kwon-Duk Seo
  - Department of Neurology, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea
  - Department of Neurology, Graduate School of Medicine, Kangwon National University, Chuncheon, Republic of Korea
3
La Cava WG, Lee PC, Ajmal I, Ding X, Solanki P, Cohen JB, Moore JH, Herman DS. A flexible symbolic regression method for constructing interpretable clinical prediction models. NPJ Digit Med 2023; 6:107. [PMID: 37277550 PMCID: PMC10241925 DOI: 10.1038/s41746-023-00833-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 05/05/2023] [Indexed: 06/07/2023]
Abstract
Machine learning (ML) models trained for triggering clinical decision support (CDS) are typically either accurate or interpretable but not both. Scaling CDS to the panoply of clinical use cases while mitigating risks to patients will require that many ML models be intuitively interpretable for clinicians. To this end, we adapted a symbolic regression method, coined the feature engineering automation tool (FEAT), to train concise and accurate models from high-dimensional electronic health record (EHR) data. We first present an in-depth application of FEAT to classify hypertension, hypertension with unexplained hypokalemia, and apparent treatment-resistant hypertension (aTRH) using EHR data for 1200 subjects receiving longitudinal care in a large healthcare system. FEAT models trained to predict phenotypes adjudicated by chart review had equivalent or higher discriminative performance (p < 0.001) and were at least three times smaller (p < 1 × 10⁻⁶) than other potentially interpretable models. For aTRH, FEAT generated a six-feature, highly discriminative (positive predictive value = 0.70, sensitivity = 0.62), and clinically intuitive model. To assess the generalizability of the approach, we tested FEAT on 25 benchmark clinical phenotyping tasks using the MIMIC-III critical care database. Under comparable dimensionality constraints, FEAT's models exhibited higher area under the receiver-operating curve scores than penalized linear models across tasks (p < 6 × 10⁻⁶). In summary, FEAT can train EHR prediction models that are both intuitively interpretable and accurate, which should facilitate safe and effective scaling of ML-triggered CDS to the panoply of potential clinical use cases and healthcare practices.
Affiliation(s)
- William G La Cava
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Paul C Lee
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Imran Ajmal
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Xiruo Ding
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Priyanka Solanki
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jordana B Cohen
  - Division of Renal-Electrolyte and Hypertension, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Daniel S Herman
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
4
Chushig-Muzo D, Soguero-Ruiz C, Miguel Bohoyo PD, Mora-Jiménez I. Learning and visualizing chronic latent representations using electronic health records. BioData Min 2022; 15:18. [PMID: 36064616 PMCID: PMC9446539 DOI: 10.1186/s13040-022-00303-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/14/2021] [Accepted: 07/27/2022] [Indexed: 12/03/2022]
Abstract
Background: The number of patients with chronic diseases such as diabetes and hypertension has reached alarming levels worldwide. These diseases increase the risk of developing acute complications and involve a substantial economic burden and demand for health resources. The widespread adoption of Electronic Health Records (EHRs) is opening great opportunities for supporting decision-making. Nevertheless, data extracted from EHRs are complex (heterogeneous, high-dimensional and usually noisy), which hampers knowledge extraction with conventional approaches.
Methods: We propose the use of the Denoising Autoencoder (DAE), a Machine Learning (ML) technique that transforms high-dimensional data into latent representations (LRs), thus addressing the main challenges with clinical data. We explore in this work how the combination of LRs with a visualization method can be used to map the patient data in a two-dimensional space, gaining knowledge about the distribution of patients with different chronic conditions. Furthermore, this representation can also be used to characterize the patient’s health status evolution, which is of paramount importance in the clinical setting.
Results: To obtain clinical LRs, we considered real-world data extracted from EHRs linked to the University Hospital of Fuenlabrada in Spain. Experimental results showed the great potential of DAEs to identify patients with clinical patterns linked to hypertension, diabetes and multimorbidity. The procedure allowed us to find patients with the same main chronic disease but different clinical characteristics. Thus, we identified two kinds of diabetic patients with differences in their drug therapy (insulin-dependent and non-insulin-dependent), and also a group of women affected by hypertension and gestational diabetes. We also present a proof of concept for mapping the health status evolution of synthetic patients when considering the most significant diagnoses and drugs associated with chronic patients.
Conclusion: Our results highlighted the value of ML techniques to extract clinical knowledge, supporting the identification of patients with certain chronic conditions. Furthermore, the patient’s health status progression on the two-dimensional space might be used as a tool for clinicians aiming to characterize health conditions and identify their most relevant clinical codes.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-022-00303-z.
Affiliation(s)
- David Chushig-Muzo
  - Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain
- Cristina Soguero-Ruiz
  - Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain
- Inmaculada Mora-Jiménez
  - Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain
5
Stroganov O, Fedarovich A, Wong E, Skovpen Y, Pakhomova E, Grishagin I, Fedarovich D, Khasanova T, Merberg D, Szalma S, Bryant J. Mapping of UK Biobank clinical codes: Challenges and possible solutions. PLoS One 2022; 17:e0275816. [PMID: 36525430 PMCID: PMC9757572 DOI: 10.1371/journal.pone.0275816] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/06/2022] [Accepted: 09/23/2022] [Indexed: 12/23/2022]
Abstract
OBJECTIVE: The UK Biobank provides a rich collection of longitudinal clinical data coming from different healthcare providers and sources in England, Wales, and Scotland. Although extremely valuable and available to a wide research community, the heterogeneous dataset contains inconsistent medical terminology that is either aligned to several ontologies within the same category or unprocessed. To make these data useful to a research community, data cleaning, curation, and standardization are needed. Any data user must invest significant effort in data reformatting, mapping to selected ontologies (such as SNOMED CT), and harmonization to integrate UK Biobank hospital inpatient data, self-reported data, and data from various registers with primary care (GP) data. The integrated clinical data would provide a more comprehensive picture of one's medical history.
MATERIALS AND METHODS: We evaluated several approaches to map GP clinical Read codes to International Classification of Diseases (ICD) and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) terminologies. The results were compared, mapping inconsistencies were flagged, and a quality category was assigned to each mapping to evaluate overall mapping quality.
RESULTS: We propose a curation and data integration pipeline for harmonizing diagnoses. We also report challenges identified in mapping Read codes from UK Biobank GP tables to ICD and SNOMED CT.
DISCUSSION AND CONCLUSION: Some of the challenges, such as the lack of precise one-to-one mapping between ontologies or the need for an additional ontology to fully map terms, are general and reflect trade-offs to be made at different steps. Other challenges are due to automatic mapping and can be overcome by leveraging existing mappings, supplemented with automated and manual curation.
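The compare-and-flag step described in this abstract can be sketched as follows. This is an illustration only: the category names (`agree`, `partial`, `conflict`, `single_source`, `unmapped`) and the function `compare_mappings` are hypothetical, the abstract does not specify the actual quality categories, and the codes are opaque placeholders rather than real Read, ICD, or SNOMED CT codes.

```python
def compare_mappings(source_a, source_b):
    """Assign a quality category to each source code by comparing two mappings.

    Each mapping is a dict from a source code to a set of target codes.
    """
    categories = {}
    for code in sorted(set(source_a) | set(source_b)):
        targets_a = source_a.get(code, set())
        targets_b = source_b.get(code, set())
        if not targets_a and not targets_b:
            categories[code] = "unmapped"        # neither approach found a target
        elif not targets_a or not targets_b:
            categories[code] = "single_source"   # only one approach found a target
        elif targets_a == targets_b:
            categories[code] = "agree"           # identical target sets
        elif targets_a & targets_b:
            categories[code] = "partial"         # overlapping but not identical
        else:
            categories[code] = "conflict"        # disjoint targets: flag for review
    return categories

# Placeholder codes (hypothetical, not real clinical codes).
mapping_a = {"read_1": {"icd_A"}, "read_2": {"icd_B", "icd_C"},
             "read_3": {"icd_D"}, "read_4": set(), "read_5": {"icd_F"}}
mapping_b = {"read_1": {"icd_A"}, "read_2": {"icd_B"},
             "read_3": {"icd_E"}, "read_4": set()}

quality = compare_mappings(mapping_a, mapping_b)
print(quality)
```

In a pipeline of this shape, codes in the `conflict` and `partial` categories would be the natural candidates for manual curation.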
Affiliation(s)
- Oleg Stroganov
  - Rancho BioSciences, LLC, San Diego, California, United States of America
- Alena Fedarovich
  - Rancho BioSciences, LLC, San Diego, California, United States of America
- Emily Wong
  - Takeda Development Center Americas, Inc., San Diego, California, United States of America
- Yulia Skovpen
  - Rancho BioSciences, LLC, San Diego, California, United States of America
- Elena Pakhomova
  - Rancho BioSciences, LLC, San Diego, California, United States of America
- Ivan Grishagin
  - Rancho BioSciences, LLC, San Diego, California, United States of America
- Dzmitry Fedarovich
  - Rancho BioSciences, LLC, San Diego, California, United States of America
- Tania Khasanova
  - Rancho BioSciences, LLC, San Diego, California, United States of America
- David Merberg
  - Takeda Development Center Americas, Inc., Cambridge, Massachusetts, United States of America
- Sándor Szalma
  - Takeda Development Center Americas, Inc., San Diego, California, United States of America
- Julie Bryant
  - Rancho BioSciences, LLC, San Diego, California, United States of America
6
Du J, Zeng D, Li Z, Liu J, Lv M, Chen L, Zhang D, Ji S. An interpretable outcome prediction model based on electronic health records and hierarchical attention. Int J Intell Syst 2021. [DOI: 10.1002/int.22697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Affiliation(s)
- Juan Du
  - Department of Gastroenterology, First Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China
- Dajian Zeng
  - College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
- Zhao Li
  - Department of Gastroenterology and Hepatobiliary, Ri Zhao Hospital of Traditional Chinese Medicine, Rizhao, China
- Jingxuan Liu
  - College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
- Mingqi Lv
  - College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
- Ling Chen
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Dan Zhang
  - Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Shouling Ji
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
7
Zhao Y, Fu S, Bielinski SJ, Decker PA, Chamberlain AM, Roger VL, Liu H, Larson NB. Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation. J Med Internet Res 2021; 23:e22951. [PMID: 33683212 PMCID: PMC7985804 DOI: 10.2196/22951] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/27/2020] [Revised: 08/25/2020] [Accepted: 01/20/2021] [Indexed: 11/29/2022]
Abstract
Background: Stroke is an important clinical outcome in cardiovascular research. However, the ascertainment of incident stroke is typically accomplished via time-consuming manual chart abstraction. Current phenotyping efforts using electronic health records for stroke focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events.
Objective: The aim of this study was to develop a machine learning–based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing.
Methods: The algorithm was trained and validated using an existing epidemiology cohort consisting of 4914 patients with atrial fibrillation (AF) with manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the stroke subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) obtained by stratified sampling from a population in Olmsted County, Minnesota (N=74,314).
Results: Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86% (43/50; 95% CI 0.74-0.93) and a negative predictive value of 96% (96/100). For subtype identification, we achieved an accuracy of 83% in the AF cohort and 80% in the general population sample.
Conclusions: We developed and validated a machine learning–based algorithm that performed well for identifying incident stroke and for determining type of stroke. The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions.
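The predictive values quoted in this abstract are simple proportions (43/50 and 96/100), and their uncertainty can be sketched with a binomial confidence interval. The example below uses the Wilson score interval. That choice is an assumption: the abstract does not name the interval method, although the Wilson interval for 43/50 does round to the reported 0.74-0.93. The function name `wilson_interval` is a hypothetical helper.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Counts as reported in the abstract above.
ppv_low, ppv_high = wilson_interval(43, 50)    # positive predictive value, 86%
npv_low, npv_high = wilson_interval(96, 100)   # negative predictive value, 96%

print(f"PPV 43/50:  {ppv_low:.2f}-{ppv_high:.2f}")   # 0.74-0.93
print(f"NPV 96/100: {npv_low:.2f}-{npv_high:.2f}")
```

Unlike the simpler normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves reasonably for proportions close to 1, which matters for values such as 96/100.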
Affiliation(s)
- Yiqing Zhao
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Suzette J Bielinski
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Paul A Decker
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Alanna M Chamberlain
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Veronique L Roger
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Nicholas B Larson
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States