Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Alzoubi H, Alzubi R, Ramzan N, West D, Al-hadhrami T, Alazab M. A Review of Automatic Phenotyping Approaches using Electronic Health Records. Electronics 2019;8:1235. [DOI: 10.3390/electronics8111235] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/12/2022]

Cited by Other Article(s)

Ding S, Zhang S, Hu X, Zou N. Identify and mitigate bias in electronic phenotyping: A comprehensive study from computational perspective. J Biomed Inform 2024;156:104671. [PMID: 38876452 DOI: 10.1016/j.jbi.2024.104671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 05/26/2024] [Accepted: 06/05/2024] [Indexed: 06/16/2024]

Cao T, Brady V, Whisenant M, Wang X, Gu Y, Wu H. Toward Reliable Symptom Coding in Electronic Health Records for Symptom Assessment and Research: Identification and Categorization of International Classification of Diseases, Ninth Revision, Clinical Modification Symptom Codes. Comput Inform Nurs 2024:00024665-990000000-00209. [PMID: 38968447 DOI: 10.1097/cin.0000000000001146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2024]

Li Y, Yang AY, Marelli A, Li Y. MixEHR-SurG: A joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records. J Biomed Inform 2024;153:104638. [PMID: 38631461 DOI: 10.1016/j.jbi.2024.104638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 03/07/2024] [Accepted: 04/03/2024] [Indexed: 04/19/2024]

Abstract

Survival models can help medical practitioners to evaluate the prognostic importance of clinical variables to patient outcomes such as mortality or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold the promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Our contributions are threefold: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8211 subjects with 75,187 outpatient claim records of 1767 unique ICD codes; the MIMIC-III consisting of 1458 subjects with multi-modal EHR records. Compared to the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge. Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG not only leads to competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.

Kaufmann B, Busby D, Das CK, Tillu N, Menon M, Tewari AK, Gorin MA. Validation of a Zero-shot Learning Natural Language Processing Tool to Facilitate Data Abstraction for Urologic Research. Eur Urol Focus 2024;10:279-287. [PMID: 38278710 DOI: 10.1016/j.euf.2024.01.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/18/2023] [Accepted: 01/15/2024] [Indexed: 01/28/2024]

Abstract

BACKGROUND

Urologic research often requires data abstraction from unstructured text contained within the electronic health record. A number of natural language processing (NLP) tools have been developed to aid with this time-consuming task; however, the generalizability of these tools is typically limited by the need for task-specific training.

OBJECTIVE

To describe the development and validation of a zero-shot learning NLP tool to facilitate data abstraction from unstructured text for use in downstream urologic research.

DESIGN, SETTING, AND PARTICIPANTS

An NLP tool based on the GPT-3.5 model from OpenAI was developed and compared with three physicians for time to task completion and accuracy for abstracting 14 unique variables from a set of 199 deidentified radical prostatectomy pathology reports. The reports were processed in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction.

INTERVENTION

A zero-shot learning NLP tool for data abstraction.

OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS

The tool was compared with the human abstractors in terms of superiority for data abstraction speed and noninferiority for accuracy.

RESULTS AND LIMITATIONS

The human abstractors required a median (interquartile range) of 93 s (72-122 s) per report for data abstraction, whereas the software required a median of 12 s (10-15 s) for the vectorized reports and 15 s (13-17 s) for the scanned reports (p < 0.001 for all paired comparisons). The accuracies of the three human abstractors were 94.7% (95% confidence interval [CI], 93.8-95.5%), 97.8% (95% CI, 97.2-98.3%), and 96.4% (95% CI, 95.6-97%) for the combined set of 2786 data points. The tool had accuracy of 94.2% (95% CI, 93.3-94.9%) for the vectorized reports and was noninferior to the human abstractors at a margin of -10% (α = 0.025). The tool had slightly lower accuracy of 88.7% (95% CI 87.5-89.9%) for the scanned reports, making it noninferior to two of three human abstractors.

CONCLUSIONS

The developed zero-shot learning NLP tool offers urologic researchers a highly generalizable and accurate method for data abstraction from unstructured text. An open access version of the tool is available for immediate use by the urologic community.

PATIENT SUMMARY

In this report, we describe the design and validation of an artificial intelligence tool for abstracting discrete data from unstructured notes contained within the electronic medical record. This freely available tool, which is based on the GPT-3.5 technology from OpenAI, is intended to facilitate research and scientific discovery by the urologic community.
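
The noninferiority claims in the Results above can be roughly cross-checked from the published point estimates. The Python sketch below is an illustration only, not the authors' analysis: it assumes an unpaired Wald interval for the difference of two proportions (the study compared the same reports, so the actual test was likely paired), assumes the scanned set also comprises 2786 data points, and uses a hypothetical function name `noninferior`.

```python
from math import sqrt
from statistics import NormalDist

N_POINTS = 14 * 199   # 14 variables x 199 reports, as stated in the abstract
MARGIN = -0.10        # noninferiority margin from the abstract
ALPHA = 0.025         # one-sided alpha from the abstract


def noninferior(p_tool, p_human, n=N_POINTS, margin=MARGIN, alpha=ALPHA):
    """Return True if the lower one-sided Wald bound of
    (tool accuracy - human accuracy) lies above the margin.
    Assumption: the two samples are treated as independent."""
    z = NormalDist().inv_cdf(1 - alpha)
    diff = p_tool - p_human
    se = sqrt(p_tool * (1 - p_tool) / n + p_human * (1 - p_human) / n)
    return diff - z * se > margin


HUMANS = [0.947, 0.978, 0.964]   # accuracies of the three abstractors
vectorized = [noninferior(0.942, h) for h in HUMANS]
scanned = [noninferior(0.887, h) for h in HUMANS]

print(N_POINTS, vectorized, scanned)
```

Under these assumptions the sketch agrees with the abstract: the tool is noninferior to all three abstractors on vectorized reports and to two of three on scanned reports.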

Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024;31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]

Jeffery AD, Fabbri D, Reeves RM, Matheny ME. Use of Noisy Labels as Weak Learners to Identify Incompletely Ascertainable Outcomes: A Feasibility Study with Opioid-Induced Respiratory Depression. medRxiv 2024:2024.01.29.24301963. [PMID: 38352435 PMCID: PMC10863026 DOI: 10.1101/2024.01.29.24301963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]

Abstract

Objective

Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts' heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels to a large, observational data set. We chose the clinical condition of opioid-induced respiratory depression for our use case because it is rare, has no administrative codes to easily identify the condition, and typically requires at least some unstructured text to ascertain its presence.

Materials and Methods

Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) for the development of a machine learning classifier for opioid-induced respiratory depression. Our approach included subject matter experts creating 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated performance of the Discriminative model with a hold-out test set of 599 independently-reviewed patient records.

Results

The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0, and an AUC of 0.988 in the hold-out test set with a prevalence of 0.83% (5/599).

Discussion

All of the confirmed Cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities.

Conclusion

Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly-structured data.
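
The metrics in the Results above fit together arithmetically, which helps in reading the low F1 score alongside the high accuracy. The Python sketch below searches for the confusion matrix consistent with the reported figures (599 test records, 5 cases, sensitivity 1.0, accuracy 0.977, F1 0.417). The false-positive count it recovers is inferred from rounded values and is not reported in the abstract; the function names are hypothetical.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, sensitivity, precision, F1) for a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, precision, f1


N_TEST, N_CASES = 599, 5   # hold-out set size and confirmed cases (prevalence 5/599)
TP, FN = N_CASES, 0        # sensitivity of 1.0 means no case was missed


def consistent_false_positive_counts():
    """Find every false-positive count that reproduces the reported
    accuracy (0.977) and F1 (0.417) after rounding to three decimals."""
    matches = []
    for fp in range(N_TEST - N_CASES + 1):
        tn = N_TEST - N_CASES - fp
        acc, _, _, f1 = metrics(TP, fp, FN, tn)
        if round(acc, 3) == 0.977 and round(f1, 3) == 0.417:
            matches.append(fp)
    return matches


print(consistent_false_positive_counts())
```

Under these assumptions the only consistent count is 14 false positives, so roughly 19 of 599 records would be flagged for manual review, in line with the Discussion's point about reducing review burden.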

Clermont G. The Learning Electronic Health Record. Crit Care Clin 2023;39:689-700. [PMID: 37704334 DOI: 10.1016/j.ccc.2023.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2023]

Liu P, Wang Z, Liu N, Peres MA. A scoping review of the clinical application of machine learning in data-driven population segmentation analysis. J Am Med Inform Assoc 2023;30:1573-1582. [PMID: 37369006 PMCID: PMC10436153 DOI: 10.1093/jamia/ocad111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 06/08/2023] [Accepted: 06/16/2023] [Indexed: 06/29/2023]

Oommen C, Howlett-Prieto Q, Carrithers MD, Hier DB. Inter-rater agreement for the annotation of neurologic signs and symptoms in electronic health records. Front Digit Health 2023;5:1075771. [PMID: 37383943 PMCID: PMC10294690 DOI: 10.3389/fdgth.2023.1075771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Accepted: 05/26/2023] [Indexed: 06/30/2023]

Hossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, Turner K. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med 2023;155:106649. [PMID: 36805219 DOI: 10.1016/j.compbiomed.2023.106649] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 10/03/2022] [Revised: 01/04/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]

Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023;30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022]

Abstract

OBJECTIVE

Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used.

MATERIALS AND METHODS

We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies.

RESULTS

Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions.

DISCUSSION

Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released.

CONCLUSION

Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.

Alzubi R, Alzoubi H, Katsigiannis S, West D, Ramzan N. Automated Detection of Substance-Use Status and Related Information from Clinical Text. Sensors (Basel) 2022;22:9609. [PMID: 36559979 PMCID: PMC9783118 DOI: 10.3390/s22249609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Revised: 11/21/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]

Rodríguez-Fernández JM, Loeb JA, Hier DB. It's time to change our documentation philosophy: writing better neurology notes without the burnout. Front Digit Health 2022;4:1063141. [PMID: 36518562 PMCID: PMC9742203 DOI: 10.3389/fdgth.2022.1063141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 11/10/2022] [Indexed: 08/23/2023]

Zou Y, Pesaranghader A, Song Z, Verma A, Buckeridge DL, Li Y. Modeling electronic health record data using an end-to-end knowledge-graph-informed topic model. Sci Rep 2022;12:17868. [PMID: 36284225 PMCID: PMC9596500 DOI: 10.1038/s41598-022-22956-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/16/2022] [Accepted: 10/21/2022] [Indexed: 01/20/2023]

Chen W, Abeyaratne A, Gorham G, George P, Karepalli V, Tran D, Brock C, Cass A. Development and validation of algorithms to identify patients with chronic kidney disease and related chronic diseases across the Northern Territory, Australia. BMC Nephrol 2022;23:320. [PMID: 36151531 PMCID: PMC9502610 DOI: 10.1186/s12882-022-02947-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 09/13/2022] [Indexed: 11/15/2022]

Abstract

BACKGROUND

Electronic health records can be used for population-wide identification and monitoring of disease. The Territory Kidney Care project developed algorithms to identify individuals with chronic kidney disease (CKD) and several commonly comorbid chronic diseases. This study aims to describe the development and validation of our algorithms for CKD, diabetes, hypertension, and cardiovascular disease. A secondary aim of the study was to describe data completeness of the Territory Kidney Care database.

METHODS

The Territory Kidney Care database consolidates electronic health records from multiple health services including public hospitals (n = 6) and primary care health services (> 60) across the Northern Territory, Australia. Using the database (n = 48,569) we selected a stratified random sample of patients (n = 288), which included individuals with mild to end-stage CKD. Diagnostic accuracy of the algorithms was tested against blinded manual chart reviews. Data completeness of the database was also described.

RESULTS

For CKD defined as CKD stage 1 or higher (eGFR of any level with albuminuria or persistent eGFR < 60 mL/min/1.73 m², including renal replacement therapy) overall algorithm sensitivity was 93% (95% CI 89 to 96%) and specificity was 73% (95% CI 64 to 82%). For CKD defined as CKD stage 3a or higher (eGFR < 60 mL/min/1.73 m²) algorithm sensitivity and specificity were 93% and 97% respectively. Among the CKD 1 to 5 staging algorithms, the CKD stage 5 algorithm was most accurate with > 99% sensitivity and specificity. For related comorbidities, algorithm sensitivity and specificity results were 75% and 97% for diabetes; 85% and 88% for hypertension; and 79% and 96% for cardiovascular disease.

CONCLUSIONS

We developed and validated algorithms to identify CKD and related chronic diseases within electronic health records. Validation results showed that CKD algorithms have a high degree of diagnostic accuracy compared to traditional administrative codes. Our highly accurate algorithms present new opportunities in early kidney disease detection, monitoring, and epidemiological research.

Kandaswamy S, Orenstein E, Quincer EM, Fernandez A, Gonzalez M, Lu L, Kamaleswaran R, Banerjee I, Jaggi P. Automated Identification of Immunocompromised Status in Critically Ill Children. Methods Inf Med 2022;61:46-54. [PMID: 35381616 DOI: 10.1055/a-1817-7208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Applying Machine Learning in Distributed Data Networks for Pharmacoepidemiologic and Pharmacovigilance Studies: Opportunities, Challenges, and Considerations. Drug Saf 2022;45:493-510. [PMID: 35579813 PMCID: PMC9112258 DOI: 10.1007/s40264-022-01158-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 02/13/2022] [Indexed: 01/28/2023]

Binkheder S, Asiri MA, Altowayan KW, Alshehri TM, Alzarie MF, Aldekhyyel RN, Almaghlouth IA, Almulhem JA. Real-World Evidence of COVID-19 Patients' Data Quality in the Electronic Health Records. Healthcare (Basel) 2021;9:1648. [PMID: 34946374 PMCID: PMC8701465 DOI: 10.3390/healthcare9121648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2021] [Revised: 11/18/2021] [Accepted: 11/25/2021] [Indexed: 11/19/2022]

Affiliation(s)

Samar Binkheder: Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
Mohammed Ahmed Asiri: Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia; Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
Khaled Waleed Altowayan: Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia; Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
Turki Mohammed Alshehri: Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia; Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
Mashhour Faleh Alzarie: Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia; Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
Raniah N. Aldekhyyel: Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
Ibrahim A. Almaghlouth: Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
Jwaher A. Almulhem: Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia

Mahajan A, Deonarine A, Bernal A, Lyons G, Norgeot B. Developing the Total Health Profile, a Generalizable Unified Set of Multimorbidity Risk Scores Derived From Machine Learning for Broad Patient Populations: Retrospective Cohort Study. J Med Internet Res 2021;23:e32900. [PMID: 34842542 PMCID: PMC8665380 DOI: 10.2196/32900] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/13/2021] [Revised: 09/15/2021] [Accepted: 09/18/2021] [Indexed: 11/16/2022]

Abstract

BACKGROUND

Multimorbidity clinical risk scores allow clinicians to quickly assess their patients' health for decision making, often for recommendation to care management programs. However, these scores are limited by several issues: existing multimorbidity scores (1) are generally limited to one data group (eg, diagnoses, labs) and may be missing vital information, (2) are usually limited to specific demographic groups (eg, age), and (3) do not formally provide any granularity in the form of more nuanced multimorbidity risk scores to direct clinician attention.

OBJECTIVE

Using diagnosis, lab, prescription, procedure, and demographic data from electronic health records (EHRs), we developed a physiologically diverse and generalizable set of multimorbidity risk scores.

METHODS

Using EHR data from a nationwide cohort of patients, we developed the total health profile, a set of six integrated risk scores reflecting five distinct organ systems and overall health. We selected the occurrence of an inpatient hospital visitation over a 2-year follow-up window, attributable to specific organ systems, as our risk endpoint. Using a physician-curated set of features, we trained six machine learning models on 794,294 patients to predict the calibrated probability of the aforementioned endpoint, producing risk scores for heart, lung, neuro, kidney, and digestive functions and a sixth score for combined risk. We evaluated the scores using a held-out test cohort of 198,574 patients.

RESULTS

Study patients closely matched national census averages, with a median age of 41 years, a median income of $66,829, and racial averages by zip code of 73.8% White, 5.9% Asian, and 11.9% African American. All models were well calibrated and demonstrated strong performance with areas under the receiver operating curve (AUROCs) of 0.83 for the total health score (THS), 0.89 for heart, 0.86 for lung, 0.84 for neuro, 0.90 for kidney, and 0.83 for digestive functions. There was consistent performance of this scoring system across sexes, diverse patient ages, and zip code income levels. Each model learned to generate predictions by focusing on appropriate clinically relevant patient features, such as heart-related hospitalizations and chronic hypertension diagnosis for the heart model. The THS outperformed the other commonly used multimorbidity scoring systems, specifically the Charlson Comorbidity Index (CCI) and the Elixhauser Comorbidity Index (ECI) overall (AUROCs: THS=0.823, CCI=0.735, ECI=0.649) as well as for every age, sex, and income bracket. Performance improvements were most pronounced for middle-aged and lower-income subgroups. Ablation tests using only diagnosis, prescription, social determinants of health, and lab feature groups, while retaining procedure-related features, showed that the combination of feature groups has the best predictive performance, though only marginally better than the diagnosis-only model on at-risk groups.

CONCLUSIONS

Massive retrospective EHR data sets have made it possible to use machine learning to build practical multimorbidity risk scores that are highly predictive, personalizable, intuitive to explain, and generalizable across diverse patient populations.

Hara K, Kobayashi Y, Tomio J, Ito Y, Svensson T, Ikesu R, Chung UI, Svensson AK. Claims-based algorithms for common chronic conditions were efficiently constructed using machine learning methods. PLoS One 2021;16:e0254394. [PMID: 34570785 PMCID: PMC8476042 DOI: 10.1371/journal.pone.0254394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2021] [Accepted: 06/25/2021] [Indexed: 11/29/2022]

Abstract

Identification of medical conditions using claims data is generally conducted with algorithms based on subject-matter knowledge. However, these claims-based algorithms (CBAs) are highly dependent on the knowledge level and not necessarily optimized for target conditions. We investigated whether machine learning methods can supplement researchers' knowledge of target conditions in building CBAs. Retrospective cohort study using a claims database combined with annual health check-up results of employees' health insurance programs for fiscal year 2016-17 in Japan (study population for hypertension, N = 631,289; diabetes, N = 152,368; dyslipidemia, N = 614,434). We constructed CBAs with logistic regression, k-nearest neighbor, support vector machine, penalized logistic regression, tree-based model, and neural network for identifying patients with three common chronic conditions: hypertension, diabetes, and dyslipidemia. We then compared their association measures using a completely hold-out test set (25% of the study population). Among the test cohorts of 157,822, 38,092, and 153,608 enrollees for hypertension, diabetes, and dyslipidemia, 25.4%, 8.4%, and 38.7% of them had a diagnosis of the corresponding condition. The areas under the receiver operating characteristic curve (AUCs) of the logistic regression with/without subject-matter knowledge about the target condition were .923/.921 for hypertension, .957/.938 for diabetes, and .739/.747 for dyslipidemia. The logistic lasso, logistic elastic-net, and tree-based methods yielded AUCs comparable to those of the logistic regression with subject-matter knowledge: .923-.931 for hypertension; .958-.966 for diabetes; .747-.773 for dyslipidemia. We found that machine learning methods can attain AUCs comparable to the conventional knowledge-based method in building CBAs.

Chiang C, Zhang P, Donneyong M, Chen Y, Su Y, Li L. Random control selection for conducting high-throughput adverse drug events screening using large-scale longitudinal health data. CPT Pharmacometrics Syst Pharmacol 2021;10:1032-1042. [PMID: 34313404 PMCID: PMC8452297 DOI: 10.1002/psp4.12673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2021] [Revised: 04/07/2021] [Accepted: 05/22/2021] [Indexed: 11/12/2022]