1. Lee D, Kim S, Lee S, Kim HJ, Kim JH, Lim MC, Cho H. Deep Learning-Based Dynamic Risk Prediction of Venous Thromboembolism for Patients With Ovarian Cancer in Real-World Settings From Electronic Health Records. JCO Clin Cancer Inform 2024; 8:e2300192. [PMID: 38996199 DOI: 10.1200/cci.23.00192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 03/19/2024] [Accepted: 04/16/2024] [Indexed: 07/14/2024]
Abstract
PURPOSE Patients with epithelial ovarian cancer (EOC) have an elevated risk for venous thromboembolism (VTE). To assess the risk of VTE, models were developed by statistical or machine learning algorithms. However, few models have accommodated deep learning (DL) algorithms in realistic clinical settings. We aimed to develop a predictive DL model, exploiting rich information from electronic health records (EHRs), including dynamic clinical features and the presence of competing risks. METHODS We extracted EHRs of 1,268 patients diagnosed with EOC from January 2007 through December 2017 at the National Cancer Center, Korea. DL survival networks using fully connected layers, temporal attention, and recurrent neural networks were adopted and compared with multi-perceptron-based classification models. Prediction accuracy was independently validated in a data set of 423 patients newly diagnosed with EOC from January 2018 to December 2019. Personalized risk plots displaying the individual interval risk were developed. RESULTS DL-based survival networks achieved a superior area under the receiver operating characteristic curve (AUROC) between 0.95 and 0.98, while the AUROC of classification models was between 0.85 and 0.90. As clinical information benefits the prediction accuracy, the proposed dynamic survival network outperformed other survival networks for the test and validation data sets with the highest time-dependent concordance index (0.974, 0.975) and lowest Brier score (0.051, 0.049) at 6 months after a cancer diagnosis. Our visualization showed the interval risk fluctuating along with changes in longitudinal clinical features. CONCLUSION Incorporating dynamic patient clinical features and competing risks from EHRs into the DL algorithms yielded VTE risk prediction with high accuracy. Our results show that this novel dynamic survival network can provide personalized risk prediction with the potential to assist risk-based clinical intervention to prevent VTE among patients with EOC.
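The Brier score reported in this abstract measures how far predicted risks fall from observed outcomes. The sketch below shows only the basic form at a single fixed horizon, assuming every patient's outcome is fully observed; it does not handle censoring or competing risks, which a time-dependent Brier score for survival data would need to address. The function name and the patient values are hypothetical.

```python
from typing import Sequence


def brier_score(predicted_risk: Sequence[float], event_occurred: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    if len(predicted_risk) != len(event_occurred):
        raise ValueError("inputs must have the same length")
    if len(predicted_risk) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_risk, event_occurred)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical 6-month risks for four patients and whether the event occurred.
risks = [0.9, 0.2, 0.1, 0.7]
outcomes = [1, 0, 0, 1]
score = brier_score(risks, outcomes)
```

Lower values are better: a model that always predicts the correct outcome with full confidence scores 0, and one that always predicts 0.5 scores 0.25.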
Affiliation(s)
- Dahhay Lee
  - Department of Cancer AI and Digital Health, National Cancer Center Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Republic of Korea
  - School of Mathematics and Computing (Computational Science and Engineering), Yonsei University, Seoul, Republic of Korea
- Seongyoon Kim
  - School of Mathematics and Computing (Computational Science and Engineering), Yonsei University, Seoul, Republic of Korea
- Sanghee Lee
  - Department of Cancer AI and Digital Health, National Cancer Center Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Republic of Korea
  - Health Insurance Research Institute, National Health Insurance Service, Wonju, Republic of Korea
- Hak Jin Kim
  - Department of Cardiology, Gumdan Top General Hospital, Incheon, Republic of Korea
  - Branch of Cardiology, Department of Internal Medicine, National Cancer Center, Goyang, Republic of Korea
- Ji Hyun Kim
  - Center for Gynecologic Cancer, National Cancer Center, Goyang, Republic of Korea
- Myong Cheol Lim
  - Center for Gynecologic Cancer, National Cancer Center, Goyang, Republic of Korea
  - Division of Tumor Immunology, Research Institute and Hospital, National Cancer Center, Goyang, Republic of Korea
  - Center for Clinical Trials, Research Institute and Hospital, National Cancer Center, Goyang, Republic of Korea
- Hyunsoon Cho
  - Department of Cancer AI and Digital Health, National Cancer Center Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Republic of Korea
  - Integrated Biostatistics Research Branch, National Cancer Center, Goyang, Republic of Korea
2. van der Veer SN, Anderson NE, Finnigan R, Kyte D. Electronic Collection of Patient-Reported Outcomes to Improve Kidney Care: Benefits, Drawbacks, and Next Steps. Semin Nephrol 2024; 44:151552. [PMID: 39164148 DOI: 10.1016/j.semnephrol.2024.151552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2024]
Abstract
Kidney services worldwide are increasingly using digital health technologies to deliver care. This includes kidney electronic patient-reported outcome (ePRO) systems: ambulatory digital technologies that enable the capture of PRO data electronically from people with kidney disease remotely and in real time to be shared with their kidney care team. Current kidney ePRO systems commonly aim to support the monitoring and management of symptoms in patients with kidney disease. The majority have thus far only been implemented in research settings and are not yet routinely used in clinical practice, leaving their readiness for real-world implementation largely unknown. Compared with paper-based PRO collection, ePRO systems have certain advantages, which we categorize as efficiency benefits (e.g., lower administrative burden), direct patient care benefits (e.g., automated PRO-based patient education), and health system and research benefits (e.g., collecting ePRO data once for multiple purposes). At the same time, kidney ePRO systems come with drawbacks, such as their potential to exacerbate existing inequities in care and outcomes and to negatively affect staff burden and patients' experience of kidney care. Areas that hold promise for expediting the development and uptake of kidney ePRO systems at the local, organizational, and national level include harnessing national kidney registries as enabling infrastructures; using novel data-driven technologies (e.g., computerized adaptive test systems, configurable dashboards); applying implementation science and action research approaches to enhance translation of ePRO research findings into clinical practice; and engaging stakeholders, including patients and carers, health care professionals, policymakers, payers, ePRO experts, technology providers, and organizations that monitor and improve the quality of kidney services.
Affiliation(s)
- Sabine N van der Veer
  - Division of Informatics, Imaging and Data Science, School of Health Sciences, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK
- Nicola E Anderson
  - Centre for Patient Reported Outcomes Research (CPROR), Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Applied Research Collaboration (ARC) West Midlands, University of Birmingham, Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Rob Finnigan
  - NHS England North West Kidney Network, NHS England, Leeds, UK
- Derek Kyte
  - School of Allied Health and Community, University of Worcester, Worcester, UK
3. Lee MY, Heo KN, Lee S, Ah YM, Shin J, Lee JY. Development and validation of a medication-based risk prediction model for acute kidney injury in older outpatients. Arch Gerontol Geriatr 2024; 120:105332. [PMID: 38382232 DOI: 10.1016/j.archger.2024.105332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 01/06/2024] [Accepted: 01/13/2024] [Indexed: 02/23/2024]
Abstract
BACKGROUND Older adults are at an increased risk of acute kidney injury (AKI), particularly in community settings, often due to medications. Effective prevention hinges on identifying high-risk patients, yet existing models for predicting AKI risk in older outpatients are scarce, particularly those incorporating medication variables. We aimed to develop an AKI risk prediction model that included medication-related variables for older outpatients. METHODS We constructed a cohort of 2,272,257 outpatients aged ≥65 years using a national claims database. This cohort was split into development (70%) and validation (30%) groups. Our primary goal was to identify newly diagnosed AKI within one month of cohort entry in an outpatient context. We screened 170 variables and developed a risk prediction model using logistic regression. RESULTS The final model integrated 12 variables: 2 demographic, 4 comorbid, and 6 medication-related. It showed good performance with acceptable calibration. In the validation cohort, the area under the receiver operating characteristic curve was 0.720 (95% confidence interval, 0.692-0.748). Sensitivity and specificity were 69.9% and 61.9%, respectively. Notably, the model identified high-risk patients as having a 27-fold increased AKI risk compared with low-risk individuals. CONCLUSION We have developed a new AKI risk prediction model for older outpatients, incorporating critical medication-related variables with good discrimination. This tool may be useful in identifying and targeting patients who may require interventions to prevent AKI in an outpatient setting.
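Sensitivity and specificity, as reported in this abstract, are both read off a confusion matrix once a risk threshold has turned predicted probabilities into high-risk and low-risk labels. The sketch below shows the arithmetic on made-up labels; the function name and the data are hypothetical and unrelated to the study's cohort.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    # Sensitivity: share of true cases flagged; specificity: share of non-cases cleared.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes (1 = event) and thresholded predictions (1 = high risk).
observed = [1, 1, 1, 0, 0, 0, 0, 0]
flagged = [1, 1, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(observed, flagged)
```

Moving the threshold trades one measure against the other, which is why a single pair of values is usually reported alongside a threshold-free measure such as the area under the receiver operating characteristic curve.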
Affiliation(s)
- Mee Yeon Lee
  - College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Kyu-Nam Heo
  - College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Suhyun Lee
  - College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Young-Mi Ah
  - College of Pharmacy, Yeungnam University, Gyeongsan, Republic of Korea
- Jaekyu Shin
  - Department of Clinical Pharmacy, School of Pharmacy, University of California, San Francisco, CA, United States
- Ju-Yeun Lee
  - College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
4. Wu CC, Islam MM, Poly TN, Weng YC. Artificial Intelligence in Kidney Disease: A Comprehensive Study and Directions for Future Research. Diagnostics (Basel) 2024; 14:397. [PMID: 38396436 PMCID: PMC10887584 DOI: 10.3390/diagnostics14040397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 02/03/2024] [Accepted: 02/05/2024] [Indexed: 02/25/2024]
Abstract
Artificial intelligence (AI) has emerged as a promising tool in the field of healthcare, with an increasing number of research articles evaluating its applications in the domain of kidney disease. To comprehend the evolving landscape of AI research in kidney disease, a bibliometric analysis is essential. The purposes of this study are to systematically analyze and quantify the scientific output, research trends, and collaborative networks in the application of AI to kidney disease. This study collected AI-related articles published between 2012 and 20 November 2023 from the Web of Science. Descriptive analyses of research trends in the application of AI in kidney disease were used to determine the growth rate of publications by authors, journals, institutions, and countries. Visualization network maps of country collaborations and author-provided keyword co-occurrences were generated to show the hotspots and research trends in AI research on kidney disease. The initial search yielded 673 articles, of which 631 were included in the analyses. Our findings reveal a noteworthy exponential growth trend in the annual publications of AI applications in kidney disease. Nephrology Dialysis Transplantation emerged as the leading publisher, accounting for 4.12% (26 out of 631 papers), followed by the American Journal of Transplantation at 3.01% (19/631) and Scientific Reports at 2.69% (17/631). The primary contributors were predominantly from the United States (n = 164, 25.99%), followed by China (n = 156, 24.72%) and India (n = 62, 9.83%). In terms of institutions, Mayo Clinic led with 27 contributions (4.27%), while Harvard University (n = 19, 3.01%) and Sun Yat-Sen University (n = 16, 2.53%) secured the second and third positions, respectively. This study summarized AI research trends in the field of kidney disease through statistical analysis and network visualization. 
The findings show that the field of AI in kidney disease is dynamic and rapidly progressing and provides valuable information for recognizing emerging patterns, technological shifts, and interdisciplinary collaborations that contribute to the advancement of knowledge in this critical domain.
Affiliation(s)
- Chieh-Chen Wu
  - Department of Healthcare Information and Management, School of Health and Medical Engineering, Ming Chuan University, Taipei 111, Taiwan
- Md. Mohaimenul Islam
  - Outcomes and Translational Sciences, College of Pharmacy, The Ohio State University, Columbus, OH 43210, USA
- Tahmina Nasrin Poly
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 110, Taiwan
- Yung-Ching Weng
  - Department of Healthcare Information and Management, School of Health and Medical Engineering, Ming Chuan University, Taipei 111, Taiwan
5. Han BC, Kim J, Choi J. Prediction of complications in diabetes mellitus using machine learning models with transplanted topic model features. Biomed Eng Lett 2024; 14:163-171. [PMID: 38186952 PMCID: PMC10769946 DOI: 10.1007/s13534-023-00322-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 08/05/2023] [Accepted: 09/16/2023] [Indexed: 01/09/2024]
Abstract
Purpose: This study aims to predict the progression of Diabetes Mellitus (DM) from the clinical notes through machine learning based on latent Dirichlet allocation (LDA) topic modeling. Particularly, 174,427 clinical notes of DM patients were collected from the electronic medical record (EMR) system of the Seoul National University Hospital outpatient clinic. Method: We developed a model to predict the development of DM complications. Topics developed by the topic model were exploited as the key feature of our machine-learning model. The proposed model generalized a correlation between topic structures and complications. Results: The model provided acceptable predictive performance for all four types of complications (diabetic retinopathy, diabetic nephropathy, nonalcoholic fatty liver disease, and cerebrovascular accident). Upon employing extreme gradient boosting (XGBoost), we obtained the F1 scores of the predictions for each complication type as 0.844, 0.921, 0.831, and 0.762. Conclusion: This study shows that a machine learning project based on topic modeling can effectively predict the progress of a disease. Furthermore, a unique way of topic model transplanting, which matches the dimension of the topic structures of the two data sets, is presented. Supplementary Information The online version contains supplementary material available at 10.1007/s13534-023-00322-7.
Affiliation(s)
- Benedict Choonghyun Han
  - Interdisciplinary Program in Bioengineering, Seoul National University, 1 Gwanak-ro Gwanak-gu, Seoul, 08826 Republic of Korea
- Jimin Kim
  - English Language and Literature, Seoul National University, 1 Gwanak-ro Gwanak-gu, Seoul, 08826 Republic of Korea
- Jinwook Choi
  - Department of Biomedical Engineering, College of Medicine, Seoul National University, 101 Daehak-ro Jongno-gu, Seoul, 03080 Republic of Korea
  - Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, 103 Daehak-ro Jongno-gu, Seoul, 03080 Republic of Korea
6. Bazoge A, Morin E, Daille B, Gourraud PA. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review. JMIR Med Inform 2023; 11:e42477. [PMID: 38100200 PMCID: PMC10757232 DOI: 10.2196/42477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 01/16/2023] [Accepted: 09/07/2023] [Indexed: 12/17/2023]
Abstract
BACKGROUND In recent years, health data collected during the clinical care process have been often repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible. OBJECTIVE The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks. METHODS This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English. RESULTS We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. 
NLP methods were mostly applied to English language data (153/194, 78.9%). CONCLUSIONS CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.
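The symbolic methods this review describes (regular expressions or pattern matching over specialized lexica such as drug lists) can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the lexicon, the clinical note, and the function names are all hypothetical, and systems of the kind reviewed would typically add normalization, negation handling, and much larger terminologies.

```python
import re
from typing import Iterable, List, Pattern

# Hypothetical mini-lexicon; a real one would come from a curated drug terminology.
DRUG_LEXICON = ["metformin", "warfarin", "insulin glargine"]


def build_pattern(terms: Iterable[str]) -> Pattern[str]:
    """Compile one case-insensitive pattern that matches any lexicon term as a whole word."""
    # Longest terms first so multiword entries win over shorter overlapping ones.
    escaped = sorted((re.escape(term) for term in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def extract_mentions(text: str, pattern: Pattern[str]) -> List[str]:
    """Return lexicon terms found in the text, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in pattern.finditer(text)]


note = "Patient on Metformin 500 mg; started insulin glargine at night."
mentions = extract_mentions(note, build_pattern(DRUG_LEXICON))
```

The word-boundary anchors keep the pattern from firing inside longer tokens, which is one reason lexicon matching alone can only partially solve an extraction task.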
Affiliation(s)
- Adrien Bazoge
  - Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
  - Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
- Emmanuel Morin
  - Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Béatrice Daille
  - Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Pierre-Antoine Gourraud
  - Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
  - Nantes Université, INSERM, CHU de Nantes, École Centrale Nantes, Centre de Recherche Translationnelle en Transplantation et Immunologie, CR2TI, F-44000 Nantes, France
7. Ge J, Digitale JC, Fenton C, McCulloch CE, Lai JC, Pletcher MJ, Gennatas ED. Predicting post-liver transplant outcomes in patients with acute-on-chronic liver failure using Expert-Augmented Machine Learning. Am J Transplant 2023; 23:1908-1921. [PMID: 37652176 PMCID: PMC11018271 DOI: 10.1016/j.ajt.2023.08.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 08/04/2023] [Accepted: 08/25/2023] [Indexed: 09/01/2023]
Abstract
Liver transplantation (LT) is a treatment for acute-on-chronic liver failure (ACLF), but high post-LT mortality has been reported. Existing post-LT models in ACLF have been limited. We developed an Expert-Augmented Machine Learning (EAML) model to predict post-LT outcomes. We identified ACLF patients who underwent LT in the University of California Health Data Warehouse. We applied the RuleFit machine learning (ML) algorithm to extract rules from decision trees and create intermediate models. We asked human experts to rate the rules generated by RuleFit and incorporated these ratings to generate final EAML models. We identified 1384 ACLF patients. For death at 1 year, areas under the receiver-operating characteristic curve were 0.707 (confidence interval [CI] 0.625-0.793) for EAML and 0.719 (CI 0.640-0.800) for RuleFit. For death at 90 days, areas under the receiver-operating characteristic curve were 0.678 (CI 0.581-0.776) for EAML and 0.707 (CI 0.615-0.800) for RuleFit. In pairwise comparisons, both EAML and RuleFit models outperformed cross-sectional models. Significant discrepancies between experts and ML occurred in rankings of biomarkers used in clinical practice. EAML may serve as a method for ML-guided hypothesis generation in further ACLF research.
Affiliation(s)
- Jin Ge
  - Division of Gastroenterology and Hepatology, Department of Medicine, University of California-San Francisco, San Francisco, California, USA
- Jean C Digitale
  - Department of Epidemiology and Biostatistics, University of California-San Francisco, San Francisco, California, USA
- Cynthia Fenton
  - Division of Hospital Medicine, Department of Medicine, University of California-San Francisco, San Francisco, California, USA
- Charles E McCulloch
  - Department of Epidemiology and Biostatistics, University of California-San Francisco, San Francisco, California, USA
- Jennifer C Lai
  - Division of Gastroenterology and Hepatology, Department of Medicine, University of California-San Francisco, San Francisco, California, USA
- Mark J Pletcher
  - Department of Epidemiology and Biostatistics, University of California-San Francisco, San Francisco, California, USA
- Efstathios D Gennatas
  - Department of Epidemiology and Biostatistics, University of California-San Francisco, San Francisco, California, USA
8. Tseng YJ, Chen CJ, Chang CW. lab: an R package for generating analysis-ready data from laboratory records. PeerJ Comput Sci 2023; 9:e1528. [PMID: 37705643 PMCID: PMC10495959 DOI: 10.7717/peerj-cs.1528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 07/20/2023] [Indexed: 09/15/2023]
Abstract
Background Electronic health records (EHRs) play a crucial role in healthcare decision-making by giving physicians insights into disease progression and suitable treatment options. Within EHRs, laboratory test results are frequently utilized for predicting disease progression. However, processing laboratory test results often poses challenges due to variations in units and formats. In addition, leveraging the temporal information in EHRs can improve outcome, prognosis, and diagnosis prediction. Nevertheless, the irregular frequency of the data in these records necessitates data preprocessing, which can add complexity to time-series analyses. Methods To address these challenges, we developed an open-source R package that facilitates the extraction of temporal information from laboratory records. The proposed lab package generates analysis-ready time series data by segmenting the data into time-series windows and imputing missing values. Moreover, users can map local laboratory codes to the Logical Observation Identifier Names and Codes (LOINC), an international standard. This mapping allows users to incorporate additional information, such as reference ranges and related diseases. Moreover, the reference ranges provided by LOINC enable us to categorize results into normal or abnormal. Finally, the analysis-ready time series data can be further summarized using descriptive statistics and utilized to develop models using machine learning technologies. Results Using the lab package, we analyzed data from MIMIC-III, focusing on newborns with patent ductus arteriosus (PDA). We extracted time-series laboratory records and compared the differences in test results between patients with and without 30-day in-hospital mortality. We then identified significant variations in several laboratory test results 7 days after PDA diagnosis. 
Leveraging the time series-analysis-ready data, we trained a prediction model with the long short-term memory algorithm, achieving an area under the receiver operating characteristic curve of 0.83 for predicting 30-day in-hospital mortality in model training. These findings demonstrate the lab package's effectiveness in analyzing disease progression. Conclusions The proposed lab package simplifies and expedites the workflow involved in laboratory records extraction. This tool is particularly valuable in assisting clinical data analysts in overcoming the obstacles associated with heterogeneous and sparse laboratory records.
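The preprocessing idea described in this abstract (segmenting irregular laboratory records into time-series windows and imputing missing values) can be illustrated with a short sketch. This is not the lab package's API, which is written in R; the function names, the choice of window mean as the summary, the carry-forward imputation rule, and the sample values below are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def window_series(
    records: Sequence[Tuple[float, float]], window_days: int, n_windows: int
) -> List[Optional[float]]:
    """Average (day_offset, value) records within fixed-width windows; None marks an empty window."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for day, value in records:
        index = int(day // window_days)
        if 0 <= index < n_windows:  # ignore records outside the observation period
            buckets[index].append(value)
    return [mean(buckets[i]) if i in buckets else None for i in range(n_windows)]


def carry_forward(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill each gap with the most recent observed value; leading gaps stay None."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical irregular results: (days since index date, test value).
records = [(0, 1.0), (1, 2.0), (9, 2.0)]
windows = window_series(records, window_days=3, n_windows=4)
filled = carry_forward(windows)
```

After this step every patient has a series of the same length, which is the shape a sequence model such as a long short-term memory network expects as input.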
Affiliation(s)
- Yi-Ju Tseng
  - Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, United States of America
- Chun Ju Chen
  - Department of Information Management, National Taiwan University, Taipei, Taiwan
- Chia Wei Chang
  - Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
9. Ketenci M, Bhave S, Elhadad N, Perotte A. Maximum Likelihood Estimation of Flexible Survival Densities with Importance Sampling. Proceedings of Machine Learning Research 2023; 219:360-380. [PMID: 39350918 PMCID: PMC11441640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2024]
Abstract
Survival analysis is a widely used technique for analyzing time-to-event data in the presence of censoring. In recent years, numerous survival analysis methods have emerged that scale to large datasets and relax traditional assumptions such as proportional hazards. These models, while performant, are very sensitive to model hyperparameters, including (1) the number of bins and bin size for discrete models and (2) the number of cluster assignments for mixture-based models. Each of these choices requires extensive tuning by practitioners to achieve optimal performance. In addition, we demonstrate in empirical studies that (1) the optimal bin size may differ drastically depending on the metric of interest (e.g., concordance vs Brier score), and (2) mixture models may suffer from mode collapse and numerical instability. We propose a survival analysis approach that eliminates the need to tune hyperparameters such as mixture assignments and bin sizes, reducing the burden on practitioners. We show that the proposed approach matches or outperforms baselines on several real-world datasets.
Affiliation(s)
- Mert Ketenci
  - Department of Computer Science, Columbia University, New York, NY, USA
- Shreyas Bhave
  - Department of Biomedical Informatics, Columbia University, New York, NY
- Noémie Elhadad
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Adler Perotte
  - Department of Biomedical Informatics, Columbia University, New York, NY
10. Farrell D, Chan L. Application of Natural Language Processing in Nephrology Research. Clin J Am Soc Nephrol 2023; 18:806-808. [PMID: 36758147 PMCID: PMC10278815 DOI: 10.2215/cjn.0000000000000118] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/21/2022] [Accepted: 01/24/2023] [Indexed: 02/11/2023]
Affiliation(s)
- Douglas Farrell
  - Division of Nephrology, Icahn School of Medicine at Mount Sinai, New York, New York
- Lili Chan
  - Division of Nephrology, Icahn School of Medicine at Mount Sinai, New York, New York
  - Charles Bronfman Institute of Personalized Medicine, Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York
11. Dashtban A, Mizani MA, Pasea L, Denaxas S, Corbett R, Mamza JB, Gao H, Morris T, Hemingway H, Banerjee A. Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals. EBioMedicine 2023; 89:104489. [PMID: 36857859 PMCID: PMC9989643 DOI: 10.1016/j.ebiom.2023.104489] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/05/2022] [Revised: 01/31/2023] [Accepted: 02/06/2023] [Indexed: 03/01/2023]
Abstract
BACKGROUND Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions. METHODS We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422 respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264 with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter). FINDINGS After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (Sensitivity: 0.81-0.98, F1 score:0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes. 
The 5-year risks of mortality and admissions in the overall incident CKD population were highest in the cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). MEDICATIONS Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with the highest medication burden in the cardiometabolic and metabolic subtypes, and a higher burden in prevalent than in incident CKD. INTERPRETATION In the largest CKD study using ML to date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to the study of aetiology, therapeutics and risk prediction. FUNDING AstraZeneca UK Ltd, Health Data Research UK.
Affiliation(s)
- Ashkan Dashtban
- Institute of Health Informatics, University College London, London, UK
| | - Mehrdad A Mizani
- Institute of Health Informatics, University College London, London, UK; British Heart Foundation Data Science Centre, Health Data Research UK, London, UK
| | - Laura Pasea
- Institute of Health Informatics, University College London, London, UK
| | - Spiros Denaxas
- Institute of Health Informatics, University College London, London, UK
| | | | - Jil B Mamza
- Medical and Scientific Affairs, BioPharmaceuticals Medical, AstraZeneca, London, UK
| | - He Gao
- Medical and Scientific Affairs, BioPharmaceuticals Medical, AstraZeneca, London, UK
| | - Tamsin Morris
- Medical and Scientific Affairs, BioPharmaceuticals Medical, AstraZeneca, London, UK
| | - Harry Hemingway
- Institute of Health Informatics, University College London, London, UK; Health Data Research UK, University College London, London, UK
| | - Amitava Banerjee
- Institute of Health Informatics, University College London, London, UK; Barts Health NHS Trust, London, UK; University College London Hospitals NHS Trust, London, UK.
|
12
|
Tabas RY, Ahmadian L, Samadbeik M, Arian A, Ameri A. Determining the readiness of patients with renal failure to use health information technology. BMC Med Inform Decis Mak 2022; 22:324. [PMID: 36482469 PMCID: PMC9732994 DOI: 10.1186/s12911-022-02073-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2021] [Accepted: 11/29/2022] [Indexed: 12/13/2022] Open
Abstract
INTRODUCTION Using information technology (IT) for purposes such as patient education and disease prevention and management is effective when patients are ready to use it. The objective of this study was to determine the readiness of patients with renal failure to use health IT. METHODS This study was performed on all dialysis patients in South Khorasan province (n = 263) using a 28-item questionnaire. The questionnaire consisted of (1) demographic information of participants and (2) questions concerning eight main factors including the need for information, desire to receive information, ability to use computers and the Internet, computer and Internet anxiety, communication with physicians, using mobile phones and concerns about security and confidentiality of information. Descriptive statistics and Mann-Whitney and Kruskal-Wallis statistical tests were used to analyze the data. RESULTS About 15% of the participants stated that they do not want to receive information from the Internet. Anxiety and concern about Internet security and confidentiality were higher in women, married people, people over 60, villagers, and illiterate people (p < 0.05). Married people and people over 60 years had a higher desire to get information (p < 0.05). The rate of computer anxiety and Internet privacy concern was higher than average (p < 0.001). Most patients (34.2%) could only send text messages using mobile phones. CONCLUSION Although most patients need online health information, they do not use it because they lack the skills and experience to use IT. Therefore, the ability of users should be considered when developing IT-based interventions. Because patients are concerned about Internet privacy, they should be taught how to protect their privacy while using the Internet.
Affiliation(s)
- Raana Younesi Tabas
- Health Information Management Department, Valiasr Hospital, Birjand University of Medical Sciences, Birjand, Iran
| | - Leila Ahmadian
- Health Information Sciences Department, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences, Kerman, Iran
| | - Mahnaz Samadbeik
- Social Determinants of Health Research Center, Lorestan University of Medical Sciences, Lorestan, Iran
| | - Anahita Arian
- Department of Internal Medicine, Cardiovascular Diseases Research Center, Valiasr Hospital, Birjand University of Medical Sciences, Birjand, Iran
| | - Arefeh Ameri
- Health Information Sciences Department, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences, Kerman, Iran
|
13
|
Kline A, Wang H, Li Y, Dennis S, Hutch M, Xu Z, Wang F, Cheng F, Luo Y. Multimodal machine learning in precision health: A scoping review. NPJ Digit Med 2022; 5:171. [PMID: 36344814 PMCID: PMC9640667 DOI: 10.1038/s41746-022-00712-8] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Received: 06/20/2022] [Accepted: 10/14/2022] [Indexed: 11/09/2022] Open
Abstract
Machine learning is frequently being leveraged to tackle problems in the health sector, including utilization for clinical decision support. Its use has historically been focused on single-modal data. Attempts to improve prediction and mimic the multimodal nature of clinical expert decision-making have been met in the biomedical field of machine learning by fusing disparate data. This review was conducted to summarize the current studies in this field and identify topics ripe for future research. We conducted this review in accordance with the PRISMA extension for Scoping Reviews to characterize multi-modal data fusion in health. Search strings were established and used in databases: PubMed, Google Scholar, and IEEE Xplore from 2011 to 2021. A final set of 128 articles was included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. Early fusion was the most common data merging strategy. Notably, there was an improvement in predictive performance when using data fusion. Lacking from the papers were clear clinical deployment strategies, FDA approval, and analysis of how using multimodal approaches from diverse sub-populations may improve biases and healthcare disparities. These findings provide a summary of multimodal data fusion as applied to health diagnosis/prognosis problems. Few papers compared the outputs of a multimodal approach with a unimodal prediction. However, those that did achieved an average increase of 6.4% in predictive accuracy. Multi-modal machine learning, while more robust in its estimations than unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.
Affiliation(s)
- Adrienne Kline
- Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
| | - Hanyin Wang
- Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
| | - Yikuan Li
- Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
| | - Saya Dennis
- Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
| | - Meghan Hutch
- Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
| | - Zhenxing Xu
- Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
| | - Fei Wang
- Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
| | - Feixiong Cheng
- Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, 44195, OH, USA
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA.
|
14
|
Schena FP, Anelli VW, Abbrescia DI, Di Noia T. Prediction of chronic kidney disease and its progression by artificial intelligence algorithms. J Nephrol 2022; 35:1953-1971. [PMID: 35543912 DOI: 10.1007/s40620-022-01302-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/16/2021] [Accepted: 03/04/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND AND OBJECTIVE The aim of nephrologists is to delay the outcome and reduce the number of patients undergoing renal failure (RF) by applying prevention protocols and accurately monitoring chronic kidney disease (CKD) patients. General practitioners and nephrologists are involved in the first and in the late stages of the disease, respectively. Early diagnosis of CKD is an important step in preventing the progression of kidney damage. Our aim was to review publications on machine learning algorithms (MLAs) that can predict early CKD and its progression. METHODS We conducted a systematic review and selected 55 articles on the application of MLAs in CKD. PubMed, Medline, Scopus, Web of Science and IEEE Xplore Digital Library of the Institute of Electrical and Electronics Engineers were searched. The search terms were chronic kidney disease, artificial intelligence, data mining and machine learning algorithms. RESULTS MLAs use enormous numbers of predictors, combining them in non-linear and highly interactive ways. This ability increases when new data are added. We observed some limitations in the publications: (i) databases were not accurately reviewed by physicians; (ii) databases did not report the ethnicity of the patients; (iii) some databases collected variables that were not important for the diagnosis and progression of CKD; (iv) no information was presented on the native kidney disease causing CKD; (v) no validation of the results in external independent cohorts was provided; and (vi) no insights were given on the MLAs that were used. Overall, there was limited collaboration among experts in electronics, computer science and physicians. CONCLUSIONS The application of MLAs in kidney diseases may enhance the ability of clinicians to predict CKD and RF, thus improving diagnostic assistance and providing suitable therapeutic decisions. However, it is necessary to improve the development process of MLA tools.
Affiliation(s)
- Francesco Paolo Schena
- Department of Emergency and Organ Transplants, University of Bari, Bari, Italy.
- Department of Electrical and Information Engineering, Polytechnic of Bari, Bari, Italy.
| | - Vito Walter Anelli
- Department of Electrical and Information Engineering, Polytechnic of Bari, Bari, Italy
| | | | - Tommaso Di Noia
- Department of Electrical and Information Engineering, Polytechnic of Bari, Bari, Italy
|
15
|
Su D, Zhang X, He K, Chen Y, Wu N. Individualized prediction of chronic kidney disease for the elderly in longevity areas in China: Machine learning approaches. Front Public Health 2022; 10:998549. [PMID: 36339144 PMCID: PMC9634246 DOI: 10.3389/fpubh.2022.998549] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/20/2022] [Accepted: 09/20/2022] [Indexed: 01/26/2023] Open
Abstract
Background Chronic kidney disease (CKD) has become a major public health problem worldwide and has caused a huge social and economic burden, especially in developing countries. No previous study has used machine learning (ML) methods combined with longitudinal data to predict the risk of CKD development in 2 years amongst the elderly in China. Methods This study was based on the panel data of 925 elderly individuals in the 2012 baseline survey and 2014 follow-up survey of the Healthy Aging and Biomarkers Cohort Study (HABCS) database. Six ML models, logistic regression (LR), lasso regression, random forests (RF), gradient-boosted decision tree (GBDT), support vector machine (SVM), and deep neural network (DNN), were developed to predict the probability of CKD amongst the elderly in 2 years (the year of 2014). The decision curve analysis (DCA) provided a range of threshold probability of the outcome and the net benefit of each ML model. Results Amongst the 925 elderly in the HABCS 2014 survey, 289 (18.8%) had CKD. Compared with the other models, LR, lasso regression, RF, GBDT, and DNN had no statistical significance of the area under the receiver operating curve (AUC) value (>0.7), and SVM exhibited the lowest predictive performance (AUC = 0.633, p-value = 0.057). DNN had the highest positive predictive value (PPV) (0.328), whereas LR had the lowest (0.287). DCA results indicated that within the threshold ranges of ~0-0.03 and 0.37-0.40, the net benefit of GBDT was the largest. Within the threshold ranges of ~0.03-0.10 and 0.26-0.30, the net benefit of RF was the largest. Age was the most important predictor variable in the RF and GBDT models. Blood urea nitrogen, serum albumin, uric acid, body mass index (BMI), marital status, activities of daily living (ADL)/instrumental activities of daily living (IADL) and gender were crucial in predicting CKD in the elderly. 
Conclusion The ML model could successfully capture the linear and nonlinear relationships of risk factors for CKD in the elderly. The decision support system based on the predictive model in this research can help medical staff detect and intervene in the health of the elderly early.
Affiliation(s)
- Dai Su
- Department of Health Management and Policy, School of Public Health, Capital Medical University, Beijing, China
| | - Xingyu Zhang
- Department of Systems, Populations, and Leadership, University of Michigan School of Nursing, Ann Arbor, MI, United States; Thomas E. Starzl Transplantation Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, United States
| | - Kevin He
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, United States
| | - Yingchun Chen
- Department of Health Management, School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Research Center for Rural Health Services, Hubei Province Key Research Institute of Humanities and Social Sciences, Wuhan, China
| | - Nina Wu
- Department of Health Management and Policy, School of Public Health, Capital Medical University, Beijing, China. Correspondence: Nina Wu
|
16
|
Van Vleck TT, Farrell D, Chan L. Natural Language Processing in Nephrology. Adv Chronic Kidney Dis 2022; 29:465-471. [PMID: 36253030 PMCID: PMC9586467 DOI: 10.1053/j.ackd.2022.07.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/01/2022] [Revised: 06/19/2022] [Accepted: 07/06/2022] [Indexed: 01/25/2023]
Abstract
Unstructured data in the electronic health records contain essential patient information. Natural language processing (NLP), teaching a computer to read, allows us to tap into these data without needing the time and effort of manual chart abstraction. The core first step for all NLP algorithms is preprocessing the text to identify the core words that differentiate the text while filtering out the noise. Traditional NLP uses a rule-based approach, applying grammatical rules to infer meaning from the text. Newer NLP approaches use machine learning/deep learning which can infer meaning without explicitly being programmed. NLP use in nephrology research has focused on identifying distinct disease processes, such as CKD, and extraction of patient-oriented outcomes such as symptoms with high sensitivity. NLP can identify patient features from clinical text associated with acute kidney injury and progression of CKD. Lastly, inclusion of features extracted using NLP improved the performance of risk-prediction models compared to models that only use structured data. Implementation of NLP algorithms has been slow, partially hindered by the lack of external validation of NLP algorithms. However, NLP allows for extraction of key patient characteristics from free text, an infrequently used resource in nephrology.
Affiliation(s)
- Tielman T Van Vleck
- Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | - Douglas Farrell
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
| | - Lili Chan
- Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY; Division of Nephrology, Icahn School of Medicine at Mount Sinai, New York, NY.
|
17
|
Zafarnejad R, Dumbauld S, Dumbauld D, Adibuzzaman M, Griffin P, Rutsky E. Using CUSUM in real time to signal clinically relevant decreases in estimated glomerular filtration rate. BMC Nephrol 2022; 23:287. [PMID: 35982411 PMCID: PMC9389810 DOI: 10.1186/s12882-022-02910-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 08/01/2022] [Indexed: 11/16/2022] Open
Abstract
Background The electronic health record (EHR), utilized to apply statistical methodology, assists provider decision-making, including during the care of chronic kidney disease (CKD) patients. When estimated glomerular filtration rate (eGFR) decreases, the rate of that change adds meaning to a patient’s single eGFR and may represent severity of renal injury. Since the cumulative sum chart technique (CUSUM), often used in quality control and surveillance, continuously checks for change in a series of measurements, we selected this statistical tool to detect clinically relevant eGFR decreases and developed CUSUMGFR. Methods In a retrospective analysis we applied an age-adjusted CUSUMGFR to signal identification of eventual ESKD patients prior to diagnosis date. When the patient signaled by reaching a specified threshold value, days from CUSUM signal date to ESKD diagnosis date (earliness days) were measured, along with the corresponding eGFR measurement at the signal. Results Signaling occurred by CUSUMGFR on average 791 days (se = 12 days) prior to ESKD diagnosis date with sensitivity = 0.897, specificity = 0.877, and accuracy = 0.878. Mean days prior to ESKD diagnosis were significantly greater in Black patients (905 days) and patients with hypertension (852 days), diabetes (940 days), cardiovascular disease (1027 days), and hypercholesterolemia (971 days). Sensitivity and specificity did not vary by sociodemographic and clinical risk factors. Conclusions CUSUMGFR correctly identified 30.6% of CKD patients destined for ESKD when eGFR was > 60 ml/min/1.73 m² and signaled 12.3% of patients who did not go on to ESKD (though almost all went on to later-stage CKD). If utilized in an EHR, signaling patients could focus providers’ efforts to slow or prevent progression to later-stage CKD and ESKD. Supplementary Information The online version contains supplementary material available at 10.1186/s12882-022-02910-8.
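To make the signaling idea in this abstract concrete, the following is a minimal sketch of a generic one-sided (downward) CUSUM over a series of eGFR values. It is not the authors' age-adjusted CUSUMGFR: the function name `cusum_decrease`, the parameters `target`, `slack` and `threshold`, and the sample series are all hypothetical and chosen only for illustration.

```python
def cusum_decrease(values, target, slack, threshold):
    """Return the index of the first measurement at which a one-sided
    (downward) CUSUM exceeds `threshold`, or None if it never does.

    All parameter names and values are illustrative assumptions,
    not taken from the cited study.
    """
    s = 0.0
    for i, x in enumerate(values):
        # Accumulate shortfalls below the target; drops smaller than
        # `slack` do not add to the sum, and the sum never goes below zero.
        s = max(0.0, s + (target - x) - slack)
        if s > threshold:
            return i
    return None


# Hypothetical eGFR series (ml/min/1.73 m²), one value per visit.
egfr = [92, 90, 91, 88, 84, 80, 77, 73]
signal_at = cusum_decrease(egfr, target=90, slack=2, threshold=20)
print(signal_at)
```

In this toy series the cumulative shortfall stays at zero while eGFR hovers near the target and only builds once the decline is sustained, which is the property that makes CUSUM attractive for detecting a persistent downward trend rather than a single low reading.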
Affiliation(s)
- Reyhaneh Zafarnejad
- Department of Industrial Engineering, Penn State University, 310 Leonhard Bldg., University Park, PA, 16803, USA
| | - Steven Dumbauld
- Regenstrief Center for Healthcare Engineering, Purdue University, West Lafayette, IN, USA
| | | | - Mohammad Adibuzzaman
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health Sciences University, Portland, OR, USA
| | - Paul Griffin
- Department of Industrial Engineering, Penn State University, 310 Leonhard Bldg., University Park, PA, 16803, USA.
| | - Edwin Rutsky
- Division of Nephrology, University of Alabama at Birmingham, Birmingham, AL, USA
|
18
|
Cleary F, Prieto-Merino D, Nitsch D. A systematic review of statistical methodology used to evaluate progression of chronic kidney disease using electronic healthcare records. PLoS One 2022; 17:e0264167. [PMID: 35905096 PMCID: PMC9337679 DOI: 10.1371/journal.pone.0264167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2021] [Accepted: 02/05/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Electronic healthcare records (EHRs) are a useful resource to study chronic kidney disease (CKD) progression prior to starting dialysis, but pose methodological challenges as kidney function tests are not done on everybody, nor are tests evenly spaced. We sought to review previous research of CKD progression using renal function tests in EHRs, investigating methodology used and investigators' recognition of data quality issues. METHODS AND FINDINGS We searched for studies investigating CKD progression using EHRs in 4 databases (Medline, Embase, Global Health and Web of Science) available as of August 2021. Of 80 articles eligible for review, 59 (74%) were published in the last 5.5 years, mostly using EHRs from the UK, USA and East Asian countries. 33 articles (41%) studied rates of change in eGFR, 23 (29%) studied changes in eGFR from baseline and 15 (19%) studied progression to binary eGFR thresholds. Sample completeness data was available in 44 studies (55%) with analysis populations including less than 75% of the target population in 26 studies (33%). Losses to follow-up went unreported in 62 studies (78%) and 11 studies (14%) defined their cohort based on complete data during follow up. Methods capable of handling data quality issues and other methodological challenges were used in a minority of studies. CONCLUSIONS Studies based on renal function tests in EHRs may have overstated reliability of findings in the presence of informative missingness. Future renal research requires more explicit statements of data completeness and consideration of i) selection bias and representativeness of sample to the intended target population, ii) ascertainment bias where follow-up depends on risk, and iii) the impact of competing mortality. We recommend that renal progression studies should use statistical methods that take into account variability in renal function, informative censoring and population heterogeneity as appropriate to the study question.
Affiliation(s)
- Faye Cleary
- Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - David Prieto-Merino
- Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Dorothea Nitsch
- Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom
|
19
|
Xie J, Zhang B, Ma J, Zeng D, Lo-Ciganic J. Readmission Prediction for Patients with Heterogeneous Medical History: A Trajectory-Based Deep Learning Approach. ACM Transactions on Management Information Systems 2022. [DOI: 10.1145/3468780] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
Abstract
Hospital readmission refers to the situation where a patient is re-hospitalized with the same primary diagnosis within a specific time interval after discharge. Hospital readmission causes $26 billion in preventable expenses to the U.S. health systems annually and often indicates suboptimal patient care. To alleviate those severe financial and health consequences, it is crucial to proactively predict patients’ readmission risk. Such prediction is challenging because the evolution of patients’ medical history is dynamic and complex. The state-of-the-art studies apply statistical models which use static predictors in a period, failing to consider patients’ heterogeneous medical history. Our approach, Trajectory-BAsed DEep Learning (TADEL), aims to address the deficiencies of the existing approaches by capturing dynamic medical history. We evaluate TADEL on a five-year national Medicare claims dataset including 3.6 million patients per year over all hospitals in the United States, reaching an F1 score of 87.3% and an AUC of 88.4%. Our approach significantly outperforms all the state-of-the-art methods. Our findings suggest that health status factors and insurance coverage are important predictors for readmission. This study contributes to IS literature and analytical methodology by formulating the trajectory-based readmission prediction problem and developing a novel deep-learning-based readmission risk prediction framework. From a health IT perspective, this research delivers implementable methods to assess patients’ readmission risk and take early interventions to avoid potential negative consequences.
Affiliation(s)
- Jiaheng Xie
- Lerner College of Business & Economics, University of Delaware, Newark, DE, USA
| | - Bin Zhang
- Eller College of Management, University of Arizona, Tucson, AZ, USA
| | - Jian Ma
- University of Colorado, Colorado Springs, Colorado Springs CO, USA
| | - Daniel Zeng
- Institute of Automation, Chinese Academy of Sciences, Beijing, China
| | - Jenny Lo-Ciganic
- Department of Pharmaceutical Outcomes & Policy, University of Florida, FL
|
20
|
Chuah A, Walters G, Christiadi D, Karpe K, Kennard A, Singer R, Talaulikar G, Ge W, Suominen H, Andrews TD, Jiang S. Machine Learning Improves Upon Clinicians' Prediction of End Stage Kidney Disease. Front Med (Lausanne) 2022; 9:837232. [PMID: 35372378 PMCID: PMC8965763 DOI: 10.3389/fmed.2022.837232] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/16/2021] [Accepted: 02/18/2022] [Indexed: 11/30/2022] Open
Abstract
Background and Objectives Chronic kidney disease progression to ESKD is associated with a marked increase in mortality and morbidity. Its progression is highly variable and difficult to predict. Methods This is an observational, retrospective, single-centre study. The cohort comprised patients attending the hospital and nephrology clinic at The Canberra Hospital from September 1996 to March 2018. Demographic data, vital signs, kidney function tests, proteinuria, and serum glucose were extracted. The model was trained on the featurised time series data with XGBoost. Its performance was compared against six nephrologists and the Kidney Failure Risk Equation (KFRE). Results A total of 12,371 patients were included, of whom 2,388 were found to have an adequate data density (three eGFR data points in the first 2 years) for subsequent analysis. Patients were divided in an 80%/20% ratio into training and testing datasets. The ML model outperformed the nephrologists in predicting ESKD within 2 years, with 93.9% accuracy, 60% sensitivity, 97.7% specificity, and 75% positive predictive value. The ML model was superior in all performance metrics to the KFRE 4- and 8-variable models. eGFR and glucose were found to contribute highly to the ESKD prediction performance. Conclusions The computational predictions had higher accuracy, specificity and positive predictive value, which indicates potential for integration into clinical workflows for decision support.
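The four performance figures quoted in this abstract are all derived from a single confusion matrix. As a rough sketch of that arithmetic, the snippet below computes them from true/false positive and negative counts. The function name `confusion_metrics` and the counts passed to it are hypothetical and are not the study's data; they only illustrate how high accuracy and specificity can coexist with modest sensitivity when the outcome is rare.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,        # all correct calls / all cases
        "sensitivity": tp / (tp + fn),        # detected events / actual events
        "specificity": tn / (tn + fp),        # correct negatives / actual non-events
        "ppv": tp / (tp + fp),                # true events / predicted events
    }


# Hypothetical counts for 100 patients, 10 of whom reach the outcome.
metrics = confusion_metrics(tp=6, fp=2, fn=4, tn=88)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because non-events dominate such a cohort, accuracy is driven mostly by specificity, so sensitivity and positive predictive value are worth reading alongside it.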
Affiliation(s)
- Aaron Chuah
- Department of Immunology and Infectious Disease, John Curtin School of Medical Research, Australian National University (ANU), Canberra, ACT, Australia
| | - Giles Walters
- Department of Renal Medicine, The Canberra Hospital, Garran, ACT, Australia
| | - Daniel Christiadi
- Department of Renal Medicine, The Canberra Hospital, Garran, ACT, Australia
| | - Krishna Karpe
- Department of Renal Medicine, The Canberra Hospital, Garran, ACT, Australia
| | - Alice Kennard
- Department of Renal Medicine, The Canberra Hospital, Garran, ACT, Australia
| | - Richard Singer
- Department of Renal Medicine, The Canberra Hospital, Garran, ACT, Australia
| | - Girish Talaulikar
- Department of Renal Medicine, The Canberra Hospital, Garran, ACT, Australia
| | - Wenbo Ge
- School of Computing, Australian National University, ACT, Australia
| | - Hanna Suominen
- School of Computing, Australian National University, ACT, Australia; Department of Computing, University of Turku, Turku, Finland
| | - T Daniel Andrews
- Department of Immunology and Infectious Disease, John Curtin School of Medical Research, Australian National University (ANU), Canberra, ACT, Australia; Centre for Personalised Immunology, Australian National University (ANU), Canberra, ACT, Australia
| | - Simon Jiang
- Department of Immunology and Infectious Disease, John Curtin School of Medical Research, Australian National University (ANU), Canberra, ACT, Australia; Department of Renal Medicine, The Canberra Hospital, Garran, ACT, Australia; Centre for Personalised Immunology, Australian National University (ANU), Canberra, ACT, Australia
|
21
|
Uchida T, Fujiwara K, Nishioji K, Kobayashi M, Kano M, Seko Y, Yamaguchi K, Itoh Y, Kadotani H. Medical checkup data analysis method based on LiNGAM and its application to nonalcoholic fatty liver disease. Artif Intell Med 2022; 128:102310. [DOI: 10.1016/j.artmed.2022.102310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Revised: 03/24/2022] [Accepted: 04/17/2022] [Indexed: 11/02/2022]
|
22
|
Oh EJ, Parikh RB, Chivers C, Chen J. Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology. JCO Clin Cancer Inform 2021; 5:1015-1023. [PMID: 34591602 PMCID: PMC8812620 DOI: 10.1200/cci.21.00077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/14/2021] [Revised: 07/24/2021] [Accepted: 08/26/2021] [Indexed: 11/20/2022] Open
Abstract
PURPOSE Machine learning models developed from electronic health records data have been increasingly used to predict risk of mortality for general oncology patients. But these models may have suboptimal performance because of patient heterogeneity. The objective of this work is to develop a new modeling approach to predicting short-term mortality that accounts for heterogeneity across multiple subgroups in the presence of a large number of electronic health record predictors. METHODS We proposed a two-stage approach to addressing heterogeneity among oncology patients of different cancer types for predicting their risk of mortality. Structured data were extracted from the University of Pennsylvania Health System for 20,723 patients of 11 cancer types, where 1,340 (6.5%) patients were deceased. We first modeled the overall risk for all patients without differentiating cancer types, as is done in the current practice. We then developed cancer type-specific models using the overall risk score as a predictor along with preselected type-specific predictors. The overall and type-specific models were compared with respect to discrimination using the area under the precision-recall curve (AUPRC) and calibration using the calibration slope. We also proposed metrics that characterize the degree of risk heterogeneity by comparing risk predictors in the overall and type-specific models. RESULTS The two-stage modeling resulted in improved calibration and discrimination across all 11 cancer types. The improvement in AUPRC was significant for hematologic malignancies including leukemia, lymphoma, and myeloma. For instance, the AUPRC increased from 0.358 to 0.519 (∆ = 0.161; 95% CI, 0.102 to 0.224) and from 0.299 to 0.354 (∆ = 0.055; 95% CI, 0.009 to 0.107) for leukemia and lymphoma, respectively. For all 11 cancer types, the two-stage approach generated well-calibrated risks. 
A high degree of heterogeneity between type-specific and overall risk predictors was observed for most cancer types. CONCLUSION Our two-stage modeling approach, which accounts for cancer type-specific risk heterogeneity, achieved better calibration and discrimination than a model agnostic to cancer types.
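As an illustration of the discrimination metric named in this abstract, the sketch below computes average precision, a common step-wise estimate of the area under the precision-recall curve (AUPRC). The function name and the toy labels and scores are hypothetical and are not taken from the study; tied scores are not handled specially.

```python
def average_precision(y_true, y_score):
    """Step-wise estimate of the area under the precision-recall curve.

    y_true: list of 0/1 outcome labels (1 = event, e.g. death).
    y_score: list of predicted risks, one per patient.
    """
    # Rank patients from highest to lowest predicted risk.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            # Precision among the top `rank` patients, taken at each
            # rank where a true event is recalled.
            total += hits / rank
    return total / n_pos


# Hypothetical toy data: four patients, two events.
labels = [1, 0, 1, 0]
risks = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, risks)  # (1/1 + 2/3) / 2 = 5/6
```

Unlike the AUROC, this quantity depends on the event rate, which is why it is often preferred for rare outcomes such as the 6.5% mortality reported above.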
Affiliation(s)
- Eun Jeong Oh
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Ravi B. Parikh
  - Department of Medical Ethics and Health Policy, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Corey Chivers
  - University of Pennsylvania Health System, Philadelphia, PA
- Jinbo Chen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
|
23
|
Langerhuizen DW, Janssen SJ, Kortlever JT, Ring D, Kerkhoffs GM, Jaarsma RL, Doornberg JN. Factors Associated with a Recommendation for Operative Treatment for Fracture of the Distal Radius. J Wrist Surg 2021; 10:316-321. [PMID: 34381635 PMCID: PMC8328550 DOI: 10.1055/s-0041-1725962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2020] [Accepted: 01/21/2021] [Indexed: 10/21/2022]
Abstract
Background Evidence suggests that there is substantial and unexplained surgeon-to-surgeon variation in recommendation of operative treatment for fractures of the distal radius. We studied (1) what factors are associated with recommendation for operative treatment of a fracture of the distal radius and (2) which factors are rated as the most influential on recommendation of operative treatment. Methods One hundred thirty-one upper extremity and fracture surgeons evaluated 20 fictitious patient scenarios with randomly assigned factors (e.g., personal, clinical, and radiologic factors) for patients with a fracture of the distal radius. For each scenario, they answered the following question: Do you recommend operative treatment for this patient (yes/no)? We determined the influence of each factor on this recommendation using random forest algorithms. Also, participants rated the influence of each factor (excluding age and sex) on a scale from 0 (not at all important) to 10 (extremely important). Results Random forest algorithms determined that age and angulation had the most influence on recommendation for operative treatment of a fracture of the distal radius. Angulation on the lateral radiograph and presence or absence of lunate subluxation were rated as having the greatest influence, and smoking status and stress levels the lowest influence, on advice to patients. Conclusions The observation that, other than age, personal factors have limited influence on surgeon recommendations for surgery may reflect how surgeon cognitive biases, personal preferences, different perspectives, and incentives may contribute to variations in care. Future research can determine whether decision aids (those that use patient-specific probabilities based on predictive analytics in particular) might help match patient treatment choices to what matters most to them, in part by helping to neutralize the influence of common misconceptions as well as surgeon bias and incentives.
Level of Evidence There is no level of evidence for the study.
Affiliation(s)
- David W.G. Langerhuizen
  - Department of Orthopaedic & Trauma Surgery, Flinders University, Flinders Medical Centre, Adelaide, Australia
- Stein J. Janssen
  - Department of Orthopaedic Surgery, Amsterdam Movement Sciences (AMS), Amsterdam University Medical Centre, Amsterdam, The Netherlands
- Joost T.P. Kortlever
  - Department of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, Texas
- David Ring
  - Department of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, Texas
- Gino M.M.J. Kerkhoffs
  - Department of Orthopaedic Surgery, Amsterdam Movement Sciences (AMS), Amsterdam University Medical Centre, Amsterdam, The Netherlands
- Ruurd L. Jaarsma
  - Department of Orthopaedic & Trauma Surgery, Flinders University, Flinders Medical Centre, Adelaide, Australia
- Job N. Doornberg
  - Department of Orthopaedic & Trauma Surgery, Flinders University, Flinders Medical Centre, Adelaide, Australia
|
24
|
Shang Y, Tian Y, Zhou M, Zhou T, Lyu K, Wang Z, Xin R, Liang T, Zhu S, Li J. EHR-Oriented Knowledge Graph System: Toward Efficient Utilization of Non-Used Information Buried in Routine Clinical Practice. IEEE J Biomed Health Inform 2021; 25:2463-2475. [PMID: 34057901 DOI: 10.1109/jbhi.2021.3085003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Non-used clinical information has negative implications on healthcare quality. Clinicians pay priority attention to clinical information relevant to their specialties during routine clinical practices but may be insensitive or less concerned about information showing disease risks beyond their specialties, resulting in delayed and missed diagnoses or improper management. In this study, we introduced an electronic health record (EHR)-oriented knowledge graph system to efficiently utilize non-used information buried in EHRs. EHR data were transformed into a semantic patient-centralized information model under the ontology structure of a knowledge graph. The knowledge graph then creates an EHR data trajectory and performs reasoning through semantic rules to identify important clinical findings within EHR data. A graphical reasoning pathway illustrates the reasoning footage and explains the clinical significance for clinicians to better understand the neglected information. An application study was performed to evaluate unconsidered chronic kidney disease (CKD) reminding for non-nephrology clinicians to identify important neglected information. The study covered 71,679 patients in non-nephrology departments. The system identified 2,774 patients meeting CKD diagnosis criteria and 10,377 patients requiring high attention. A follow-up study of 5,439 patients showed that 82.1% of patients who met the diagnosis criteria and 61.4% of patients requiring high attention were confirmed to be CKD positive during follow-up research. The application demonstrated that the proposed approach is feasible and effective in clinical information utilization. Additionally, it's valuable as an explainable artificial intelligence to provide interpretable recommendations for specialist physicians to understand the importance of non-used data and make comprehensive decisions.
|
25
|
Yang X, Zhang J, Chen S, Weissman S, Olatosi B, Li X. Utilizing electronic health record data to understand comorbidity burden among people living with HIV: a machine learning approach. AIDS 2021; 35:S39-S51. [PMID: 33867488 PMCID: PMC8058944 DOI: 10.1097/qad.0000000000002736] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Abstract
OBJECTIVES An understanding of the predictors of comorbidity among people living with HIV (PLWH) is critical for effective HIV care management. In this study, we identified predictors of comorbidity burden among PLWH based on machine learning models with electronic health record (EHR) data. METHODS The study population comprised individuals with an HIV diagnosis between January 2005 and December 2016 in South Carolina (SC). The change in comorbidity burden, represented by the Charlson Comorbidity Index (CCI) score, was measured by the score difference between pre- and post-HIV diagnosis and dichotomized into a binary outcome variable. Thirty-five risk predictors from multiple domains were used to predict the increase in comorbidity burden based on logistic least absolute shrinkage and selection operator (Lasso) regression analysis, using 80% of the data for model development and 20% for validation. RESULTS Of 8253 PLWH, the mean value of the CCI score difference was 0.8 ± 1.9 (range from 0 to 21), with 2328 (28.2%) patients showing an increase in CCI score after HIV diagnosis. Top predictors for an increase in CCI score using the Lasso model included older age at HIV diagnosis, positive family history of chronic conditions, tobacco use, longer duration with retention in care, having PEBA insurance, having a low recent CD4+ cell count, and duration of viral suppression. CONCLUSION The application of machine learning methods to EHR data could identify important predictors of increased comorbidity burden among PLWH with high accuracy. Results may enhance the understanding of comorbidities and provide evidence-based data for integrated HIV and comorbidity care management of PLWH.
Affiliation(s)
- Xueying Yang
  - South Carolina SmartState Center for Healthcare Quality, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
  - Department of Health Promotion, Education and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
- Jiajia Zhang
  - Department of Health Promotion, Education and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
- Shujie Chen
  - Department of Health Promotion, Education and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
- Sharon Weissman
  - Department of Internal Medicine, School of Medicine, University of South Carolina, Columbia, SC, USA, 29208
- Bankole Olatosi
  - South Carolina SmartState Center for Healthcare Quality, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
  - Department of Health Services Policy and Management, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
- Xiaoming Li
  - South Carolina SmartState Center for Healthcare Quality, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
  - Department of Health Promotion, Education and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA, 29208
|
26
|
Toth EG, Gibbs D, Moczygemba J, McLeod A. Decision tree modeling in R software to aid clinical decision making. Health and Technology 2021. [DOI: 10.1007/s12553-021-00542-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
27
|
Li F, Du J, He Y, Song HY, Madkour M, Rao G, Xiang Y, Luo Y, Chen HW, Liu S, Wang L, Liu H, Xu H, Tao C. Time event ontology (TEO): to support semantic representation and reasoning of complex temporal relations of clinical events. J Am Med Inform Assoc 2021; 27:1046-1056. [PMID: 32626903 DOI: 10.1093/jamia/ocaa058] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/01/2019] [Revised: 04/03/2020] [Accepted: 04/13/2020] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE The goal of this study is to develop a robust Time Event Ontology (TEO), which can formally represent and reason both structured and unstructured temporal information. MATERIALS AND METHODS Using our previous Clinical Narrative Temporal Relation Ontology 1.0 and 2.0 as a starting point, we redesigned concept primitives (clinical events and temporal expressions) and enriched temporal relations. Specifically, 2 sets of temporal relations (Allen's interval algebra and a novel suite of basic time relations) were used to specify qualitative temporal order relations, and a Temporal Relation Statement was designed to formalize quantitative temporal relations. Moreover, a variety of data properties were defined to represent diversified temporal expressions in clinical narratives. RESULTS TEO has a rich set of classes and properties (object, data, and annotation). When evaluated with real electronic health record data from the Mayo Clinic, it could faithfully represent more than 95% of the temporal expressions. Its reasoning ability was further demonstrated on a sample drug adverse event report annotated with respect to TEO. The results showed that our Java-based TEO reasoner could answer a set of frequently asked time-related queries, demonstrating that TEO has a strong capability of reasoning complex temporal relations. CONCLUSION TEO can support flexible temporal relation representation and reasoning. Our next step will be to apply TEO to the natural language processing field to facilitate automated temporal information annotation, extraction, and timeline reasoning to better support time-based clinical decision-making.
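To make the qualitative relations mentioned in this abstract concrete, the sketch below classifies a pair of time intervals into one of the 13 relations of Allen's interval algebra. It is a minimal illustration in plain Python, not the authors' Java-based TEO reasoner; the function name, the relation labels, and the example intervals (days since admission) are assumptions made here for illustration.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    (a0, a1), (b0, b1) = a, b
    if not (a0 < a1 and b0 < b1):
        raise ValueError("intervals must satisfy start < end")
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Remaining case: a proper partial overlap with no shared endpoints.
    return "overlaps" if a0 < b0 else "overlapped-by"


# Hypothetical example: drug exposure on days 0-5, rash on days 5-8.
drug_exposure = (0, 5)
rash = (5, 8)
relation = allen_relation(drug_exposure, rash)  # "meets"
```

A quantitative statement such as "the rash began 2 days after exposure ended" needs the actual offsets in addition to these qualitative labels, which is the role the abstract assigns to the Temporal Relation Statement.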
Affiliation(s)
- Fang Li
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Jingcheng Du
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Yongqun He
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Hsing-Yi Song
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Guozheng Rao
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yang Xiang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Yi Luo
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Henry W Chen
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA; University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Sijia Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Liwei Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hua Xu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Cui Tao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
|
28
|
Abstract
Machine learning shows enormous potential in facilitating decision-making regarding kidney diseases. With the development of data preservation and processing, as well as the advancement of machine learning algorithms, machine learning is expected to make remarkable breakthroughs in nephrology. Machine learning models have yielded many preliminary-to-moderate results and several excellent achievements in areas including analysis of renal pathological images, diagnosis and prognosis of chronic kidney diseases and acute kidney injury, and management of dialysis treatments. However, this work is just scratching the surface of the field; at the same time, machine learning and its applications in renal diseases face a number of challenges. In this review, we discuss the application status, challenges, and future prospects of machine learning in nephrology to help people further understand and improve the capacity for prediction, detection, and care quality in kidney diseases.
|
29
|
Taşkın Z. Forecasting the future of library and information science and its sub-fields. Scientometrics 2020; 126:1527-1551. [PMID: 33353991 PMCID: PMC7745590 DOI: 10.1007/s11192-020-03800-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/11/2020] [Accepted: 11/16/2020] [Indexed: 11/29/2022]
Abstract
Forecasting is one of the methods applied in many studies in the library and information science (LIS) field for numerous purposes, from making predictions of the next Nobel laureates to potential technological developments. This study sought to draw a picture for the future of the LIS field and its sub-fields by analysing 97 years of publication and citation patterns. The core Web of Science indexes were used as the data source, and 123,742 articles were examined in-depth for time series analysis. The social network analysis method was used for sub-field classification. The field was divided into four sub-fields: (1) librarianship and law librarianship, (2) health information in LIS, (3) scientometrics and information retrieval and (4) management and information systems. The results of the study show that the LIS sub-fields are completely different from each other in terms of their publication and citation patterns, and all the sub-fields have different dynamics. Furthermore, the number of publications, references and citations will increase significantly in the future. It is expected that more scholars will work together. The future subjects of the LIS field show astonishing diversity from fake news to predatory journals, open government, e-learning and electronic health records. However, the findings prove that publish or perish culture will shape the field. Therefore, it is important to go beyond numbers. It can only be achieved by understanding publication and citation patterns of the field and developing research policies accordingly.
Affiliation(s)
- Zehra Taşkın
  - Scholarly Communication Research Group, Adam Mickiewicz University in Poznań, Poznań, Poland
|
30
|
Martin N, De Weerdt J, Fernández-Llatas C, Gal A, Gatta R, Ibáñez G, Johnson O, Mannhardt F, Marco-Ruiz L, Mertens S, Munoz-Gama J, Seoane F, Vanthienen J, Wynn MT, Boilève DB, Bergs J, Joosten-Melis M, Schretlen S, Van Acker B. Recommendations for enhancing the usability and understandability of process mining in healthcare. Artif Intell Med 2020; 109:101962. [PMID: 34756220 DOI: 10.1016/j.artmed.2020.101962] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/18/2020] [Revised: 07/19/2020] [Accepted: 09/22/2020] [Indexed: 11/28/2022]
Abstract
Healthcare organizations are confronted with challenges including the contention between tightening budgets and increased care needs. In the light of these challenges, they are becoming increasingly aware of the need to improve their processes to ensure quality of care for patients. To identify process improvement opportunities, a thorough process analysis is required, which can be based on real-life process execution data captured by health information systems. Process mining is a research field that focuses on the development of techniques to extract process-related insights from process execution data, providing valuable and previously unknown information to instigate evidence-based process improvement in healthcare. However, despite the potential of process mining, its uptake in healthcare organizations outside case studies in a research context is rather limited. This observation was the starting point for an international brainstorm seminar. Based on the seminar's outcomes and with the ambition to stimulate a more widespread use of process mining in healthcare, this paper formulates recommendations to enhance the usability and understandability of process mining in healthcare. These recommendations are mainly targeted towards process mining researchers and the community to consider when developing a new research agenda for process mining in healthcare. Moreover, a limited number of recommendations are directed towards healthcare organizations and health information systems vendors, when shaping an environment to enable the continuous use of process mining.
Affiliation(s)
- Niels Martin
  - Research Foundation Flanders (FWO), Belgium; Hasselt University, Belgium; Vrije Universiteit Brussel, Belgium.
- Avigdor Gal
  - Technion - Israel Institute of Technology, Israel.
- Roberto Gatta
  - Centre Hopitalier Universitaire de Vaudois, Switzerland; Università degli Studi di Brescia, Italy.
- Fernando Seoane
  - Karolinska Institutet, Sweden; Karolinska University Hospital, Sweden; University of Borås, Sweden.
|
31
|
Zhou F, Gillespie A, Gligorijevic D, Gligorijevic J, Obradovic Z. Use of disease embedding technique to predict the risk of progression to end-stage renal disease. J Biomed Inform 2020; 105:103409. [PMID: 32304869 PMCID: PMC9885429 DOI: 10.1016/j.jbi.2020.103409] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/29/2019] [Revised: 03/18/2020] [Accepted: 03/19/2020] [Indexed: 02/01/2023]
Abstract
The accurate prediction of progression of Chronic Kidney Disease (CKD) to End Stage Renal Disease (ESRD) is of great importance to clinicians and a challenge to researchers, as there are many causes and even more comorbidities that are ignored by the traditional prediction models. We examine whether a novel low-dimensional embedding model, disease2disease (D2D), learned from large-scale electronic health records (EHRs), can cluster the causes of kidney diseases and comorbidities well and further improve prediction of progression of CKD to ESRD compared with traditional risk factors. The study cohort consists of 2,507 hospitalized Stage 3 CKD patients, of whom 1,375 (54.8%) progressed to ESRD within 3 years. We evaluated the proposed unsupervised learning framework by applying a regularized logistic regression model and a Cox proportional hazards model, respectively, and compared the accuracies with those obtained by four alternative models. The results demonstrate that the learned low-dimensional disease representations from EHRs can capture the relationship between vast arrays of diseases and can outperform traditional risk factors in a CKD progression prediction model. These results can be used both by clinicians in patient care and by researchers to develop new prediction methods.
Affiliation(s)
- Fang Zhou
  - School of Data Science & Engineering, East China Normal University, Shanghai, China
- Avrum Gillespie
  - Division of Nephrology, Hypertension, and Kidney Transplantation, Department of Medicine, Lewis Katz School of Medicine, Temple University, Philadelphia, PA
- Djordje Gligorijevic
  - Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA
- Jelena Gligorijevic
  - Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA
- Zoran Obradovic
  - Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA
|
32
|
Mining incomplete clinical data for the early assessment of Kawasaki disease based on feature clustering and convolutional neural networks. Artif Intell Med 2020; 105:101859. [DOI: 10.1016/j.artmed.2020.101859] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/22/2019] [Revised: 02/26/2020] [Accepted: 04/03/2020] [Indexed: 12/20/2022]
|
33
|
Feller DJ, Bear Don't Walk IV OJ, Zucker J, Yin MT, Gordon P, Elhadad N. Detecting Social and Behavioral Determinants of Health with Structured and Free-Text Clinical Data. Appl Clin Inform 2020; 11:172-181. [PMID: 32131117 DOI: 10.1055/s-0040-1702214] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Indexed: 10/24/2022] Open
Abstract
BACKGROUND Social and behavioral determinants of health (SBDH) are environmental and behavioral factors that often impede disease management and result in sexually transmitted infections. Despite their importance, SBDH are inconsistently documented in electronic health records (EHRs) and typically collected only in an unstructured format. Evidence suggests that structured data elements present in EHRs can contribute further to identifying SBDH in the patient record. OBJECTIVE Explore the automated inference of both the presence of SBDH documentation and individual SBDH risk factors in patient records. Compare the relative ability of clinical notes and structured EHR data, such as laboratory measurements and diagnoses, to support inference. METHODS We attempt to infer the presence of SBDH documentation in patient records, as well as patient status of 11 SBDH, including alcohol abuse, homelessness, and sexual orientation. We compare classification performance when considering clinical notes only, structured data only, and notes and structured data together. We perform an error analysis across several SBDH risk factors. RESULTS Classification models inferring the presence of SBDH documentation achieved good performance (F1 score: 78.7-92.7; F1 considered as the primary evaluation metric). Performance was variable for models inferring patient SBDH risk status; results ranged from F1 = 82.7 for LGBT (lesbian, gay, bisexual, and transgender) status to F1 = 28.5 for intravenous drug use. Error analysis demonstrated that lexical diversity and documentation of historical SBDH status challenge inference of patient SBDH status. Three of five classifiers inferring topic-specific SBDH documentation and 10 of 11 patient SBDH status classifiers achieved highest performance when trained using both clinical notes and structured data. CONCLUSION Our findings suggest that combining clinical free-text notes and structured data provides the best approach in classifying patient SBDH status. Inferring patient SBDH status is most challenging among SBDH with low prevalence and high lexical diversity.
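For readers unfamiliar with the primary metric in this abstract, the sketch below shows how an F1 score is computed from binary labels and predictions as the harmonic mean of precision and recall. The function name and toy data are hypothetical and are not taken from the study; the abstract reports F1 on a 0-100 scale, whereas this sketch returns a value between 0 and 1.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: five patient records, three with the risk factor.
truth = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
score = f1_score(truth, predicted)  # precision = recall = 2/3, so F1 = 2/3
```

Because F1 ignores true negatives, it stays informative for low-prevalence categories, which the abstract identifies as the hardest to infer.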
Affiliation(s)
- Daniel J Feller
  - Department of Biomedical Informatics, Columbia University, New York, New York, United States
- Jason Zucker
  - Division of Infectious Diseases, Department of Internal Medicine, Columbia University Irving Medical Center, New York, New York, United States
- Michael T Yin
  - Division of Infectious Diseases, Department of Internal Medicine, Columbia University Irving Medical Center, New York, New York, United States
- Peter Gordon
  - Division of Infectious Diseases, Department of Internal Medicine, Columbia University Irving Medical Center, New York, New York, United States
- Noémie Elhadad
  - Department of Biomedical Informatics, Columbia University, New York, New York, United States
|
34
|
Pham T, Tao X, Zhang J, Yong J. Constructing a knowledge-based heterogeneous information graph for medical health status classification. Health Inf Sci Syst 2020; 8:10. [PMID: 32117570 DOI: 10.1007/s13755-020-0100-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/21/2019] [Accepted: 01/23/2020] [Indexed: 02/06/2023] Open
Abstract
Applying Pearson correlation and semantic relations in building a heterogeneous information graph (HIG) to develop a classification model has achieved notable performance in improving the accuracy of predicting the status of health risks. In this study, the approach integrated knowledge of the medical domain and took advantage of Pearson correlation and semantic relations in building a classification model for diagnosis. The research mined knowledge extracted from titles and abstracts of MEDLINE to discover how to assess the links between objects relating to medical concepts. A knowledge-base HIG model was then developed for the prediction of a patient's health status. The results of the experiment showed that the knowledge-base model was superior to the baseline model and demonstrated that the knowledge base could help improve the performance of the classification model. The contribution of this study is a framework for applying a knowledge base in the classification model, which helps these models achieve the best prediction performance. This study has also contributed a model to medical practice to help practitioners become more confident in making final decisions in diagnosing illness. Moreover, this study affirmed that biomedical literature could assist in building a classification model. This contribution will be advantageous for future researchers in mining the knowledge base to develop different kinds of classification models.
Affiliation(s)
- Thuan Pham
  - University of Southern Queensland, Toowoomba, Australia
- Xiaohui Tao
  - University of Southern Queensland, Toowoomba, Australia
- Ji Zhang
  - University of Southern Queensland, Toowoomba, Australia
- Jianming Yong
  - University of Southern Queensland, Toowoomba, Australia
|
35
|
Senteio CR, Callahan MB. Supporting quality care for ESRD patients: the social worker can help address barriers to advance care planning. BMC Nephrol 2020; 21:55. [PMID: 32075587 PMCID: PMC7031953 DOI: 10.1186/s12882-020-01720-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/06/2019] [Accepted: 02/11/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Advance Care Planning (ACP) is essential for preparation for end-of-life. It is a means through which patients clarify their treatment wishes. ACP is a patient-centered, dynamic process involving patients, their families, and caregivers. It is designed to 1) clarify goals of care, 2) increase patient agency over their care and treatments, and 3) help prepare for death. ACP is an active process; the end-stage renal disease (ESRD) illness trajectory creates health circumstances that necessitate that caregivers assess and nurture patient readiness for ACP discussions. Effective ACP enhances patient engagement and quality of life resulting in better quality of care. MAIN BODY Despite these benefits, ACP is not consistently completed. Clinical, technical, and social barriers result in key challenges to quality care. First, ACP requires caregivers to have end-of-life conversations that they lack the training to perform and often find difficult. Second, electronic health record (EHR) tools do not enable the efficient exchange of requisite psychosocial information such as treatment burden, patient preferences, health beliefs, priorities, and understanding of prognosis. This results in a lack of information available to enable patients and their families to understand the impact of illness and treatment options. Third, culture plays a vital role in end-of-life conversations. Social barriers include circumstances when a patient's cultural beliefs or value system conflicts with the caregiver's beliefs. Caregivers describe this disconnect as a key barrier to ACP. Consistent ACP is integral to quality patient-centered care and social workers' training and clinical roles uniquely position them to support ACP. CONCLUSION In this debate, we detail the known barriers to completing ACP for ESRD patients, and we describe its benefits. 
We detail how social workers, in particular, can support health outcomes by promoting the health information exchange that occurs during these sensitive conversations with patients, their family, and care team members. We aim to inform clinical social workers of this opportunity to enhance quality care by engaging in ACP. We describe research to help further elucidate barriers, and how researchers and caregivers can design and deliver interventions that support ACP to address this persistent challenge to quality end-of-life care.
Affiliation(s)
- Charles R Senteio
  - School of Communication and Information, Rutgers University, 4 Huntington Street, New Brunswick, NJ, 08901, USA.
- Mary Beth Callahan
  - Dallas Nephrology Associates, 411 North Washington Street, Suite #7000, Dallas, TX, 75246, USA
|
36
|
Song X, Waitman LR, Yu AS, Robbins DC, Hu Y, Liu M. Longitudinal Risk Prediction of Chronic Kidney Disease in Diabetic Patients Using a Temporal-Enhanced Gradient Boosting Machine: Retrospective Cohort Study. JMIR Med Inform 2020; 8:e15510. [PMID: 32012067 PMCID: PMC7055762 DOI: 10.2196/15510] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/16/2019] [Revised: 10/31/2019] [Accepted: 10/31/2019] [Indexed: 12/22/2022]
Abstract
BACKGROUND Artificial intelligence-enabled electronic health record (EHR) analysis can revolutionize medical practice from the diagnosis and prediction of complex diseases to making recommendations in patient care, especially for chronic conditions such as chronic kidney disease (CKD), which is one of the most frequent complications in patients with diabetes and is associated with substantial morbidity and mortality. OBJECTIVE The longitudinal prediction of health outcomes requires effective representation of temporal data in the EHR. In this study, we proposed a novel temporal-enhanced gradient boosting machine (GBM) model that dynamically updates and ensembles learners based on new events in patient timelines to improve the prediction accuracy of CKD among patients with diabetes. METHODS Using a broad spectrum of deidentified EHR data on a retrospective cohort of 14,039 adult patients with type 2 diabetes and GBM as the base learner, we validated our proposed Landmark-Boosting model against three state-of-the-art temporal models for rolling predictions of 1-year CKD risk. RESULTS The proposed model uniformly outperformed other models, achieving an area under receiver operating curve of 0.83 (95% CI 0.76-0.85), 0.78 (95% CI 0.75-0.82), and 0.82 (95% CI 0.78-0.86) in predicting CKD risk with automatic accumulation of new data in later years (years 2, 3, and 4 since diabetes mellitus onset, respectively). The Landmark-Boosting model also maintained the best calibration across moderate- and high-risk groups and over time. The experimental results demonstrated that the proposed temporal model can not only accurately predict 1-year CKD risk but also improve performance over time with additionally accumulated data, which is essential for clinical use to improve renal management of patients with diabetes. 
CONCLUSIONS Incorporation of temporal information in EHR data can significantly improve predictive model performance and will particularly benefit patients who follow-up with their physicians as recommended.
Affiliation(s)
- Xing Song
  - University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, United States
- Lemuel R Waitman
  - University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, United States
- Alan Sl Yu
  - University of Kansas Medical Center, Division of Nephrology and Hypertension and the Kidney Institute, Kansas City, KS, United States
- David C Robbins
  - University of Kansas Medical Center, Diabetes Institute, Kansas City, KS, United States
- Yong Hu
  - Jinan University, Big Data Decision Institute, Guangzhou, China
- Mei Liu
  - University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, United States
|
37
|
Chan L, Beers K, Yau AA, Chauhan K, Duffy Á, Chaudhary K, Debnath N, Saha A, Pattharanitima P, Cho J, Kotanko P, Federman A, Coca SG, Van Vleck T, Nadkarni GN. Natural language processing of electronic health records is superior to billing codes to identify symptom burden in hemodialysis patients. Kidney Int 2019; 97:383-392. [PMID: 31883805 DOI: 10.1016/j.kint.2019.10.023] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 04/30/2019] [Revised: 09/27/2019] [Accepted: 10/18/2019] [Indexed: 02/07/2023]
Abstract
Symptoms are common in patients on maintenance hemodialysis, but identification is challenging. New informatics approaches, including natural language processing (NLP), can be used to identify symptoms from narrative clinical documentation. Here we used NLP to identify seven patient symptoms from notes of maintenance hemodialysis patients of the BioMe Biobank and validated our findings using a separate cohort and the MIMIC-III database. NLP performance for symptom detection was compared with International Classification of Diseases (ICD)-9/10 codes, and the performance of both methods was validated against manual chart review. From 1034 and 519 hemodialysis patients within the BioMe and MIMIC-III databases, respectively, the symptoms most frequently identified by NLP were fatigue, pain, and nausea/vomiting. In BioMe, sensitivity for NLP (0.85-0.99) was higher than for ICD codes (0.09-0.59) for all symptoms, with similar results in the BioMe validation cohort and MIMIC-III. ICD codes were significantly more specific for nausea/vomiting in BioMe and more specific for fatigue, depression, and pain in the MIMIC-III database. A majority of patients in both cohorts had four or more symptoms. Patients with more symptoms identified by NLP, ICD, and chart review had more clinical encounters. NLP had higher specificity in inpatient notes but higher sensitivity in outpatient notes and performed similarly across pain severity subgroups. Thus, NLP had higher sensitivity compared to ICD codes for identification of seven common hemodialysis-related symptoms, with comparable specificity between the two methods. Hence, NLP may be useful for the high-throughput identification of patient-centered outcomes when using electronic health records.
Affiliation(s)
- Lili Chan
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; The Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA.
- Kelly Beers
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Amy A Yau
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Kinsuk Chauhan
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Áine Duffy
  - The Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Kumardeep Chaudhary
  - The Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Neha Debnath
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Aparna Saha
  - The Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Pattharawin Pattharanitima
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Judy Cho
  - The Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Peter Kotanko
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; Renal Research Institute, New York, New York, USA
- Alex Federman
  - Division of General Internal Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Steven G Coca
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Tielman Van Vleck
  - The Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Girish N Nadkarni
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA; The Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA.
|
38
|
A Long Short-Term Memory Ensemble Approach for Improving the Outcome Prediction in Intensive Care Unit. Comput Math Methods Med 2019; 2019:8152713. [PMID: 31827589 PMCID: PMC6885179 DOI: 10.1155/2019/8152713] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/29/2018] [Revised: 09/23/2019] [Accepted: 10/08/2019] [Indexed: 12/30/2022]
Abstract
In the intensive care unit (ICU), it is essential to predict patient mortality, and mathematical models help improve prognostic accuracy. Recently, recurrent neural networks (RNNs), especially long short-term memory (LSTM) networks, have shown advantages in sequential modeling and promise for clinical prediction. However, ICU data are highly complex because of the diverse patterns of diseases; therefore, instead of a single LSTM model, an ensemble algorithm of LSTMs (eLSTM) is proposed, using the ensemble framework to handle the diversity of clinical data. The eLSTM algorithm was evaluated on the well-established database of ICU admissions, Medical Information Mart for Intensive Care III (MIMIC-III). An investigation of 18,415 cases in total shows that, compared with the clinical scoring systems SAPS II, SOFA, and APACHE II, the random forests classification algorithm, and the single LSTM classifier, the eLSTM model achieved superior performance, with the largest area under the receiver operating characteristic curve (AUROC) of 0.8451 and the largest area under the precision-recall curve (AUPRC) of 0.4862. Furthermore, it offered an early prognosis for ICU patients. The results demonstrate that the eLSTM is capable of dynamically predicting the mortality of patients in complex clinical situations.
|
39
|
Alizadehsani R, Roshanzamir M, Abdar M, Beykikhoshk A, Khosravi A, Panahiazar M, Koohestani A, Khozeimeh F, Nahavandi S, Sarrafzadegan N. A database for using machine learning and data mining techniques for coronary artery disease diagnosis. Sci Data 2019; 6:227. [PMID: 31645559 PMCID: PMC6811630 DOI: 10.1038/s41597-019-0206-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/21/2019] [Accepted: 08/16/2019] [Indexed: 12/28/2022]
Abstract
We present the coronary artery disease (CAD) database, a comprehensive resource comprising 126 papers and 68 datasets relevant to CAD diagnosis, extracted from the scientific literature published between 1992 and 2018. These data were collected to help advance research on CAD-related machine learning and data mining algorithms, and ultimately to advance clinical diagnosis and early treatment. To aid users, we have also built a web application that presents the database through various reports.
Affiliation(s)
- R Alizadehsani
  - Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
- M Roshanzamir
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, 84156-83111, Iran
- M Abdar
  - Département d'informatique, Université du Québec à Montréal, Montréal, Québec, Canada
- A Beykikhoshk
  - Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
- A Khosravi
  - Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
- M Panahiazar
  - University of California San Francisco, San Francisco, CA, USA.
- A Koohestani
  - Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
- F Khozeimeh
  - Mashhad University of Medical Science, Mashhad, Iran
- S Nahavandi
  - Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
- N Sarrafzadegan
  - Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
  - School of Population and Public Health, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
|
40
|
Korach ZT, Cato KD, Collins SA, Kang MJ, Knaplund C, Dykes PC, Wang L, Schnock KO, Garcia JP, Jia H, Chang F, Schwartz JM, Zhou L. Unsupervised Machine Learning of Topics Documented by Nurses about Hospitalized Patients Prior to a Rapid-Response Event. Appl Clin Inform 2019; 10:952-963. [PMID: 31853936 PMCID: PMC6920051 DOI: 10.1055/s-0039-3401814] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/13/2019] [Accepted: 11/06/2019] [Indexed: 01/05/2023]
Abstract
BACKGROUND In the hospital setting, it is crucial to identify patients at risk for deterioration before it fully develops, so providers can respond rapidly to reverse the deterioration. Rapid response (RR) activation criteria include a subjective component ("worried about the patient") that is often documented in nurses' notes and is hard to capture and quantify, hindering active screening for deteriorating patients. OBJECTIVES We used unsupervised machine learning to automatically discover RR event risk/protective factors from unstructured nursing notes. METHODS In this retrospective cohort study, we obtained nursing notes of hospitalized, nonintensive care unit patients, documented from 2015 through 2018 from Partners HealthCare databases. We applied topic modeling to those notes to reveal topics (clusters of associated words) documented by nurses. Two nursing experts named each topic with a representative Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) concept. We used the concepts along with vital signs and demographics in a time-dependent covariates extended Cox model to identify risk/protective factors for RR event risk. RESULTS From a total of 776,849 notes of 45,299 patients, we generated 95 stable topics, of which 80 were mapped to 72 distinct SNOMED CT concepts. Compared with a model containing only demographics and vital signs, the latent topics improved the model's predictive ability from a concordance index of 0.657 to 0.720. Thirty topics were found significantly associated with RR event risk at a 0.05 level, and 11 remained significant after Bonferroni correction of the significance level to 6.94E-04, including physical examination (hazard ratio [HR] = 1.07, 95% confidence interval [CI], 1.03-1.12), informing doctor (HR = 1.05, 95% CI, 1.03-1.08), and seizure precautions (HR = 1.08, 95% CI, 1.04-1.12). 
CONCLUSION Unsupervised machine learning methods can automatically reveal interpretable and informative signals from free-text and may support early identification of patients at risk for RR events.
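The Bonferroni-corrected threshold of 6.94E-04 quoted in this abstract can be reproduced by dividing the nominal 0.05 level by the number of comparisons. The sketch below assumes that number is 72, matching the 72 distinct SNOMED CT concepts; the abstract does not state the test count explicitly, and the function name `bonferroni_threshold` is purely illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance level under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Assumption: 72 comparisons, one per distinct SNOMED CT concept.
threshold = bonferroni_threshold(0.05, 72)
print(f"{threshold:.2e}")  # 6.94e-04
```

Under this assumption, a topic counts as significant after correction only if its p-value falls below roughly 0.000694.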
Affiliation(s)
- Zfania Tom Korach
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Kenrick D. Cato
  - School of Nursing, Columbia University, New York, New York, United States
- Sarah A. Collins
  - School of Nursing, Columbia University, New York, New York, United States
  - Department of Biomedical Informatics, Columbia University, New York, New York, United States
- Min Jeoung Kang
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Patricia C. Dykes
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Liqin Wang
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Kumiko O. Schnock
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Jose P. Garcia
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Haomiao Jia
  - School of Nursing, Columbia University, New York, New York, United States
- Frank Chang
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
- Li Zhou
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States
|
41
|
Albers DJ, Levine ME, Mamykina L, Hripcsak G. The parameter Houlihan: A solution to high-throughput identifiability indeterminacy for brutally ill-posed problems. Math Biosci 2019; 316:108242. [PMID: 31454628 PMCID: PMC6759390 DOI: 10.1016/j.mbs.2019.108242] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/05/2019] [Revised: 08/20/2019] [Accepted: 08/22/2019] [Indexed: 12/21/2022]
Abstract
One way to interject knowledge into clinically impactful forecasting is to use data assimilation, a nonlinear regression that projects data onto a mechanistic physiologic model, instead of a set of functions, such as neural networks. Such regressions have an advantage of being useful with particularly sparse, non-stationary clinical data. However, physiological models are often nonlinear and can have many parameters, leading to potential problems with parameter identifiability, or the ability to find a unique set of parameters that minimize forecasting error. The identifiability problems can be minimized or eliminated by reducing the number of parameters estimated, but reducing the number of estimated parameters also reduces the flexibility of the model and hence increases forecasting error. We propose a method, the parameter Houlihan, that combines traditional machine learning techniques with data assimilation, to select the right set of model parameters to minimize forecasting error while reducing identifiability problems. The method worked well: the data assimilation-based glucose forecasts and estimates for our cohort using the Houlihan-selected parameter sets generally also minimize forecasting errors compared to other parameter selection methods such as by-hand parameter selection. Nevertheless, the forecast with the lowest forecast error does not always accurately represent physiology, but further advancements of the algorithm provide a path for improving physiologic fidelity as well. Our hope is that this methodology represents a first step toward combining machine learning with data assimilation and provides a lower-threshold entry point for using data assimilation with clinical data by helping select the right parameters to estimate.
Affiliation(s)
- David J Albers
  - Department of Biomedical Informatics, Columbia University, 622 West 168th Street, PH-20, New York, NY, USA; Department of Pediatrics, Division of Informatics, University of Colorado Medicine, Mail: F443, 13199 E. Montview Blvd. Ste: 210-12 | Aurora, CO 80045 USA.
- Matthew E Levine
  - Department of Computational and Mathematical sciences, California Institute of Technology, 1200 E California Blvd M/C 305-16 Pasadena, CA 91125 USA
- Lena Mamykina
  - Department of Biomedical Informatics, Columbia University, 622 West 168th Street, PH-20, New York, NY, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University, 622 West 168th Street, PH-20, New York, NY, USA
|
42
|
Tomašev N, Glorot X, Rae JW, Zielinski M, Askham H, Saraiva A, Mottram A, Meyer C, Ravuri S, Protsyuk I, Connell A, Hughes CO, Karthikesalingam A, Cornebise J, Montgomery H, Rees G, Laing C, Baker CR, Peterson K, Reeves R, Hassabis D, King D, Suleyman M, Back T, Nielson C, Ledsam JR, Mohamed S. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 2019; 572:116-119. [PMID: 31367026 PMCID: PMC6722431 DOI: 10.1038/s41586-019-1390-1] [Citation(s) in RCA: 520] [Impact Index Per Article: 104.0] [Received: 11/13/2018] [Accepted: 06/18/2019] [Indexed: 12/31/2022]
Abstract
The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients [1]. To achieve this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building on recent work that models adverse events from electronic health records [2-17] and using acute kidney injury, a common and potentially life-threatening condition [18], as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests [9]. Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment.
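The reported operating point of 2 false alerts for every true alert can be restated as a precision (positive predictive value). The sketch below is only an arithmetic illustration of that conversion; the helper name `precision_from_alert_ratio` is hypothetical and not taken from the paper.

```python
def precision_from_alert_ratio(false_per_true: float) -> float:
    """Convert a false-to-true alert ratio into precision (PPV).

    With r false alerts per true alert, the share of alerts that are
    true is 1 / (1 + r).
    """
    if false_per_true < 0:
        raise ValueError("false_per_true must be non-negative")
    return 1.0 / (1.0 + false_per_true)


# 2 false alerts for every true alert, as stated in the abstract.
precision = precision_from_alert_ratio(2.0)
print(f"{precision:.1%}")  # 33.3%
```

In other words, about one alert in three is a true alert at the stated ratio.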
Affiliation(s)
- Jack W Rae
  - DeepMind, London, UK
  - CoMPLEX, Computer Science, University College London, London, UK
- Hugh Montgomery
  - Institute for Human Health and Performance, University College London, London, UK
- Geraint Rees
  - Institute of Cognitive Neuroscience, University College London, London, UK
- Chris Laing
  - University College London Hospitals, London, UK
- Kelly Peterson
  - VA Salt Lake City Healthcare System, Salt Lake City, UT, USA
  - Division of Epidemiology, University of Utah, Salt Lake City, UT, USA
- Ruth Reeves
  - Department of Veterans Affairs, Nashville, TN, USA
- Christopher Nielson
  - University of Nevada School of Medicine, Reno, NV, USA
  - Department of Veterans Affairs, Salt Lake City, UT, USA
|
43
|
Burckhardt P, Nagin D, Vijayasarathy VPR, Padman R. Multi-Trajectory Modeling to Predict Acute Kidney Injury in Chronic Kidney Disease Patients. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2018:1196-1205. [PMID: 30815162 PMCID: PMC6371306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Risk-stratifying chronic disease patients in real time has the potential to facilitate targeted interventions and improve disease management and outcomes. We apply group-based multi-trajectory modeling to risk stratify patients with chronic kidney disease (CKD) and its major complications into distinct trajectories of disease development and predict acute kidney injury (AKI), a serious, under-diagnosed outcome of CKD that is both preventable and treatable with early detection. Utilizing Electronic Health Record data of 1,947 patients, we identify eight risk groups with distinct trajectories and profiles. We observe that a higher estimated probability of AKI generally coincides with a higher risk group. Overall, at least 75% of patients stabilize into their final groups within less than two years from diagnosis of CKD Stage 3. Model calibration confirms that the estimated outcome probabilities are highly correlated with AKI incidence, providing group-specific and individual level predictions to improve clinical management of AKI in CKD patients.
Affiliation(s)
- Philipp Burckhardt
  - Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Daniel Nagin
  - The H. John Heinz III College of Information Systems and Public Policy
- Rema Padman
  - The H. John Heinz III College of Information Systems and Public Policy
|
44
|
Li J, Liu M, Li X, Liu X, Liu J. Developing Embedded Taxonomy and Mining Patients' Interests From Web-Based Physician Reviews: Mixed-Methods Approach. J Med Internet Res 2018; 20:e254. [PMID: 30115610 PMCID: PMC6117498 DOI: 10.2196/jmir.8868] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 08/30/2017] [Revised: 02/08/2018] [Accepted: 06/21/2018] [Indexed: 12/22/2022]
Abstract
BACKGROUND Web-based physician reviews are invaluable gold mines that merit further investigation. Although many studies have explored the text information of physician reviews, very few have focused on developing a systematic topic taxonomy embedded in physician reviews. The first step toward mining physician reviews is to determine how the natural structure or dimensions is embedded in reviews. Therefore, it is relevant to develop the topic taxonomy rigorously and systematically. OBJECTIVE This study aims to develop a hierarchical topic taxonomy to uncover the latent structure of physician reviews and illustrate its application for mining patients' interests based on the proposed taxonomy and algorithm. METHODS Data comprised 122,716 physician reviews, including reviews of 8501 doctors from a leading physician review website in China (haodf.com), collected between 2007 and 2015. Mixed methods, including a literature review, data-driven-based topic discovery, and human annotation were used to develop the physician review topic taxonomy. RESULTS The identified taxonomy included 3 domains or high-level categories and 9 subtopics or low-level categories. The physician-related domain included the categories of medical ethics, medical competence, communication skills, medical advice, and prescriptions. The patient-related domain included the categories of the patient profile, symptoms, diagnosis, and pathogenesis. The system-related domain included the categories of financing and operation process. The F-measure of the proposed classification algorithm reached 0.816 on average. Symptoms (Cohen d=1.58, Δu=0.216, t=229.75, and P<.001) are more often mentioned by patients with acute diseases, whereas communication skills (Cohen d=-0.29, Δu=-0.038, t=-42.01, and P<.001), financing (Cohen d=-0.68, Δu=-0.098, t=-99.26, and P<.001), and diagnosis and pathogenesis (Cohen d=-0.55, Δu=-0.078, t=-80.09, and P<.001) are more often mentioned by patients with chronic diseases. 
Patients with mild diseases were more interested in medical ethics (Cohen d=0.25, Δu=0.039, t=8.33, and P<.001), operation process (Cohen d=0.57, Δu=0.060, t=18.75, and P<.001), patient profile (Cohen d=1.19, Δu=0.132, t=39.33, and P<.001), and symptoms (Cohen d=1.91, Δu=0.274, t=62.82, and P<.001). Meanwhile, patients with serious diseases were more interested in medical competence (Cohen d=-0.99, Δu=-0.165, t=-32.58, and P<.001), medical advice and prescription (Cohen d=-0.65, Δu=-0.082, t=-21.45, and P<.001), financing (Cohen d=-0.26, Δu=-0.018, t=-8.45, and P<.001), and diagnosis and pathogenesis (Cohen d=-1.55, Δu=-0.229, t=-50.93, and P<.001). CONCLUSIONS This mixed-methods approach, integrating literature reviews, data-driven topic discovery, and human annotation, is an effective and rigorous way to develop a physician review topic taxonomy. The proposed algorithm based on Labeled-Latent Dirichlet Allocation can achieve impressive classification results for mining patients' interests. Furthermore, the mining results reveal marked differences in patients' interests across different disease types, socioeconomic development levels, and hospital levels.
Affiliation(s)
- Jia Li
  - School of Business, East China University of Science and Technology, Shanghai, China
- Minghui Liu
  - School of Business, East China University of Science and Technology, Shanghai, China
- Xiaojun Li
  - Xi'an Research Institute of Hi-Tech, Xi'an, China
- Xuan Liu
  - School of Business, East China University of Science and Technology, Shanghai, China
- Jingfang Liu
  - School of Management, Shanghai University, Shanghai, China
|
45
|
Bhattacharya M, Jurkovitz C, Shatkay H. Co-occurrence of medical conditions: Exposing patterns through probabilistic topic modeling of snomed codes. J Biomed Inform 2018; 82:31-40. [PMID: 29655947 PMCID: PMC6510486 DOI: 10.1016/j.jbi.2018.04.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/13/2018] [Revised: 04/10/2018] [Accepted: 04/11/2018] [Indexed: 01/03/2023]
Abstract
Patients associated with multiple co-occurring health conditions often face aggravated complications and less favorable outcomes. Co-occurring conditions are especially prevalent among individuals suffering from kidney disease, an increasingly widespread condition affecting 13% of the general population in the US. This study aims to identify and characterize patterns of co-occurring medical conditions in patients employing a probabilistic framework. Specifically, we apply topic modeling in a non-traditional way to find associations across SNOMED-CT codes assigned and recorded in the EHRs of >13,000 patients diagnosed with kidney disease. Unlike most prior work on topic modeling, we apply the method to codes rather than to natural language. Moreover, we quantitatively evaluate the topics, assessing their tightness and distinctiveness, and also assess the medical validity of our results. Our experiments show that each topic is succinctly characterized by a few highly probable and unique disease codes, indicating that the topics are tight. Furthermore, inter-topic distance between each pair of topics is typically high, illustrating distinctiveness. Last, most coded conditions grouped together within a topic, are indeed reported to co-occur in the medical literature. Notably, our results uncover a few indirect associations among conditions that have hitherto not been reported as correlated in the medical literature.
Affiliation(s)
- Moumita Bhattacharya
- Computational Biomedicine Lab, Computer and Information Sciences, University of Delaware, Newark, DE, USA.
- Hagit Shatkay
- Computational Biomedicine Lab, Computer and Information Sciences, University of Delaware, Newark, DE, USA; Center for Bioinformatics and Computational Biology, Delaware Biotechnology Inst, University of Delaware, DE, USA; School of Computing, Queen's University, Kingston, ON K7L 3N6, Canada.
46
Abstract
PURPOSE OF REVIEW The purposes of this review are to identify population characteristics of important risk factors for the development and progression of diabetic kidney disease (DKD) in the United States and to discuss barriers and opportunities to improve awareness, management, and outcomes in patients with DKD. RECENT FINDINGS The major risk factors for the development and progression of DKD include hyperglycemia, hypertension, and albuminuria. DKD disproportionately affects minorities and individuals with low educational and socioeconomic status. Barriers to effective management of DKD include the following: (a) limited patient and healthcare provider awareness of DKD, (b) lack of timely referrals of patients to a nephrologist, (c) low patient healthcare literacy, and (d) insufficient access to healthcare and health insurance. Increased patient and physician awareness of DKD has been shown to enhance patient outcomes. Multifactorial and multidisciplinary interventions targeting multiple risk factors and patient/physician education may provide better outcomes in patients with DKD.
Affiliation(s)
- O Kenrik Duru
- Department of Medicine, Division of General Internal Medicine/Health Services Research, David Geffen School of Medicine at the University of California, Los Angeles, 10940 Wilshire Blvd, Suite 700, Los Angeles, CA, 90024, USA.
- Keith Norris
- Department of Medicine, Division of General Internal Medicine/Health Services Research, David Geffen School of Medicine at the University of California, Los Angeles, 10940 Wilshire Blvd, Suite 700, Los Angeles, CA, 90024, USA
47
Feller DJ, Zucker J, Yin MT, Gordon P, Elhadad N. Using Clinical Notes and Natural Language Processing for Automated HIV Risk Assessment. J Acquir Immune Defic Syndr 2018; 77:160-166. [PMID: 29084046 PMCID: PMC5762388 DOI: 10.1097/qai.0000000000001580] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Indexed: 01/06/2023]
Abstract
OBJECTIVE Universal HIV screening programs are costly, labor intensive, and often fail to identify high-risk individuals. Automated risk assessment methods that leverage longitudinal electronic health records (EHRs) could catalyze targeted screening programs. Although social and behavioral determinants of health are typically captured in narrative documentation, previous analyses have considered only structured EHR fields. We examined whether natural language processing (NLP) would improve predictive models of HIV diagnosis. METHODS The study cohort comprised 181 HIV-positive individuals who received care at New York Presbyterian Hospital before a confirmatory HIV diagnosis and 543 HIV-negative controls selected using propensity score matching. EHR data including demographics, laboratory tests, diagnosis codes, and unstructured notes before HIV diagnosis were extracted for modeling. Three predictive models were developed using machine-learning algorithms: (1) a baseline model with only structured EHR data, (2) baseline plus NLP topics, and (3) baseline plus NLP clinical keywords. RESULTS Predictive models demonstrated a range of performance with F measures of 0.59 for the baseline model, 0.63 for the baseline + NLP topic model, and 0.74 for the baseline + NLP keyword model. The baseline + NLP keyword model yielded the highest precision by including keywords such as "msm," "unprotected," "hiv," and "methamphetamine," and structured EHR data indicative of additional HIV risk factors. CONCLUSIONS NLP improved the predictive performance of automated HIV risk assessment by extracting terms in clinical text indicative of high-risk behavior. Future studies should explore more advanced techniques for extracting social and behavioral determinants from clinical text.
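For readers unfamiliar with the F measure reported in this abstract, the following is a minimal sketch of how it is conventionally computed from confusion-matrix counts. The function name and the example counts are hypothetical, and the abstract does not state which F variant was used; the balanced F1 (beta = 1) is assumed here.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives, and false negatives.

    beta = 1 gives the balanced F1 score (an assumption; the abstract
    does not say which variant the authors reported).
    """
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # no true positives at all
    return (1 + beta ** 2) * precision * recall / denom


# Hypothetical counts, not taken from the study.
example_f1 = f_measure(tp=30, fp=10, fn=20)
```

With these hypothetical counts, precision is 0.75 and recall is 0.60, so the harmonic mean lands between the two and closer to the lower value.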
Affiliation(s)
- Daniel J Feller
- Department of Biomedical Informatics, Columbia University, New York, NY
- Jason Zucker
- Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY
- Michael T Yin
- Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY
- Peter Gordon
- Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, NY
48
Vassy JL, Ho YL, Honerlaw J, Cho K, Gaziano JM, Wilson PWF, Gagnon DR. Yield and bias in defining a cohort study baseline from electronic health record data. J Biomed Inform 2018; 78:54-59. [PMID: 29305952 PMCID: PMC5846098 DOI: 10.1016/j.jbi.2017.12.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/30/2017] [Revised: 11/07/2017] [Accepted: 12/31/2017] [Indexed: 01/24/2023]
Abstract
AIMS Despite growing interest in using electronic health records (EHR) to create longitudinal cohort studies, the distribution and missingness of EHR data might introduce selection bias and information bias to such analyses. We aimed to examine the yield and potential for these healthcare process biases in defining a study baseline using EHR data, using the example of cholesterol and blood pressure (BP) measurements. METHODS We created a virtual cohort study of cardiovascular disease (CVD) from patients with eligible cholesterol profiles in the New England (NE) and Southeast (SE) networks of the Veterans Health Administration in the United States. Using clinical data from the EHR, we plotted the yield of patients with BP measurements within an expanding timeframe around an index date of cholesterol testing. We compared three groups: (1) patients with BP from the exact index date; (2) patients with BP not on the index date but within the network-specific 90th percentile around the index date; and (3) patients with no BP within the network-specific 90th percentile. RESULTS Among 589,361 total patients in the two networks, 146,636 (61.0%) of 240,479 patients from NE and 289,906 (83.1%) of 348,882 patients from SE had BP measurements on the index date. Ninety percent had BP measured within 11 days of the index date in NE and within 5 days of the index date in SE. Group 3 in both networks had fewer available race data, fewer comorbidities and CVD medications, and fewer health system encounters. CONCLUSIONS Requiring same-day risk factor measurement in the creation of a virtual CVD cohort study from EHR data might exclude 40% of eligible patients, but including patients with infrequent visits might introduce bias. Data visualization can inform study-specific strategies to address these challenges for the research use of EHR data.
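The yield calculation described in this abstract (the share of patients with a BP measurement within an expanding timeframe around the index date) could be sketched roughly as follows. This is an illustration only: the function name, the data layout, and the symmetric window of plus or minus `window_days` are assumptions, not details given in the abstract.

```python
from datetime import date


def yield_within_window(index_dates, bp_dates, window_days):
    """Fraction of patients with at least one BP measurement within
    +/- window_days of their index date.

    index_dates: dict mapping patient id -> index date (hypothetical layout)
    bp_dates:    dict mapping patient id -> list of BP measurement dates
    """
    if not index_dates:
        return 0.0
    hits = 0
    for patient_id, index_date in index_dates.items():
        measurements = bp_dates.get(patient_id, [])
        if any(abs((d - index_date).days) <= window_days for d in measurements):
            hits += 1
    return hits / len(index_dates)


# Hypothetical toy data, not drawn from the study.
index_dates = {"a": date(2010, 1, 10), "b": date(2010, 3, 1), "c": date(2010, 6, 15)}
bp_dates = {"a": [date(2010, 1, 10)], "b": [date(2010, 3, 5)], "c": []}

# Yield at each window width, from same-day (0) outward.
yield_curve = {w: yield_within_window(index_dates, bp_dates, w) for w in (0, 3, 4, 11)}
```

Evaluating the function over a range of window widths gives the kind of yield curve the abstract refers to; a window of 0 corresponds to requiring a same-day measurement.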
Affiliation(s)
- Jason L Vassy
- VA Boston Healthcare System, Boston, MA, USA; Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA.
- Yuk-Lam Ho
- VA Boston Healthcare System, Boston, MA, USA
- Kelly Cho
- VA Boston Healthcare System, Boston, MA, USA; Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- J Michael Gaziano
- VA Boston Healthcare System, Boston, MA, USA; Division of Aging, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Peter W F Wilson
- Atlanta VA Medical Center, Atlanta, GA, USA; Emory University Schools of Medicine and Public Health, Atlanta, GA, USA
- David R Gagnon
- VA Boston Healthcare System, Boston, MA, USA; Boston University School of Public Health, Boston, MA, USA
49
Sobrinho A, da Silva LD, Perkusich A, Pinheiro ME, Cunha P. Design and evaluation of a mobile application to assist the self-monitoring of the chronic kidney disease in developing countries. BMC Med Inform Decis Mak 2018; 18:7. [PMID: 29329530 PMCID: PMC5767024 DOI: 10.1186/s12911-018-0587-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 08/11/2017] [Accepted: 01/05/2018] [Indexed: 12/13/2022] Open
Abstract
Background Chronic kidney disease (CKD) is a critical worldwide problem, especially in developing countries. CKD patients usually begin their treatment in advanced stages, which requires dialysis and kidney transplantation and consequently affects mortality rates. This issue is addressed by a mobile health (mHealth) application (app) that aims to assist the early diagnosis and self-monitoring of disease progression. Methods A user-centered design (UCD) approach involving health professionals (nurse and nephrologists) and target users guided the development process of the app between 2012 and 2016. In-depth interviews and prototyping were conducted with healthcare professionals throughout the requirements elicitation process. Elicited requirements were translated into a native mHealth app targeting the Android platform. Afterward, Cohen’s kappa coefficient was applied to evaluate the agreement between the app and three nephrologists who analyzed test results collected from 60 medical records. Finally, eight users tested the app and were interviewed about usability and user perceptions. Results An mHealth app was designed to assist CKD early diagnosis and self-monitoring, considering quality attributes such as safety, effectiveness, and usability. A global kappa value of 0.7119 showed a substantial degree of agreement between the app and three nephrologists. Results of face-to-face interviews with target users indicated good user satisfaction. However, the task of CKD self-monitoring proved difficult because most of the users did not fully understand the meaning of specific biomarkers (e.g., creatinine). Conclusion The UCD approach provided mechanisms to develop the app based on the real needs of users. Even without perfect kappa agreement, the results are satisfactory because the app aims to refer patients to nephrologists in early stages, where they may confirm the CKD diagnosis.
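As background for the agreement statistic in this abstract, here is a minimal sketch of Cohen's kappa for two raters (for instance, the app and a single nephrologist). The function name and the example labels are hypothetical, and the abstract does not describe how the "global" kappa across three nephrologists was aggregated, so only the standard two-rater form is shown.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four records, not taken from the study.
app_labels = ["ckd", "ckd", "no", "no"]
nephrologist_labels = ["ckd", "no", "no", "no"]
kappa = cohens_kappa(app_labels, nephrologist_labels)
```

In this toy case the raters agree on three of four records (0.75 observed) against 0.5 expected by chance, giving a kappa of 0.5.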
Affiliation(s)
- Alvaro Sobrinho
- Federal Rural University of the Semiarid, Rodovia BR-226, Pau dos Ferros, 59900-000, Brazil.
- Leandro Dias da Silva
- Federal University of Alagoas, Av. Lourival Melo Mota, S/N Tabuleiro do Martins, Maceió, 57072-900, Brazil
- Angelo Perkusich
- Federal University of Campina Grande, R. Aprígio Veloso, 882, Universitário, Paraíba, 58429-900, Brazil
- Maria Eliete Pinheiro
- Federal University of Alagoas, Av. Lourival Melo Mota, S/N Tabuleiro do Martins, Maceió, 57072-900, Brazil
- Paulo Cunha
- Federal Institute of Alagoas, R. Prof. Domingos Correia, 1207, Ouro Preto, Alagoas, 57300-010, Brazil
50
Wu M, Ghassemi M, Feng M, Celi LA, Szolovits P, Doshi-Velez F. Understanding vasopressor intervention and weaning: risk prediction in a public heterogeneous clinical time series database. J Am Med Inform Assoc 2017; 24:488-495. [PMID: 27707820 DOI: 10.1093/jamia/ocw138] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 04/07/2016] [Accepted: 08/23/2016] [Indexed: 12/25/2022] Open
Abstract
Background The widespread adoption of electronic health records allows us to ask evidence-based questions about the need for and benefits of specific clinical interventions in critical-care settings across large populations. Objective We investigated the prediction of vasopressor administration and weaning in the intensive care unit. Vasopressors are commonly used to control hypotension, and changes in timing and dosage can have a large impact on patient outcomes. Materials and Methods We considered a cohort of 15 695 intensive care unit patients without orders for reduced care who were alive 30 days post-discharge. A switching-state autoregressive model (SSAM) was trained to predict the multidimensional physiological time series of patients before, during, and after vasopressor administration. The latent states from the SSAM were used as predictors of vasopressor administration and weaning. Results The unsupervised SSAM features were able to predict patient vasopressor administration and successful patient weaning. Features derived from the SSAM achieved areas under the receiver operating curve of 0.92, 0.88, and 0.71 for predicting ungapped vasopressor administration, gapped vasopressor administration, and vasopressor weaning, respectively. We also demonstrated many cases where our model predicted weaning well in advance of a successful wean. Conclusion Models that used SSAM features increased performance on both predictive tasks. These improvements may reflect an underlying, and ultimately predictive, latent state detectable from the physiological time series.
Affiliation(s)
- Mike Wu
- Department of Computer Science, Yale University, New Haven, CT, USA
- Marzyeh Ghassemi
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Mengling Feng
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Leo A Celi
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Peter Szolovits
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Finale Doshi-Velez
- Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA