1
Schulte T, Wurz T, Groene O, Bohnet-Joschko S. Big Data Analytics to Reduce Preventable Hospitalizations-Using Real-World Data to Predict Ambulatory Care-Sensitive Conditions. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:4693. [PMID: 36981600 PMCID: PMC10049041 DOI: 10.3390/ijerph20064693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 03/01/2023] [Accepted: 03/04/2023] [Indexed: 06/18/2023]
Abstract
The purpose of this study was to develop a prediction model to identify individuals and populations at high risk of being hospitalized due to an ambulatory care-sensitive condition who might benefit from preventative actions or tailored treatment options to avoid subsequent hospital admission. Of all individuals observed, 4.8% had an ambulatory care-sensitive hospitalization in 2019, and 6389.3 hospital cases per 100,000 individuals were observed. Based on real-world claims data, the predictive performance of a machine learning model (Random Forest) was compared with that of a statistical logistic regression model. Both models achieved generally comparable performance, with c-values above 0.75, although the Random Forest model reached slightly higher c-values. These c-values are comparable to those reported in the literature for prediction models of (avoidable) hospitalization. The prediction models were designed so that, where claims data are available, they can support integrated care or public and population health interventions as an additional risk assessment tool with little effort. For the regions analyzed, the logistic regression revealed that switching to a higher age class or to a higher level of long-term care and unit from prior hospitalizations (all-cause and due to an ambulatory care-sensitive condition) increases the odds of having an ambulatory care-sensitive hospitalization in the upcoming year. This is also true for patients with prior diagnoses from the diagnosis groups of maternal disorders related to pregnancy, mental disorders due to alcohol/opioids, alcoholic liver disease and certain diseases of the circulatory system. Further model refinement and the integration of additional data, such as behavioral, social or environmental data, would improve both model performance and the individual risk scores. Implementing risk scores that identify populations potentially benefitting from public health and population health activities would be the next step toward evaluating whether ambulatory care-sensitive hospitalizations can be prevented.
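The c-values cited in this abstract are concordance statistics: the probability that a randomly chosen hospitalized individual received a higher predicted risk than a randomly chosen non-hospitalized one. The following is a minimal sketch of that definition in Python, not the authors' implementation; the function name `c_statistic` and the sample values in `risk` and `hospitalized` are hypothetical and chosen only for illustration.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance (c) statistic for a binary outcome.

    Share of positive/negative pairs in which the positive case has the
    higher predicted risk; tied scores count as half a concordant pair.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = hospitalized).
risk = [0.90, 0.40, 0.35, 0.80, 0.10, 0.35]
hospitalized = [1, 0, 1, 1, 0, 0]
print(round(c_statistic(risk, hospitalized), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported threshold of 0.75 should be read. The pairwise loop is quadratic in the number of cases, so a rank-based routine would be preferable for claims data of realistic size.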
Affiliation(s)
- Timo Schulte
  - Faculty of Management, Economics and Society, Witten/Herdecke University, 58455 Witten, Germany
  - Faculty of Health, Witten/Herdecke University, 58455 Witten, Germany
  - Department of Business Analytics, Clinics of Maerkischer Kreis, 58515 Luedenscheid, Germany
- Tillmann Wurz
  - Department of Project and Change Management, University Clinic Hamburg-Eppendorf, 20251 Hamburg, Germany
- Oliver Groene
  - Faculty of Management, Economics and Society, Witten/Herdecke University, 58455 Witten, Germany
  - Department of Research & Innovation, OptiMedis AG, 20095 Hamburg, Germany
- Sabine Bohnet-Joschko
  - Faculty of Management, Economics and Society, Witten/Herdecke University, 58455 Witten, Germany
  - Faculty of Health, Witten/Herdecke University, 58455 Witten, Germany
2
Review on Machine Learning Techniques for Medical Data Classification and Disease Diagnosis. REGENERATIVE ENGINEERING AND TRANSLATIONAL MEDICINE 2022. [DOI: 10.1007/s40883-022-00273-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
3
Malakar S, Roy SD, Das S, Sen S, Velásquez JD, Sarkar R. Computer Based Diagnosis of Some Chronic Diseases: A Medical Journey of the Last Two Decades. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING : STATE OF THE ART REVIEWS 2022; 29:5525-5567. [PMID: 35729963 PMCID: PMC9199478 DOI: 10.1007/s11831-022-09776-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/16/2022] [Accepted: 05/22/2022] [Indexed: 06/15/2023]
Abstract
Disease prediction from diagnostic reports and pathological images using artificial intelligence (AI) and machine learning (ML) is one of the fastest emerging applications in recent days. Researchers are striving to achieve near-perfect results using advanced hardware technologies in amalgamation with AI and ML based approaches. As a result, a large number of AI and ML based methods are found in the literature. A systematic survey describing the state-of-the-art disease prediction methods, specifically chronic disease prediction algorithms, will provide a clear idea about the recent models developed in this field. This will also help the researchers to identify the research gaps present there. To this end, this paper looks over the approaches in the literature designed for predicting chronic diseases like Breast Cancer, Lung Cancer, Leukemia, Heart Disease, Diabetes, Chronic Kidney Disease and Liver Disease. The advantages and disadvantages of various techniques are thoroughly explained. This paper also presents a detailed performance comparison of different methods. Finally, it concludes the survey by highlighting some future research directions in this field that can be addressed through the forthcoming research attempts.
Affiliation(s)
- Samir Malakar
  - Department of Computer Science, Asutosh College, Kolkata, India
- Soumya Deep Roy
  - Department of Metallurgical and Material Engineering, Jadavpur University, Kolkata, India
- Soham Das
  - Department of Metallurgical and Material Engineering, Jadavpur University, Kolkata, India
- Swaraj Sen
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Juan D. Velásquez
  - Department of Industrial Engineering, University of Chile, Santiago, Chile
  - Instituto Sistemas Complejos de Ingeniería (ISCI), Santiago, Chile
- Ram Sarkar
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
4
Yang R, Zhu D, Howard LE, De Hoedt A, Schroeck FR, Klaassen Z, Freedland SJ, Williams SB. Context-Based Identification of Muscle Invasion Status in Patients With Bladder Cancer Using Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2100097. [PMID: 35073149 DOI: 10.1200/cci.21.00097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
PURPOSE Mortality from bladder cancer (BC) increases exponentially once it invades the muscle, yet invasion status is inherently challenging to delineate at the population level. We sought to develop and validate a natural language processing (NLP) model for automatically identifying patients with muscle-invasive bladder cancer (MIBC). METHODS All patients with a Current Procedural Terminology code for transurethral resection of bladder tumor (TURBT; n = 76,060) were selected from the Department of Veterans Affairs (VA) database. A sample of 600 patients (with 2,337 full-text notes) who had TURBT and confirmed pathology results was selected for NLP model development and validation. The NLP performance was assessed by calculating the sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and overall accuracy at the individual note and patient levels. RESULTS In the validation cohort, the NLP model had average overall accuracies of 94% and 96% at the note and patient levels. Specifically, the F1 score and overall accuracy for predicting muscle invasion at the patient level were 0.87 and 96%, respectively. The model classified nonmuscle-invasive bladder cancer (NMIBC) with overall accuracies of 90% and 93% at the note and patient levels. When applying the model to 71,200 patients VA-wide, the model classified 13,642 (19%) as having MIBC and 47,595 (66%) as NMIBC and was able to identify invasion status for 96% of patients with TURBT at the population level. Inherent limitations include a relatively small training set, given the size of the VA population. CONCLUSION This NLP model, with high accuracy, may be a practical tool for efficiently identifying BC invasion status and may aid population-based BC research.
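The six performance measures named in the METHODS of this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions in Python; the function name `classification_metrics` and the counts passed to it are hypothetical and are not data from the study. It assumes every denominator is nonzero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification measures from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Assumes each denominator below is nonzero.
    """
    sensitivity = tp / (tp + fn)   # recall: share of true cases detected
    specificity = tn / (tn + fp)   # share of non-cases correctly excluded
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical patient-level counts, for illustration only.
metrics = classification_metrics(tp=40, fp=5, fn=5, tn=150)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the F1 score ignores true negatives, it can sit well below overall accuracy when the positive class is the minority, which is the pattern the abstract reports for muscle invasion at the patient level.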
Affiliation(s)
- Ruixin Yang
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Di Zhu
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Lauren E Howard
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Duke Cancer Institute, Duke University School of Medicine, Durham, NC
- Amanda De Hoedt
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Florian R Schroeck
  - White River Junction VA Medical Center, White River Junction, VT
  - The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH
- Zachary Klaassen
  - Division of Urology, Medical College of Georgia at Augusta University, Augusta, GA
  - Georgia Cancer Center, Augusta, GA
- Stephen J Freedland
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Division of Urology, Department of Surgery, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
  - Center for Integrated Research in Cancer and Lifestyle, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
- Stephen B Williams
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Department of Surgery, Division of Urology, The University of Texas Medical Branch at Galveston, Galveston, TX
5
Abdelkader W, Navarro T, Parrish R, Cotoi C, Germini F, Iorio A, Haynes RB, Lokker C. Machine Learning Approaches to Retrieve High-Quality, Clinically Relevant Evidence From the Biomedical Literature: Systematic Review. JMIR Med Inform 2021; 9:e30401. [PMID: 34499041 PMCID: PMC8461527 DOI: 10.2196/30401] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2021] [Revised: 07/15/2021] [Accepted: 07/25/2021] [Indexed: 11/20/2022]
Abstract
BACKGROUND The rapid growth of the biomedical literature makes identifying strong evidence a time-consuming task. Applying machine learning to the process could be a viable solution that limits effort while maintaining accuracy. OBJECTIVE The goal of the research was to summarize the nature and comparative performance of machine learning approaches that have been applied to retrieve high-quality evidence for clinical consideration from the biomedical literature. METHODS We conducted a systematic review of studies that applied machine learning techniques to identify high-quality clinical articles in the biomedical literature. Multiple databases were searched to July 2020. Extracted data focused on the applied machine learning model, steps in the development of the models, and model performance. RESULTS From 3918 retrieved studies, 10 met our inclusion criteria. All followed a supervised machine learning approach and applied, from a limited range of options, a high-quality standard for the training of their model. The results show that machine learning can achieve a sensitivity of 95% while maintaining a high precision of 86%. CONCLUSIONS Machine learning approaches perform well in retrieving high-quality clinical studies. Performance may improve by applying more sophisticated approaches such as active learning and unsupervised machine learning approaches.
Affiliation(s)
- Wael Abdelkader
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Tamara Navarro
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Rick Parrish
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Chris Cotoi
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Federico Germini
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
  - Department of Medicine, McMaster University, Hamilton, ON, Canada
- Alfonso Iorio
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
  - Department of Medicine, McMaster University, Hamilton, ON, Canada
- R Brian Haynes
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
  - Department of Medicine, McMaster University, Hamilton, ON, Canada
- Cynthia Lokker
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
6
Cui J, Zhu H, Deng H, Chen Z, Liu D. FeARH: Federated machine learning with anonymous random hybridization on electronic medical records. J Biomed Inform 2021; 117:103735. [PMID: 33711540 DOI: 10.1016/j.jbi.2021.103735] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/19/2019] [Revised: 11/09/2020] [Accepted: 03/01/2021] [Indexed: 10/21/2022]
Abstract
Electronic medical records are restricted and difficult to centralize for machine learning model training due to privacy and regulatory issues. One solution is to train models in a distributed manner that involves many parties in the process. However, some parties may not be trustworthy, and in this project we propose an alternative to traditional federated learning with a central analyzer, so that training can be conducted when no trustworthy central analyzer is available. The proposed algorithm, "federated machine learning with anonymous random hybridization" (abbreviated as "FeARH"), mainly uses a hybridization algorithm to degenerate the integration of connections between medical record data and models' parameters by adding randomization into the parameter sets shared with other parties. In our experiments, the new algorithm achieved AUCROC and AUCPR results similar to those of centralized machine learning and of the original federated machine learning.
Affiliation(s)
- Jianfei Cui
  - Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90007, United States
- He Zhu
  - The Hong Kong Polytechnic University, Hong Kong
- Hao Deng
  - Harvard Medical School, Boston, MA 02115, United States
  - Massachusetts General Hospital, Boston, MA 02115, United States
- Ziwei Chen
  - Beijing Jiaotong University, Beijing, China
- Dianbo Liu
  - Harvard Medical School, Boston, MA 02115, United States
  - Massachusetts General Hospital, Boston, MA 02115, United States
  - The Broad institute of MIT and Harvard, Cambridge, MA 02115, United States
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
7
Predicting body mass index and isometric leg strength using soft tissue distributions from computed tomography scans. HEALTH AND TECHNOLOGY 2020. [DOI: 10.1007/s12553-020-00498-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/24/2022]
8
Lapadula P, Mecca G, Santoro D, Solimando L, Veltri E. Greg, ML – Machine Learning for Healthcare at a Scale. HEALTH AND TECHNOLOGY 2020. [DOI: 10.1007/s12553-020-00468-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/16/2023]
Abstract
This paper introduces the Greg, ML platform, a machine-learning engine and toolset conceived to generate automatic diagnostic suggestions based on patient profiles. Greg, ML departs from many other experiences in machine learning for healthcare in that it was designed to handle a large number of different diagnoses, on the order of hundreds. We discuss the architecture that stands at the core of Greg, ML, designed to handle the complex challenges posed by this ambitious goal, and confirm its effectiveness with experimental results based on the working prototype we have developed. Finally, we discuss challenges and opportunities related to the use of this kind of tool in medicine, and some important lessons learned while developing the tool. In this respect, we underline that Greg, ML should be conceived primarily as a support for expert doctors in their diagnostic decisions, and can hardly replace humans in their judgment.
9
Recenti M, Ricciardi C, Edmunds K, Gislason MK, Gargiulo P. Machine learning predictive system based upon radiodensitometric distributions from mid-thigh CT images. Eur J Transl Myol 2020; 30:8892. [PMID: 32499893 PMCID: PMC7254455 DOI: 10.4081/ejtm.2019.8892] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 02/11/2020] [Accepted: 03/03/2020] [Indexed: 01/23/2023]
Abstract
The nonlinear trimodal regression analysis (NTRA) method based on radiodensitometric CT image distributions was developed for the quantitative characterization of soft tissue changes according to the lower extremity function of elderly subjects. In this regard, the NTRA method defines 11 subject-specific soft tissue parameters and has shown high sensitivity to changes in skeletal muscle form and function. The present work further explores the use of these 11 NTRA parameters in the construction of a machine learning (ML) system to predict body mass index and isometric leg strength using tree-based regression algorithms. Results obtained from these models demonstrate that, when using an ML approach, these soft tissue features have a significant predictive value for these physiological parameters. These results further support the use of NTRA-based ML predictive assessment and the future investigation of other physiological parameters and comorbidities.
Affiliation(s)
- Marco Recenti
  - Institute for Biomedical and Neural Engineering, Reykjavík University, Reykjavík, Iceland
- Carlo Ricciardi
  - Institute for Biomedical and Neural Engineering, Reykjavík University, Reykjavík, Iceland
  - Department of Advanced Biomedical Sciences, University Hospital of Naples 'Federico II', Naples, Italy
- Kyle Edmunds
  - Institute for Biomedical and Neural Engineering, Reykjavík University, Reykjavík, Iceland
- Magnus K Gislason
  - Institute for Biomedical and Neural Engineering, Reykjavík University, Reykjavík, Iceland
- Paolo Gargiulo
  - Institute for Biomedical and Neural Engineering, Reykjavík University, Reykjavík, Iceland
  - Department of Science, Landspítali, Reykjavík, Iceland
10
Recenti M, Ricciardi C, Gìslason M, Edmunds K, Carraro U, Gargiulo P. Machine Learning Algorithms Predict Body Mass Index Using Nonlinear Trimodal Regression Analysis from Computed Tomography Scans. IFMBE PROCEEDINGS 2020. [DOI: 10.1007/978-3-030-31635-8_100] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
11
Shou H, Hsu JY, Xie D, Yang W, Roy J, Anderson AH, Landis JR, Feldman HI, Parsa A, Jepson C. Analytic Considerations for Repeated Measures of eGFR in Cohort Studies of CKD. Clin J Am Soc Nephrol 2017; 12:1357-1365. [PMID: 28751576 PMCID: PMC5544518 DOI: 10.2215/cjn.11311116] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 01/12/2023]
Abstract
Repeated measures of various biomarkers provide opportunities for us to enhance understanding of many important clinical aspects of CKD, including patterns of disease progression, rates of kidney function decline under different risk factors, and the degree of heterogeneity in disease manifestations across patients. However, because of unique features, such as correlations across visits and time dependency, these data must be appropriately handled using longitudinal data analysis methods. We provide a general overview of the characteristics of data collected in cohort studies and compare appropriate statistical methods for the analysis of longitudinal exposures and outcomes. We use examples from the Chronic Renal Insufficiency Cohort Study to illustrate these methods. More specifically, we model longitudinal kidney outcomes over annual clinical visits and assess the association with both baseline and longitudinal risk factors.
Affiliation(s)
- Haochang Shou
  - Department of Biostatistics, Epidemiology and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jesse Y. Hsu
  - Department of Biostatistics, Epidemiology and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Dawei Xie
  - Department of Biostatistics, Epidemiology and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Wei Yang
  - Department of Biostatistics, Epidemiology and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jason Roy
  - Department of Biostatistics, Epidemiology and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Amanda H. Anderson
  - Department of Biostatistics, Epidemiology and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- J. Richard Landis
  - Department of Biostatistics, Epidemiology and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Harold I. Feldman
  - Department of Biostatistics, Epidemiology and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Afshin Parsa
  - Department of Medicine, Division of Nephrology, University of Maryland School of Medicine, Baltimore, Maryland
  - Department of Medicine, Baltimore Veterans Affairs Medical Center, Baltimore, Maryland
- Christopher Jepson
  - Department of Biostatistics, Epidemiology and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
12
Introduction to MAchine Learning & Knowledge Extraction (MAKE). MACHINE LEARNING AND KNOWLEDGE EXTRACTION 2017. [DOI: 10.3390/make1010001] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 02/06/2023]
Abstract
The grand goal of Machine Learning is to develop software which can learn from previous experience—similar to how we humans do. Ultimately, to reach a level of usable intelligence, we need (1) to learn from prior data, (2) to extract knowledge, (3) to generalize—i.e., guessing where probability function mass/density concentrates, (4) to fight the curse of dimensionality, and (5) to disentangle underlying explanatory factors of the data—i.e., to make sense of the data in the context of an application domain. To address these challenges and to ensure successful machine learning applications in various domains an integrated machine learning approach is important. This requires a concerted international effort without boundaries, supporting collaborative, cross-domain, interdisciplinary and transdisciplinary work of experts from seven sections, ranging from data pre-processing to data visualization, i.e., to map results found in arbitrarily high dimensional spaces into the lower dimensions to make it accessible, usable and useful to the end user. An integrated machine learning approach needs also to consider issues of privacy, data protection, safety, security, user acceptance and social implications. This paper is the inaugural introduction to the new journal of MAchine Learning & Knowledge Extraction (MAKE). The goal is to provide an incomplete, personally biased, but consistent introduction into the concepts of MAKE and a brief overview of some selected topics to stimulate future research in the international research community.
13
Grundmeier RW, Masino AJ, Casper TC, Dean JM, Bell J, Enriquez R, Deakyne S, Chamberlain JM, Alpern ER. Identification of Long Bone Fractures in Radiology Reports Using Natural Language Processing to support Healthcare Quality Improvement. Appl Clin Inform 2016; 7:1051-1068. [PMID: 27826610 DOI: 10.4338/aci-2016-08-ra-0129] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 08/01/2016] [Accepted: 09/26/2016] [Indexed: 11/23/2022]
Abstract
BACKGROUND Important information to support healthcare quality improvement is often recorded in free text documents such as radiology reports. Natural language processing (NLP) methods may help extract this information, but these methods have rarely been applied outside the research laboratories where they were developed. OBJECTIVE To implement and validate NLP tools to identify long bone fractures for pediatric emergency medicine quality improvement. METHODS Using freely available statistical software packages, we implemented NLP methods to identify long bone fractures from radiology reports. A sample of 1,000 radiology reports was used to construct three candidate classification models. A test set of 500 reports was used to validate the model performance. Blinded manual review of radiology reports by two independent physicians provided the reference standard. Each radiology report was segmented and word stem and bigram features were constructed. Common English "stop words" and rare features were excluded. We used 10-fold cross-validation to select optimal configuration parameters for each model. Accuracy, recall, precision and the F1 score were calculated. The final model was compared to the use of diagnosis codes for the identification of patients with long bone fractures. RESULTS There were 329 unique word stems and 344 bigrams in the training documents. A support vector machine classifier with Gaussian kernel performed best on the test set with accuracy=0.958, recall=0.969, precision=0.940, and F1 score=0.954. Optimal parameters for this model were cost=4 and gamma=0.005. The three classification models that we tested all performed better than diagnosis codes in terms of accuracy, precision, and F1 score (diagnosis code accuracy=0.932, recall=0.960, precision=0.896, and F1 score=0.927). CONCLUSIONS NLP methods using a corpus of 1,000 training documents accurately identified acute long bone fractures from radiology reports. Strategic use of straightforward NLP methods, implemented with freely available software, offers quality improvement teams new opportunities to extract information from narrative documents.
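The F1 scores quoted in the RESULTS of this abstract follow from the quoted precision and recall, since F1 is their harmonic mean. As a small consistency check, the sketch below recomputes both values in Python from the figures given in the abstract; the function name `f1_score` and the variable names are illustrative and do not come from the paper.

```python
def f1_score(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
svm_f1 = f1_score(precision=0.940, recall=0.969)    # SVM with Gaussian kernel
codes_f1 = f1_score(precision=0.896, recall=0.960)  # diagnosis codes

print(f"SVM F1: {svm_f1:.3f}")
print(f"Diagnosis-code F1: {codes_f1:.3f}")
```

Rounded to three decimals, both results agree with the F1 scores reported in the abstract (0.954 and 0.927).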
Affiliation(s)
- Robert W Grundmeier
  - Robert W. Grundmeier, MD, The Children's Hospital of Philadelphia, 3535 Market Street, Suite 1024, Philadelphia, PA 19104, Phone: 215-590-2857